How to migrate external API data to the Amorphic dataset?
Info
- Follow the steps mentioned below.
- Total time taken for this task: 20 Minutes.
- Prerequisites: user registration is completed, and you are logged in to Amorphic with your role switched.
Tidbits
- External API connections are used to migrate data from an API endpoint to Amorphic's dataset.
- Usually, these API endpoints are created by AWS API Gateway.
- Only Basic authentication is supported.
- For this workshop, let's ingest publicly available country-wise COVID-19 data from covid-api.mmediagroup.fr/v1. The API is served by code running in AWS Lambda. More details at https://github.com/M-Media-Group/Covid-19-API.
- The API returns data in JSON format.
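To make the Basic authentication point concrete, here is a minimal Python sketch of the kind of GET request such a connection would issue. It is an illustration only, not Amorphic's implementation: the helper names `build_request` and `fetch_json` and the placeholder credentials are assumptions, and the public API used in this workshop may not require credentials at all.

```python
import base64
import json
import urllib.request

# Endpoint used in this workshop.
API_ENDPOINT = "https://covid-api.mmediagroup.fr/v1/cases"


def build_request(url, username=None, password=None):
    """Build a GET request, adding a Basic auth header when credentials are given."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Accept", "application/json")
    if username is not None and password is not None:
        # Basic auth is the base64 encoding of "username:password".
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical placeholder credentials, for illustration only.
    req = build_request(API_ENDPOINT, "demo-user", "demo-password")
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

Calling `fetch_json(req)` would perform the actual network call; it is left out of the demo so the sketch can be run offline.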
Create a source connection
- Click on the 'Connections' widget on the home screen, or click on INGESTION --> Connections in the left navigation bar, or click on Navigator at the top right corner and search for Connections.
- Click on the ➕ icon at the top right corner.
- Enter the following details and click on Create Connection.
{
  "Connection Name": "remote-api-2-amorphic-<your-userid>"
  "Connection Type":  "External API"
  "Description": "Ingest from a publicly available country-wise `COVID-19` data using `covid-api.mmediagroup.fr/v1` to Amorphic. "
  "Authorized Users": "Select your user name and any other user names you want to grant permission"
  "Keywords": "Add relevant keywords like 'ext-api'. This will be useful for search"
  "Version": "1.0"
  "API Endpoint": "https://covid-api.mmediagroup.fr/v1/cases"
  "API Authentication": "BASIC"
  "Method": "GET"
}
Create a target dataset
- Click on 'DATASETS' --> 'Datasets' in the left navigation bar.
- Click on the ➕ icon at the top right corner.
- Enter the following information and click on 'Register'.
{
  "Dataset Name": "extapi_2_amd_ds_<your_userid>"
  "Description": "This dataset is a destination for external API connection remote-api-2-amorphic-<your-userid>"
  "Domain": "workshop(workshop)"
  "Data Classifications":
  "Keywords": "ext-api, covid-19"
  "Connection Type": "External API"
  "File Type": "Others"
  "Target Location": "S3"
  "Update Method": "Append"
  "Connection": "remote-api-2-amorphic-<your-userid>"
  "Enable Malware Detection": "No"
  "Enable AI Services": "No"
  "Enable Data Cleanup": "No"
}
Set up a schedule
- Click on 'SCHEDULES' in the left navigation bar.
- Click on the ➕ icon at the top right corner.
- Enter the following information and click on 'Create'.
{
  "Schedule Name": "extapi_2_ds_sched_<your_userid>"
  "Description": "This schedule runs every 5 minutes to pull data from an external API to the Amorphic dataset."
  "Type Of Job": "Data Ingestion"
  "Select Dataset": "extapi_2_amd_ds_<your_userid> | ext-api"        <-- Click ↩️ icon to refresh the list
  "Keywords": "your_userid, ext-api"
  "Allocated Capacity":
  "Schedule Type": "Time Based"
  "Schedule Expression": "rate(5 minutes)"
}
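The schedule expression `rate(5 minutes)` has the same shape as AWS EventBridge rate expressions. Assuming Amorphic follows that syntax (value, then `minute(s)`, `hour(s)`, or `day(s)`, singular only when the value is 1), the sketch below shows how such an expression maps to an interval and how many runs per day it implies. The helper names `parse_rate` and `runs_per_day` are hypothetical.

```python
import re
from datetime import timedelta

# Assumed syntax: rate(<positive integer> <unit>), as in AWS EventBridge.
_RATE_PATTERN = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def parse_rate(expression):
    """Convert a rate expression such as 'rate(5 minutes)' into a timedelta."""
    match = _RATE_PATTERN.match(expression.strip())
    if match is None:
        raise ValueError(f"not a valid rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    # The unit is singular for a value of 1 and plural otherwise.
    is_singular = not unit.endswith("s")
    if (value == 1) != is_singular:
        raise ValueError(f"unit {unit!r} does not agree with value {value}")
    key = unit if unit.endswith("s") else unit + "s"
    return timedelta(**{key: value})


def runs_per_day(expression):
    """Number of complete intervals that fit in 24 hours."""
    return timedelta(days=1) // parse_rate(expression)


if __name__ == "__main__":
    print(parse_rate("rate(5 minutes)"))    # 0:05:00
    print(runs_per_day("rate(5 minutes)"))  # 288
```

At 288 calls per day, a 5-minute schedule is convenient for a workshop but puts steady load on a public API, which is why the schedule is disabled at the end of this exercise.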
Check data transfer
- The Execution Status tab of the schedule shows the status of each execution.
- Hover over the message icon ✅ to check the job status.
- For more details, click on the 'three dots' and check the output logs.
- Check the Files tab of the dataset. The latest COVID-19 data in JSON format is migrated here.
Disable schedule
- Don't leave the schedule running indefinitely. Disabling it reduces the load on the API.
- Click on the Disable Schedule icon on the schedule page.
- Click Yes.
You can do more...
- Analyze the JSON data in an ETL job to get insights from the COVID-19 data.
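As a starting point for such an analysis, the sketch below ranks countries by confirmed cases and derives two simple ratios. It assumes the ingested file maps each country name to an object with an "All" entry holding `confirmed`, `deaths`, and optionally `population`; verify this against a file from the dataset before relying on it. The sample data and the `summarize` helper are hypothetical.

```python
import json

# Hypothetical sample, shaped like the assumed response layout.
SAMPLE = """
{
  "Exampleland": {"All": {"confirmed": 1200, "deaths": 30, "population": 100000}},
  "Samplestan":  {"All": {"confirmed": 800, "deaths": 40, "population": 20000}},
  "Testonia":    {"All": {"confirmed": 50, "deaths": 1}}
}
"""


def summarize(payload):
    """Return one row per country, sorted by confirmed cases (descending)."""
    rows = []
    for country, regions in payload.items():
        totals = regions.get("All", {})
        confirmed = totals.get("confirmed")
        deaths = totals.get("deaths")
        if confirmed is None or deaths is None:
            continue  # skip entries without country-level totals
        population = totals.get("population")
        rows.append({
            "country": country,
            "confirmed": confirmed,
            "deaths": deaths,
            # Ratios are None when the denominator is missing or zero.
            "fatality_rate": deaths / confirmed if confirmed else None,
            "cases_per_100k": confirmed * 100_000 / population if population else None,
        })
    return sorted(rows, key=lambda row: row["confirmed"], reverse=True)


if __name__ == "__main__":
    for row in summarize(json.loads(SAMPLE)):
        print(row)
```

In an ETL job, `SAMPLE` would be replaced by the contents of a file read from the dataset, and the rows could be written out in a tabular format for querying.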