Upload Resource Files
API Endpoint: resource_create_api
There are several ways to upload files to EDX using the API. This page first covers the available options, recommendations, and parameter definitions so you can better understand this feature. Each bullet point is then clarified further, and use case examples are provided using Python 3.8.
Currently Supported Uploading Scenarios
Upload file to EDX Drive within a private workspace (root level)
Upload file to EDX Drive within a private workspace (specified folder location)
Upload file to existing private submission within private workspace
Upload file to existing draft submission
Upload Size Recommendations & Clarifications
It is strongly recommended to use chunking when uploading to EDX via the API. Splitting large files into small parts allows you to track the upload's progress as a percentage. When chunking, you must also use the original_size parameter to specify the total size of the file in bytes. It is strongly encouraged to determine the file size programmatically in your script rather than entering a value manually.
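As a sketch of the bookkeeping involved (the stand-in file below replaces a real path so the snippet runs anywhere), original_size can be read with os.path.getsize, and the number of requests a chunked upload will take follows from ceiling division:

```python
import os
import tempfile

# Stand-in file so the sketch runs anywhere; in practice use your real path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (1024 * 1024 * 25))  # 25 MB of zeros
path = tmp.name

chunk_size = 1024 * 1024 * 10            # 10 MB chunks; must be a multiple of 1024
original_size = os.path.getsize(path)    # determine the size programmatically

# Ceiling division: how many POST requests the chunked upload will take.
total_chunks = -(-original_size // chunk_size)
print(original_size, total_chunks)  # 26214400 3

os.remove(path)
```

Hard-coding original_size risks the server never seeing the upload as complete, which is why the docs insist on computing it.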
EDX Account API Key MUST Be Included in Header:
Important
The API key component of the header can be given any of the following names: "X-CKAN-API-Key", "EDX-API-Key", or "Authorization".
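As a minimal sketch (the placeholder key is hypothetical), any one of the three header names carries the same API key; include exactly one of them alongside the required User-Agent:

```python
API_KEY = 'YOUR-API-KEY-HERE'  # placeholder; substitute your real EDX API key

# Three equivalent names for the API key header; pick exactly one per request.
headers_ckan = {"X-CKAN-API-Key": API_KEY, "User-Agent": "EDX-USER"}
headers_edx = {"EDX-API-Key": API_KEY, "User-Agent": "EDX-USER"}
headers_auth = {"Authorization": API_KEY, "User-Agent": "EDX-USER"}

print(sorted(k for d in (headers_ckan, headers_edx, headers_auth)
             for k in d if k != "User-Agent"))
# ['Authorization', 'EDX-API-Key', 'X-CKAN-API-Key']
```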
Warning
You must define either the workspace ID or the package ID.
If neither the workspace ID nor the package ID is defined, an ERROR will occur.
If both the workspace ID and the package ID are defined, an ERROR will occur.
To obtain your EDX account’s API key, log into EDX and navigate to your user profile page. For a detailed description of how to obtain your EDX API key, review the Obtaining EDX API Key documentation page.
Your EDX API key is used to check your access permissions as you attempt to upload files to various locations within EDX. You can only upload files to EDX Drives of workspaces that you are an editor or admin of. You can only upload files to submissions that you have editing rights to (private workspace [editor or admin] or specific drafts).
Parameter Definitions
Parameter name | Description | Required field
---|---|---
workspace_id | The name or ID of the workspace that you wish to upload to | Situationally required
package_id | The name or ID of the submission that you wish to upload to | Situationally required
resource_name | The name to give your uploaded resource. Be sure to include the file extension. | Required
folder_id | The ID of the folder on EDX Drive to place the uploaded resource in | Optional
chunk | Boolean (True, False) to send the file to the server in chunks | Required
original_size | Total size in bytes of the file being transferred via the API | Required when chunking
Note
Situationally required parameters are only required in specific use cases.
Situational use case specifics:
workspace_id is required when you are uploading to an EDX Drive.
package_id is required when you are uploading to a private or draft submission.
Note
When adding a resource to a private submission, you do not need to provide both a workspace_id and a package_id; the package_id alone is sufficient.
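To illustrate the two situational cases (the workspace, submission, and file names below are hypothetical), the request data differs only in which ID field is present:

```python
# Case 1: upload to a workspace's EDX Drive -> workspace_id only.
drive_data = {
    "workspace_id": "test-workspace",
    "resource_name": "report.pdf",
}

# Case 2: upload to a private or draft submission -> package_id only.
submission_data = {
    "package_id": "test-submission",
    "resource_name": "report.pdf",
}

# Exactly one of the two ID fields should appear in any one request.
for payload in (drive_data, submission_data):
    assert ("workspace_id" in payload) != ("package_id" in payload)
```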
Tip
A package’s name is the text that appears in the browser’s address bar; it is NOT the plain-text title shown in the body of the page. The same applies to workspace names.
For example (submission [aka package] names):
Use: global-oil-gas-features-database
Do NOT use: Global Oil & Gas Features Database
To obtain a folder ID, review the beginning of the Obtain Resource Metadata from Folder ID documentation.
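Since the package name is the final segment of the submission's URL, it can be pulled out of the address-bar text directly (the URL below is hypothetical, following the usual CKAN-style /dataset/ path):

```python
# Hypothetical submission URL; the package name is the last path segment.
url = "https://edx.netl.doe.gov/dataset/global-oil-gas-features-database"
package_name = url.rstrip("/").rsplit("/", 1)[-1]
print(package_name)  # global-oil-gas-features-database
```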
Example 1: Chunked File Upload
In this example, we upload a large file to an existing workspace’s private submission using Python 3.8.
Note
The response object will return a parameter called “key”. This value is generated by server logic and must be included in every request after the first; the server requires it to determine which file to append the additional chunk data to.
Note
If your chunk size is larger than your file size, the whole file will be uploaded in one request. It is highly recommended to upload large files in smaller chunks (e.g. chunk a 5 GB file with chunk_size = 1024 * 1024 * 10, i.e. 10 MB chunks; the value must be a multiple of 1024). The returned output will report the status of the uploading file.
Attention
Add the "User-Agent" parameter within the headers of the API request.
Set "User-Agent" to the value "EDX-USER".
import os
import requests

# Path to your file
path = '/home/user/database-backup.zip'

headers = {
    "EDX-API-Key": 'YOUR-API-KEY-HERE',
    "User-Agent": "EDX-USER"
}

# Set chunk size, in bytes.
chunk_size = 1024 * 1024 * 10  # 10 MB chunks, value must be a multiple of 1024

data = {
    # package_id is used for uploading to a submission (private or draft).
    # As noted in the warning admonition above, define only ONE of
    # package_id / workspace_id (workspace_id targets an EDX Drive instead).
    "package_id": 'test-submission',
    "resource_name": 'sql_backup.zip',
    # Determine your file size programmatically. Server logic uses this to
    # know when the file is complete.
    "original_size": os.path.getsize(path),
    "chunk_size": chunk_size
}

if os.path.exists(path):
    with open(path, "rb") as f:
        url = 'https://edx.netl.doe.gov/api/3/action/resource_create_api'
        while True:
            chunked_stream = f.read(chunk_size)
            if not chunked_stream:
                break  # Stop uploading when there are no more bytes to stream
            r = requests.post(
                url,                             # URL to API endpoint
                headers=headers,                 # Headers dictionary
                files={'file': chunked_stream},  # Byte stream of chunked data
                data=data,                       # Dictionary of data params
            )
            r_json = r.json()
            # IMPORTANT: when chunking, add the key parameter from the response
            # object to the request data. Server logic requires the key to
            # determine which file to append additional data to; it is also
            # what lets an upload larger than the chunk size resume.
            if r_json['success']:
                data['key'] = r_json['result']['key']
            print(r_json)
# An example of return data from the resource_create_api endpoint:
# {
# u'help': u'https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api',
# u'success': True,
# u'result': {
# u'resource_name': u'sql_backup.zip',
# u'created': u'2020-10-27T14:19:43.670115',
# u'url': u'https://edx.netl.doe.gov/storage/f/edx/2020/10/2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
# u'last_modified': u'2020-10-27T14:19:43.670131',
# u'key': u'2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
# u'progress': u'100%',
# u'id': u'c9a8f82e-4bf0-4563-b931-d97b27a3f2a1'
# }
# }
Example 2: Uploading Multiple Files From a Local Directory
In this example, we upload multiple files from a directory on the computer’s file system to an existing workspace’s EDX Drive using Python 3.8.
Note
The response object will return a parameter called “key”. This value is generated by server logic and must be included in every request after the first; the server requires it to determine which file to append the additional chunk data to.
Attention
Add the "User-Agent" parameter within the headers of the API request.
Set "User-Agent" to the value "EDX-USER".
import os
import requests

# Path to your directory.
path = '/home/user/Documents/project_001'

headers = {
    "EDX-API-Key": 'YOUR-API-KEY-HERE',
    "User-Agent": "EDX-USER"
}

# Set chunk size, in bytes.
chunk_size = 1024 * 1024 * 10  # 10 MB chunks, value must be a multiple of 1024

data = {
    # workspace_id is used for uploading to EDX Drive
    "workspace_id": 'test-workspace',
    "chunk_size": chunk_size
}

# Loop over each file in the given directory.
for filename in os.listdir(path):
    filepath = os.path.join(path, filename)
    if os.path.isdir(filepath):  # Ignore subdirectories; only upload files
        continue
    data['resource_name'] = filename
    # When chunking, set the original size for EACH file during its iteration.
    # Server logic uses this to know when the file is complete.
    data['original_size'] = os.path.getsize(filepath)
    if os.path.exists(filepath):
        with open(filepath, "rb") as f:
            url = 'https://edx.netl.doe.gov/api/3/action/resource_create_api'
            while True:
                chunked_stream = f.read(chunk_size)
                if not chunked_stream:
                    break  # Stop uploading when there are no more bytes to stream
                r = requests.post(
                    url,                             # URL to API endpoint
                    headers=headers,                 # Headers dictionary
                    files={'file': chunked_stream},  # Byte stream of chunked data
                    data=data,                       # Dictionary of data params
                )
                r_json = r.json()
                # IMPORTANT: when chunking, add the key parameter from the
                # response object to the request data so the server knows
                # which file to append additional data to.
                if r_json['success']:
                    # The key lets an upload larger than the chunk size resume.
                    data['key'] = r_json['result']['key']
                    print(r_json['result']['progress'])  # Current upload progress.
                    # Once the file is complete, reset the key for the next upload.
                    if r_json['result']['progress'] == "100%":
                        data.pop('key', None)  # Remove previous resource's upload key
                print(r.json())
# An example of return data from the resource_create_api endpoint:
# {
# u'help': u'https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api',
# u'success': True,
# u'result': {
# u'resource_name': u'sql_backup.zip',
# u'created': u'2020-10-27T14:19:43.670115',
# u'url': u'https://edx.netl.doe.gov/storage/f/edx/2020/10/2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
# u'last_modified': u'2020-10-27T14:19:43.670131',
# u'key': u'2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
# u'progress': u'100%',
# u'id': u'c9a8f82e-4bf0-4563-b931-d97b27a3f2a1'
# }
# }
Example 3: Uploading Multiple Files From a Local Directory for Draft Submissions
In this example, we upload multiple files from a local directory to an existing draft submission using Python 3.8.
Note
The response object will return a parameter called “key”. This value is generated by server logic and must be included in every request after the first; the server requires it to determine which file to append the additional chunk data to.
Attention
Add the "User-Agent" parameter within the headers of the API request.
Set "User-Agent" to the value "EDX-USER".
import os
import requests

# Path to your directory.
path = '/home/user/Documents/project_001'

headers = {
    "EDX-API-Key": 'YOUR-API-KEY-HERE',
    "User-Agent": "EDX-USER"
}

# Set chunk size, in bytes.
chunk_size = 1024 * 1024 * 10  # 10 MB chunks, value must be a multiple of 1024

data = {
    # package_id is used for uploading to a submission (private or draft;
    # your account must own the draft or be an editor of it)
    "package_id": 'test-submission',
    "chunk_size": chunk_size
}

# Loop over each file in the given directory.
for filename in os.listdir(path):
    filepath = os.path.join(path, filename)
    if os.path.isdir(filepath):  # Ignore subdirectories; only upload files
        continue
    data['resource_name'] = filename
    # When chunking, set the original size for EACH file during its iteration.
    # Server logic uses this to know when the file is complete.
    data['original_size'] = os.path.getsize(filepath)
    if os.path.exists(filepath):
        with open(filepath, "rb") as f:
            url = 'https://edx.netl.doe.gov/api/3/action/resource_create_api'
            while True:
                chunked_stream = f.read(chunk_size)
                if not chunked_stream:
                    break  # Stop uploading when there are no more bytes to stream
                r = requests.post(
                    url,                             # URL to API endpoint
                    headers=headers,                 # Headers dictionary
                    files={'file': chunked_stream},  # Byte stream of chunked data
                    data=data,                       # Dictionary of data params
                )
                r_json = r.json()
                # IMPORTANT: when chunking, add the key parameter from the
                # response object to the request data so the server knows
                # which file to append additional data to.
                if r_json['success']:
                    # The key lets an upload larger than the chunk size resume.
                    data['key'] = r_json['result']['key']
                    print(r_json['result']['progress'])  # Current upload progress.
                    # Once the file is complete, reset the key for the next upload.
                    if r_json['result']['progress'] == "100%":
                        data.pop('key', None)  # Remove previous resource's upload key
                print(r.json())
# An example of return data from the resource_create_api endpoint:
# {
# u'help': u'https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api',
# u'success': True,
# u'result': {
# u'resource_name': u'sql_backup.zip',
# u'created': u'2020-10-27T14:19:43.670115',
# u'url': u'https://edx.netl.doe.gov/storage/f/edx/2020/10/2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
# u'last_modified': u'2020-10-27T14:19:43.670131',
# u'key': u'2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
# u'progress': u'100%',
# u'id': u'c9a8f82e-4bf0-4563-b931-d97b27a3f2a1'
# }
# }