Upload Resource Files

API Endpoint: resource_create_api

There are a variety of ways to upload files to EDX using the API. First, we cover the available options, recommendations, and parameter definitions needed to use this feature. Each bullet point is then clarified, and use-case examples are provided using Python 3.8.

Currently Supported Uploading Scenarios

  1. Upload file to EDX Drive within a private workspace (root level)

  2. Upload file to EDX Drive within a private workspace (specified folder location)

  3. Upload file to existing private submission within private workspace

  4. Upload file to existing draft submission

Upload Size Recommendations & Clarifications

It is strongly recommended to use chunking when uploading to EDX via the API. Splitting large files into small parts allows you to track upload progress as a percentage. When using the chunk parameter, you must also use the original_size parameter to specify the total size of the file in bytes. It is strongly encouraged to determine the file size programmatically in your script rather than entering a value by hand.
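For instance, both values can be derived programmatically like this (the file path below is hypothetical and created only for illustration):

```python
import os

# Hypothetical file created for illustration; any local file works.
path = '/tmp/example.bin'
with open(path, 'wb') as f:
    f.write(b'\0' * (25 * 1024 * 1024))  # 25 MB of zeros

chunk_size = 1024 * 1024 * 10  # 10 MB chunks; value must be a multiple of 1024
original_size = os.path.getsize(path)  # pass this as the original_size parameter
num_chunks = -(-original_size // chunk_size)  # ceiling division

print(original_size)  # 26214400
print(num_chunks)     # 3
```

Using os.path.getsize avoids mismatches between the declared original_size and the bytes actually streamed, which the server relies on to know when a file is complete.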

EDX Account API Key MUST Be Included in Header:

Important

The API key can be added to your header under any of the following names: “X-CKAN-API-Key”, “EDX-API-Key”, or “Authorization”.
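For example, any of these header dictionaries would carry the key (the key value is a placeholder; substitute your own):

```python
api_key = 'YOUR-API-KEY-HERE'  # placeholder; substitute your real key

# All three header names are accepted for the API key.
headers_ckan = {"X-CKAN-API-Key": api_key, "User-Agent": "EDX-USER"}
headers_edx = {"EDX-API-Key": api_key, "User-Agent": "EDX-USER"}
headers_auth = {"Authorization": api_key, "User-Agent": "EDX-USER"}
```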

Warning

  • The user must define either the workspace ID or the package ID

    • If neither the workspace ID nor the package ID is defined, an ERROR will occur

    • If both the workspace ID and the package ID are defined, an ERROR will occur

To obtain your EDX account’s API key, log into EDX and navigate to your user profile page. For a detailed description of how to obtain your EDX API key, review the Obtaining EDX API Key documentation page.

Your EDX API key is used to check your access permissions when you attempt to upload files to locations within EDX. You can only upload files to the EDX Drive of a workspace in which you are an editor or admin, and only to submissions for which you have editing rights (private workspace [editor or admin] or specific drafts).

Parameter Definitions

workspace_id
   The name or ID of the workspace that you wish to upload to.
   Situationally required.

package_id
   The name or ID of the submission that you wish to upload to.
   Situationally required.

resource_name
   The name to give your uploaded resource. Be sure to include the file extension.
   Required.

folder_id
   The ID of the folder on EDX Drive in which to place the uploaded resource.
   Optional.

chunk
   Boolean (True, False) to send the file to the server in chunks.
   Required.

original_size
   Total size in bytes of the file being transferred via the API.
   Required if using chunk.

Note

Situationally required parameters are only required in specific use cases.

Important

When defining workspace ID, ONLY select the end path of the URL

[Image: workspace URL, highlighting the trailing path segment used as the workspace ID]

Situational Use Case Specifics:

  • workspace_id is required when you are uploading to an EDX Drive.

  • package_id is required when you are uploading to a private or draft submission.

Note

You do not have to provide both a workspace_id AND a package_id when adding a resource to a private submission; the package_id alone is sufficient, though there is no harm in including both (as seen in Example 1 below).

Tip

A package’s name is the text that appears in the address bar; it is NOT the plain-text title found in the body of the website. This also applies to workspace names.

For Example (submission [aka package] names),

Use global-oil-gas-features-database | Do NOT use Global Oil & Gas Features Database
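As a sketch, the name can be extracted from a page URL with the standard library (the URL path shown is an assumption about how submission pages are laid out):

```python
from urllib.parse import urlparse

def name_from_url(url):
    """Return the trailing path segment of an EDX page URL."""
    return urlparse(url).path.rstrip('/').split('/')[-1]

# Hypothetical submission URL used for illustration.
print(name_from_url('https://edx.netl.doe.gov/dataset/global-oil-gas-features-database'))
# global-oil-gas-features-database
```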

To obtain a folder ID, review the beginning of the Obtain Resource Metadata from Folder ID documentation.

Example 1: Chunked File Upload

In this example, we upload a large file to an existing workspace’s private submission using Python 3.8.

Note

The response object will return a parameter called “key”. This parameter is generated by server logic and must be sent with every request after the first; the server requires the key to determine which file the additional chunk data should be appended to.

Note

If your chunk size is larger than your file size, the whole file will be uploaded in one request. It is highly recommended to upload large files in smaller chunks (e.g. chunk a 5 GB file with chunk_size = 1024 * 1024 * 10, i.e. 10 MB chunks; the value must be a multiple of 1024). The returned output will provide the status of the uploading file.

Important

When defining workspace ID, ONLY select the end path of the URL

[Image: workspace URL, highlighting the trailing path segment used as the workspace ID]

Attention

  • Add a "User-Agent" parameter to the headers of the API request

  • Set "User-Agent" to the value "EDX-USER"

import os
import requests

# Path to your file
path = '/home/user/database-backup.zip'

headers = {
   "EDX-API-Key": 'YOUR-API-KEY-HERE',
   "User-Agent": "EDX-USER"
}

# Set chunk size. In bytes.
chunk_size = 1024 * 1024 * 10 # 10 MB chunks, value must be multiple of 1024

data = {
   # workspace_id is used for uploading to an EDX Drive
   "workspace_id": 'test-workspace',
   # package_id is used for uploading to a submission (private or draft);
   # per the Note above, including both IDs is fine for a private submission
   "package_id": 'test-submission',
   "resource_name": 'sql_backup.zip',
   "original_size": os.path.getsize(path), # File size in bytes; the server uses this to determine when the upload is complete.
   "chunk_size": chunk_size
}

if os.path.exists(path):
   with open(path, "rb") as f:
      url = 'https://edx.netl.doe.gov/api/3/action/resource_create_api'
      while True:
            chunked_stream = f.read(chunk_size)
            if not chunked_stream:
               break # Break out of loop and stop uploading when no more bytes to stream
            r = requests.post(
               url, # URL to API endpoint
               headers=headers, # Headers dictionary
               files={'file': chunked_stream}, # byte stream of chunked data
               data=data, # Dictionary of data params
            )
            r_json = r.json()
            # IMPORTANT: when chunking, send the "key" from the response back with
            # every subsequent request so the server knows which file to append
            # the next chunk to. The key also allows resuming uploads larger than
            # the chunk size.
            if r_json['success']:
               data['key'] = r_json['result']['key']
            print(r_json)
# An example of return data from the resource_create_api endpoint:
# {
#     u'help': u'https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api',
#     u'success': True,
#     u'result': {
#         u'resource_name': u'sql_backup.zip',
#         u'created': u'2020-10-27T14:19:43.670115',
#         u'url': u'https://edx.netl.doe.gov/storage/f/edx/2020/10/2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
#         u'last_modified': u'2020-10-27T14:19:43.670131',
#         u'key': u'2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
#         u'progress': u'100%',
#         u'id': u'c9a8f82e-4bf0-4563-b931-d97b27a3f2a1'
#     }
# }

Example 2: Uploading Multiple Files From a Local Directory

In this example, we upload multiple files from a computer’s file system to an existing workspace’s EDX Drive using Python 3.8.

Note

The response object will return a parameter called “key”. This parameter is generated by server logic and must be sent with every request after the first; the server requires the key to determine which file the additional chunk data should be appended to.

Important

When defining workspace ID, ONLY select the end path of the URL

[Image: workspace URL, highlighting the trailing path segment used as the workspace ID]

Attention

  • Add a "User-Agent" parameter to the headers of the API request

  • Set "User-Agent" to the value "EDX-USER"

import os
import requests

# Path to your directory.
path = '/home/user/Documents/project_001'

headers = {
   "EDX-API-Key": 'YOUR-API-KEY-HERE',
   "User-Agent": "EDX-USER"
}

# Set chunk size. In bytes.
chunk_size = 1024 * 1024 * 10 # 10 MB chunks, value must be multiple of 1024

data = {
   # workspace id is used for uploading to edx drive
   "workspace_id": 'test-workspace',
   "chunk_size": chunk_size
}

# Set up a loop for each file in a given directory.
for filename in os.listdir(path):
   filepath = os.path.join(path, filename)
   if os.path.isdir(filepath): # Skip subdirectories of the given path; only files are uploaded
      continue

   data['resource_name'] = filename

   # WHEN CHUNKING SET THE ORIGINAL SIZE FOR EACH FILE IN A DIRECTORY DURING ITS ITERATION.
   data['original_size'] = os.path.getsize(filepath) # Determine your file's size. Server logic uses this to determine when a file is complete.

   if os.path.exists(filepath):
      with open(filepath, "rb") as f:
            url = 'https://edx.netl.doe.gov/api/3/action/resource_create_api'
            while True:
               chunked_stream = f.read(chunk_size)
               if not chunked_stream:
                  break # Break out of loop and stop uploading when no more bytes to stream
               r = requests.post(
                  url, # URL to API endpoint
                  headers=headers, # Headers dictionary
                  files={'file': chunked_stream}, # byte stream of chunked data
                  data=data, # Dictionary of data params
               )
               r_json = r.json()

               # IMPORTANT!!! Add key parameter from response object to request data when chunking.
               # Server logic will require key data in order to determine which file to append additional file data.
               if(r_json['success']):
                  # The key is used for chunked uploads. This key is needed to resume uploads larger than set chunk size.
                  data['key'] = r_json['result']['key']
                  print (r_json['result']['progress']) # Print out current progress of upload.

                  # Once the entire file is completed, the key needs to be reset for the next upload
                  if(r_json['result']['progress'] == "100%"):
                        data.pop('key', None) #remove previous resource's upload key
            print (r.json())
# An example of return data from the resource_create_api endpoint:
# {
#     u'help': u'https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api',
#     u'success': True,
#     u'result': {
#         u'resource_name': u'sql_backup.zip',
#         u'created': u'2020-10-27T14:19:43.670115',
#         u'url': u'https://edx.netl.doe.gov/storage/f/edx/2020/10/2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
#         u'last_modified': u'2020-10-27T14:19:43.670131',
#         u'key': u'2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
#         u'progress': u'100%',
#         u'id': u'c9a8f82e-4bf0-4563-b931-d97b27a3f2a1'
#     }
# }

Example 3: Uploading Multiple Files From a Local Directory for Draft Submissions

Note

The response object will return a parameter called “key”. This parameter is generated by server logic and must be sent with every request after the first; the server requires the key to determine which file the additional chunk data should be appended to.

Important

When defining workspace ID, ONLY select the end path of the URL

[Image: workspace URL, highlighting the trailing path segment used as the workspace ID]

Attention

  • Add a "User-Agent" parameter to the headers of the API request

  • Set "User-Agent" to the value "EDX-USER"

import os
import requests

# Path to your directory.
path = '/home/user/Documents/project_001'

headers = {
   "EDX-API-Key": 'YOUR-API-KEY-HERE',
   "User-Agent": "EDX-USER"
}

# Set chunk size. In bytes.
chunk_size = 1024 * 1024 * 10 # 10 MB chunks, value must be multiple of 1024

data = {
   # package_id is used for uploading to a submission (private or draft; your account must own the draft or be a public editor of it)
   "package_id": 'test-submission',
   "chunk_size": chunk_size
}

# Set up a loop for each file in a given directory.
for filename in os.listdir(path):
   filepath = os.path.join(path, filename)
   if os.path.isdir(filepath): # Skip subdirectories of the given path; only files are uploaded
      continue

   data['resource_name'] = filename

   # WHEN CHUNKING SET THE ORIGINAL SIZE FOR EACH FILE IN A DIRECTORY DURING ITS ITERATION.
   data['original_size'] = os.path.getsize(filepath) # Determine your file's size. Server logic uses this to determine when a file is complete.

   if os.path.exists(filepath):
      with open(filepath, "rb") as f:
            url = 'https://edx.netl.doe.gov/api/3/action/resource_create_api'
            while True:
               chunked_stream = f.read(chunk_size)
               if not chunked_stream:
                  break # Break out of loop and stop uploading when no more bytes to stream
               r = requests.post(
                  url, # URL to API endpoint
                  headers=headers, # Headers dictionary
                  files={'file': chunked_stream}, # byte stream of chunked data
                  data=data, # Dictionary of data params
               )
               r_json = r.json()

               # IMPORTANT!!! Add key parameter from response object to request data when chunking.
               # Server logic will require key data in order to determine which file to append additional file data.
               if(r_json['success']):
                  # The key is used for chunked uploads. This key is needed to resume uploads larger than set chunk size.
                  data['key'] = r_json['result']['key']
                  print (r_json['result']['progress']) # Print out current progress of upload.

                  # Once the entire file is completed, the key needs to be reset for the next upload
                  if(r_json['result']['progress'] == "100%"):
                        data.pop('key', None) #remove previous resource's upload key
            print (r.json())
# An example of return data from the resource_create_api endpoint:
# {
#     u'help': u'https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api',
#     u'success': True,
#     u'result': {
#         u'resource_name': u'sql_backup.zip',
#         u'created': u'2020-10-27T14:19:43.670115',
#         u'url': u'https://edx.netl.doe.gov/storage/f/edx/2020/10/2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
#         u'last_modified': u'2020-10-27T14:19:43.670131',
#         u'key': u'2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
#         u'progress': u'100%',
#         u'id': u'c9a8f82e-4bf0-4563-b931-d97b27a3f2a1'
#     }
# }