Upload Resource Files
=====================

.. admonition:: API Endpoint: resource_create_api

There are a variety of ways to upload files to EDX using the API. First, we will cover the initial options, recommendations, and definitions needed to understand how to use this feature. Further clarification is provided for each bullet point, and use case examples are given using *Python 3.8*.

Currently Supported Uploading Scenarios
----------------------------------------

1. Upload a file to EDX Drive within a private workspace (root level)
2. Upload a file to EDX Drive within a private workspace (specified folder location)
3. Upload a file to an existing private submission within a private workspace
4. Upload a file to an existing draft submission

Upload Size Recommendations & Clarifications
---------------------------------------------

It is strongly recommended to use chunking when uploading to EDX via the API. Chunking large files into small parts allows you to track the percentage progress of your upload. When using the ``chunk`` parameter, you must also use the ``original_size`` parameter to specify the total size of the file in bytes. *It is strongly encouraged to determine the file size programmatically in your script rather than entering a value manually.*

**EDX Account API Key MUST Be Included in Header:**

.. important::
   *The API key name component can be added to your header using any of the following: "X-CKAN-API-Key", "EDX-API-Key", or "Authorization".*

.. warning::
   * The user must define either the workspace ID or the package ID
   * If neither the workspace ID nor the package ID is defined, an **ERROR** will occur
   * If both the workspace ID and the package ID are defined, an **ERROR** will occur

To obtain your EDX account's API key, log into EDX and navigate to your user profile page. For a detailed description of how to obtain your EDX API key, review the :ref:`Obtaining EDX API Key ` documentation page.
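As a minimal sketch of the two recommendations above, the snippet below builds a header using one of the accepted API key names (the key value shown is a placeholder) and determines ``original_size`` programmatically with ``os.path.getsize``; a temporary file stands in for your real upload here.

```python
import os
import tempfile

# Any ONE of these header names is accepted for the API key:
# "X-CKAN-API-Key", "EDX-API-Key", or "Authorization".
headers = {
    "EDX-API-Key": "YOUR-API-KEY-HERE",  # placeholder; use your real key
    "User-Agent": "EDX-USER",
}

# Determine the file size programmatically rather than hard-coding it.
# (A temporary file stands in for your real upload in this sketch.)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2048)
    path = tmp.name

original_size = os.path.getsize(path)  # value to send as the original_size parameter
print(original_size)  # -> 2048

os.remove(path)  # clean up the stand-in file
```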
Your EDX API key is used to check your access permissions as you attempt to upload files to various locations within EDX. You can only upload files to the EDX Drives of workspaces for which you are an editor or admin, and you can only upload files to submissions that you have editing rights to (private workspace [editor or admin] or specific drafts).

Parameter Definitions
---------------------

.. list-table::
   :header-rows: 1

   * - Parameter name
     - Description
     - Required Field
   * - ``workspace_id``
     - The name or ID of the workspace that you wish to upload to
     - *Situational* **Required**
   * - ``package_id``
     - The name or ID of the submission that you wish to upload to
     - *Situational* **Required**
   * - ``resource_name``
     - The name that you want to give to your uploaded resource. *Be sure to include the file extension.*
     - **Required**
   * - ``folder_id``
     - The ID of the folder on EDX Drive to place the uploaded resource in
     - Optional
   * - ``chunk``
     - Boolean (True, False) to send the file to the server in chunks
     - **Required**
   * - ``original_size``
     - Total size in bytes of the file being transferred via the API
     - **Required** if using ``chunk``

.. note::
   "SITUATIONAL" REQUIRED PARAMETERS ARE ONLY REQUIRED IN SPECIFIC USE CASES.

.. important::
   When defining the workspace ID, **ONLY** use the end path of the URL

.. figure:: /images/url-path.png
   :align: center
   :alt: url path image
   :scale: 100 %

Situational Use Case Specifics:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``workspace_id`` is required when you are uploading to an EDX Drive.
* ``package_id`` is required when you are uploading to a private or draft submission.

.. note::
   You do not have to provide both a ``workspace_id`` AND a ``package_id`` when adding a resource to a private submission; you only need the ``package_id``, but there is no harm in including both (as seen in example #1 below).

.. tip::
   A package's name is the text that appears in the address bar; it is NOT the plain text title found in the body of the website. This also applies to workspace names.
For example (submission [aka package] names):

| **Use** ``global-oil-gas-features-database``
| Do **NOT** use ``Global Oil & Gas Features Database``

To obtain a folder ID, review the beginning of the :ref:`Obtain Resource Metadata from Folder ID ` documentation.

Example 1: Chunked File Upload
-------------------------------

In this example, we are uploading a large file to an existing workspace's private submission using Python 3.8.

.. note::
   The response object will return a parameter called "key"; this parameter is generated by server logic and must be sent with every request after the first. Server logic requires the key in order to determine which file the additional data should be appended to.

.. note::
   If your chunk size is larger than your file size, the whole file will be uploaded in one request. It is highly recommended to upload large files in smaller chunks (e.g., for a 5 GB file, chunk at ``chunk_size = 1024 * 1024 * 10`` **# 10 MB chunks; the value must be a multiple of 1024**). The returned output will report the status of the uploading file.

.. important::
   When defining the workspace ID, **ONLY** use the end path of the URL

.. figure:: /images/url-path.png
   :align: center
   :alt: url path image
   :scale: 100 %

.. attention::
   * Add the ``"User-Agent"`` parameter within the ``headers`` of the API request
   * Set ``"User-Agent"`` to the value ``"EDX-USER"``

.. tabs::

   .. code-tab:: python

      import os
      import requests

      # Path to your file
      path = '/home/user/database-backup.zip'

      headers = {
          "EDX-API-Key": 'YOUR-API-KEY-HERE',
          "User-Agent": "EDX-USER"
      }

      # Set chunk size, in bytes.
      chunk_size = 1024 * 1024 * 10  # 10 MB chunks; value must be a multiple of 1024

      data = {
          # workspace_id is used for uploading to EDX Drive.
          # As noted in the warning admonition above, define only one of
          # workspace_id or package_id.
          "workspace_id": 'test-workspace',
          # package_id is used for uploading to a submission (private or draft).
          "package_id": 'test-submission',
          "resource_name": 'sql_backup.zip',
          # Determine your file size programmatically. Server logic uses this
          # to determine when a file is complete.
          "original_size": os.path.getsize(path),
          "chunk_size": chunk_size
      }

      if os.path.exists(path):
          with open(path, "rb") as f:
              url = 'https://edx.netl.doe.gov/api/3/action/resource_create_api'
              while True:
                  chunked_stream = f.read(chunk_size)
                  if not chunked_stream:
                      break  # Stop uploading when there are no more bytes to stream

                  r = requests.post(
                      url,                             # URL to API endpoint
                      headers=headers,                 # Headers dictionary
                      files={'file': chunked_stream},  # Byte stream of chunked data
                      data=data,                       # Dictionary of data params
                  )
                  r_json = r.json()

                  # IMPORTANT!!! Add the key parameter from the response object to
                  # the request data when chunking. Server logic requires the key to
                  # determine which file the additional data should be appended to.
                  # The key is needed to resume uploads larger than the set chunk size.
                  data['key'] = r_json['result']['key']

                  print(r.json())

   .. code-tab:: output

      # An example of return data from the resource_create_api endpoint:
      # {
      #     u'help': u'https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api',
      #     u'success': True,
      #     u'result': {
      #         u'resource_name': u'sql_backup.zip',
      #         u'created': u'2020-10-27T14:19:43.670115',
      #         u'url': u'https://edx.netl.doe.gov/storage/f/edx/2020/10/2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
      #         u'last_modified': u'2020-10-27T14:19:43.670131',
      #         u'key': u'2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
      #         u'progress': u'100%',
      #         u'id': u'c9a8f82e-4bf0-4563-b931-d97b27a3f2a1'
      #     }
      # }

Example 2: Uploading Multiple Files From a Local Directory
-----------------------------------------------------------

In this example, we are uploading multiple files from a computer's file system to an existing workspace using Python 3.8.

.. note::
   The response object will return a parameter called "key"; this parameter is generated by server logic and must be sent with every request after the first. Server logic requires the key in order to determine which file the additional data should be appended to.

.. important::
   When defining the workspace ID, **ONLY** use the end path of the URL

.. figure:: /images/url-path.png
   :align: center
   :alt: url path image
   :scale: 100 %

.. attention::
   * Add the ``"User-Agent"`` parameter within the ``headers`` of the API request
   * Set ``"User-Agent"`` to the value ``"EDX-USER"``

.. tabs::

   .. code-tab:: python

      import os
      import requests

      # Path to your directory.
      path = '/home/user/Documents/project_001'

      headers = {
          "EDX-API-Key": 'YOUR-API-KEY-HERE',
          "User-Agent": "EDX-USER"
      }

      # Set chunk size, in bytes.
      chunk_size = 1024 * 1024 * 10  # 10 MB chunks; value must be a multiple of 1024

      data = {
          # workspace_id is used for uploading to EDX Drive.
          "workspace_id": 'test-workspace',
          "chunk_size": chunk_size
      }

      # Loop over each file in the given directory.
      for filename in os.listdir(path):
          filepath = os.path.join(path, filename)
          if os.path.isdir(filepath):
              # Ignore subdirectories of the given path; only files are uploaded.
              continue

          data['resource_name'] = filename
          # WHEN CHUNKING, SET THE ORIGINAL SIZE FOR EACH FILE IN THE DIRECTORY
          # DURING ITS ITERATION. Server logic uses this to determine when a
          # file is complete.
          data['original_size'] = os.path.getsize(filepath)

          if os.path.exists(filepath):
              with open(filepath, "rb") as f:
                  url = 'https://edx.netl.doe.gov/api/3/action/resource_create_api'
                  while True:
                      chunked_stream = f.read(chunk_size)
                      if not chunked_stream:
                          break  # Stop uploading when there are no more bytes to stream

                      r = requests.post(
                          url,                             # URL to API endpoint
                          headers=headers,                 # Headers dictionary
                          files={'file': chunked_stream},  # Byte stream of chunked data
                          data=data,                       # Dictionary of data params
                      )
                      r_json = r.json()

                      # IMPORTANT!!! Add the key parameter from the response object to
                      # the request data when chunking. Server logic requires the key to
                      # determine which file the additional data should be appended to.
                      if r_json['success']:
                          # The key is needed to resume uploads larger than the set chunk size.
                          data['key'] = r_json['result']['key']
                          print(r_json['result']['progress'])  # Current upload progress.

                          # Once the entire file is complete, reset the key for the next upload.
                          if r_json['result']['progress'] == "100%":
                              data.pop('key', None)  # Remove the previous resource's upload key.

                      print(r.json())

   .. code-tab:: output

      # An example of return data from the resource_create_api endpoint:
      # {
      #     u'help': u'https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api',
      #     u'success': True,
      #     u'result': {
      #         u'resource_name': u'sql_backup.zip',
      #         u'created': u'2020-10-27T14:19:43.670115',
      #         u'url': u'https://edx.netl.doe.gov/storage/f/edx/2020/10/2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
      #         u'last_modified': u'2020-10-27T14:19:43.670131',
      #         u'key': u'2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
      #         u'progress': u'100%',
      #         u'id': u'c9a8f82e-4bf0-4563-b931-d97b27a3f2a1'
      #     }
      # }

Example 3: Uploading Multiple Files From a Local Directory for Draft Submissions
------------------------------------------------------------------------------------

In this example, we are uploading multiple files from a computer's file system to an existing draft submission using Python 3.8.

.. note::
   The response object will return a parameter called "key"; this parameter is generated by server logic and must be sent with every request after the first. Server logic requires the key in order to determine which file the additional data should be appended to.

.. important::
   When defining the workspace ID, **ONLY** use the end path of the URL

.. figure:: /images/url-path.png
   :align: center
   :alt: url path image
   :scale: 100 %

.. attention::
   * Add the ``"User-Agent"`` parameter within the ``headers`` of the API request
   * Set ``"User-Agent"`` to the value ``"EDX-USER"``

.. tabs::

   .. code-tab:: python

      import os
      import requests

      # Path to your directory.
      path = '/home/user/Documents/project_001'

      headers = {
          "EDX-API-Key": 'YOUR-API-KEY-HERE',
          "User-Agent": "EDX-USER"
      }

      # Set chunk size, in bytes.
      chunk_size = 1024 * 1024 * 10  # 10 MB chunks; value must be a multiple of 1024

      data = {
          # package_id is used for uploading to a submission (private or draft
          # [your account must own the draft or be a public editor of it]).
          "package_id": 'test-submission',
          "chunk_size": chunk_size
      }

      # Loop over each file in the given directory.
      for filename in os.listdir(path):
          filepath = os.path.join(path, filename)
          if os.path.isdir(filepath):
              # Ignore subdirectories of the given path; only files are uploaded.
              continue

          data['resource_name'] = filename
          # WHEN CHUNKING, SET THE ORIGINAL SIZE FOR EACH FILE IN THE DIRECTORY
          # DURING ITS ITERATION. Server logic uses this to determine when a
          # file is complete.
          data['original_size'] = os.path.getsize(filepath)

          if os.path.exists(filepath):
              with open(filepath, "rb") as f:
                  url = 'https://edx.netl.doe.gov/api/3/action/resource_create_api'
                  while True:
                      chunked_stream = f.read(chunk_size)
                      if not chunked_stream:
                          break  # Stop uploading when there are no more bytes to stream

                      r = requests.post(
                          url,                             # URL to API endpoint
                          headers=headers,                 # Headers dictionary
                          files={'file': chunked_stream},  # Byte stream of chunked data
                          data=data,                       # Dictionary of data params
                      )
                      r_json = r.json()

                      # IMPORTANT!!! Add the key parameter from the response object to
                      # the request data when chunking. Server logic requires the key to
                      # determine which file the additional data should be appended to.
                      if r_json['success']:
                          # The key is needed to resume uploads larger than the set chunk size.
                          data['key'] = r_json['result']['key']
                          print(r_json['result']['progress'])  # Current upload progress.

                          # Once the entire file is complete, reset the key for the next upload.
                          if r_json['result']['progress'] == "100%":
                              data.pop('key', None)  # Remove the previous resource's upload key.

                      print(r.json())

   .. code-tab:: output

      # An example of return data from the resource_create_api endpoint:
      # {
      #     u'help': u'https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api',
      #     u'success': True,
      #     u'result': {
      #         u'resource_name': u'sql_backup.zip',
      #         u'created': u'2020-10-27T14:19:43.670115',
      #         u'url': u'https://edx.netl.doe.gov/storage/f/edx/2020/10/2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
      #         u'last_modified': u'2020-10-27T14:19:43.670131',
      #         u'key': u'2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip',
      #         u'progress': u'100%',
      #         u'id': u'c9a8f82e-4bf0-4563-b931-d97b27a3f2a1'
      #     }
      # }
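The examples above print the raw response for each chunk. As a minimal sketch of how a script might decide when a resource has finished uploading, the helper below (``upload_finished`` is a hypothetical name, not part of the EDX API) inspects the ``success``, ``progress``, and ``id`` fields shown in the sample output:

```python
# Sample response, matching the example output shown in this document.
response = {
    "help": "https://edx.netl.doe.gov/api/3/action/help_show?name=resource_create_api",
    "success": True,
    "result": {
        "resource_name": "sql_backup.zip",
        "progress": "100%",
        "key": "2020-10-27T14:18:29.946Z/c9a8f82e-4bf0-4563-b931-d97b27a3f2a1/sql_backup.zip",
        "id": "c9a8f82e-4bf0-4563-b931-d97b27a3f2a1",
    },
}

def upload_finished(r_json):
    """Return the uploaded resource's id when the server reports 100% progress,
    or None while the upload is still in progress (hypothetical helper)."""
    if r_json.get("success") and r_json["result"].get("progress") == "100%":
        return r_json["result"]["id"]
    # Still in progress: send the returned 'key' with the next chunk.
    return None

print(upload_finished(response))  # -> c9a8f82e-4bf0-4563-b931-d97b27a3f2a1
```

A loop like the ones above could call such a helper after each ``requests.post`` to decide whether to keep streaming chunks or record the finished resource's ID.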