Download Resource File(s)
=========================

.. important:: 
    An API Key is required for ALL downloads as of EDX 3.07 (09/03/2020).

    The API Key can be sent in the request header under any of the following names: ``X-CKAN-API-Key``, ``EDX-API-Key``, or ``Authorization``.

.. attention:: 
   * Add a ``"User-Agent"`` entry to the ``headers`` of the API request
   * Set ``"User-Agent"`` to the value ``"EDX-USER"``
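
As a minimal sketch of the two notes above, the request headers could be assembled in Python as shown below. The helper name ``build_headers`` is hypothetical and exists only for illustration; the accepted header names and the ``EDX-USER`` value are taken from the notes above.

.. code-block:: python

    # Header names that the note above lists as accepted for the API key.
    ALLOWED_KEY_HEADERS = ("X-CKAN-API-Key", "EDX-API-Key", "Authorization")


    def build_headers(api_key, key_header="EDX-API-Key"):
        """Return a headers dict with the API key and the required User-Agent.

        ``build_headers`` is a hypothetical helper, not part of the EDX API.
        """
        if key_header not in ALLOWED_KEY_HEADERS:
            raise ValueError(f"Unsupported API key header: {key_header}")
        return {
            key_header: api_key,
            "User-Agent": "EDX-USER",
        }


    headers = build_headers("YOUR-API-KEY-HERE")

The resulting dictionary can be passed as the ``headers`` argument of an HTTP client such as ``requests``.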




Example 1: Download using ``wget``
-----------------------------------

**Public Resources:**

To obtain the URL needed to download a public resource, navigate to the public submission (e.g. https://edx.netl.doe.gov/dataset/global-oil-gas-features-database), right-click the blue “Download” button, and select “Copy Link Address” (wording from Chrome). As of 08/03/2022, the copied URL for this example looks as follows: https://edx.netl.doe.gov/dataset/g27625d9b-4a28-4bdf-bc5c-09834f7a9dfb/resource/34280f73-526f-497b-a672-9f37313acede/download

.. code-block:: bash

    wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" https://edx.netl.doe.gov/dataset/<submission name or id>/resource/<resource id>/download

If you know the resource ID, you can download the resource without the submission name or ID.

.. code-block:: bash
    
    wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" https://edx.netl.doe.gov/resource/<resource id>/download
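
The two URL patterns above can also be built programmatically. The following is a minimal sketch; ``build_download_url`` and ``BASE_URL`` are hypothetical names introduced for illustration, and the path layout is copied from the ``wget`` commands above.

.. code-block:: python

    from urllib.parse import quote

    BASE_URL = "https://edx.netl.doe.gov"


    def build_download_url(resource_id, submission=None):
        """Build a download URL in one of the two forms shown above.

        With ``submission`` (a submission name or ID) the long form is used;
        without it, the short resource-only form is used.
        """
        resource_part = quote(resource_id, safe="")
        if submission is None:
            return f"{BASE_URL}/resource/{resource_part}/download"
        submission_part = quote(submission, safe="")
        return f"{BASE_URL}/dataset/{submission_part}/resource/{resource_part}/download"


    short_url = build_download_url("34280f73-526f-497b-a672-9f37313acede")
    long_url = build_download_url(
        "34280f73-526f-497b-a672-9f37313acede",
        submission="global-oil-gas-features-database",
    )

Either URL can then be passed to ``wget`` or to an HTTP client, together with the API key header.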

**Private Resources:**

To obtain the URL needed to download a private resource, follow whichever of these two scenarios applies:

* Private Submission:

    Navigate to the private submission (e.g. https://edx.netl.doe.gov/dataset/databook-node-modules-folder-content), right-click the blue “Download” button, and select “Copy Link Address” (wording from Chrome). As of 08/03/2022, the copied URL for this example looks as follows: https://edx.netl.doe.gov/dataset/e807c09e-1a63-4963-9002-1e0c3f0f025d/resource/279eec86-fac8-4504-a3cb-c791bfc98dde/download

.. code-block:: bash

    wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" https://edx.netl.doe.gov/dataset/<submission name or id>/resource/<resource id>/download

* EDX Drive Resource:

    Navigate to the EDX Drive of a workspace, right-click a resource, and select the “Copy Download Link” option (as of 08/03/2022). The copied link looks like this example: https://edx.netl.doe.gov/resource/55bf54eb-0e70-49e7-8f00-e7057f7dced8/download

.. code-block:: bash
    
    wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" https://edx.netl.doe.gov/resource/<resource id>/download
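
If only a copied download link is available, the resource ID can be recovered from it. The sketch below assumes the link ends in ``/resource/<resource id>/download`` as in the examples above; ``resource_id_from_link`` is a hypothetical helper name.

.. code-block:: python

    import re
    from urllib.parse import urlparse

    # Matches the trailing "/resource/<resource id>/download" part of the path.
    _RESOURCE_PATH = re.compile(r"/resource/([^/]+)/download/?$")


    def resource_id_from_link(link):
        """Extract the resource ID from a copied EDX download link."""
        path = urlparse(link.strip()).path
        match = _RESOURCE_PATH.search(path)
        if match is None:
            raise ValueError(f"Not a recognized download link: {link}")
        return match.group(1)


    drive_id = resource_id_from_link(
        "https://edx.netl.doe.gov/resource/55bf54eb-0e70-49e7-8f00-e7057f7dced8/download"
    )
    submission_id = resource_id_from_link(
        "https://edx.netl.doe.gov/dataset/e807c09e-1a63-4963-9002-1e0c3f0f025d"
        "/resource/279eec86-fac8-4504-a3cb-c791bfc98dde/download"
    )

The extracted ID is the value that the ``resource_id`` parameter expects.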


.. attention:: 
   * Add a ``"User-Agent"`` entry to the ``headers`` of the API request
   * Set ``"User-Agent"`` to the value ``"EDX-USER"``

.. list-table:: 
   :header-rows: 1

   * - Parameter Name
     - Description
     - Required?
   * - ``resource_id``
     - ID of the resource found in the metadata
     - **Required**


Example 2: Download using Python 3.8
-------------------------------------

.. code-block:: python

    import requests
    import os


    headers = {
        "EDX-API-Key": 'YOUR-API-KEY-HERE',
        "User-Agent": 'EDX-USER',
    }

    params = {
        'resource_id': 'RESOURCE-ID-HERE',
    }

    url = 'https://edx.netl.doe.gov/api/3/resource_download'


    # Get filename from headers
    print("Sending request to resource data...")
    response_head = requests.head(url, headers=headers, params=params)
    if response_head.status_code != 200:
        print(f"Failed to get resource data. Status code: {response_head.status_code}")
        raise SystemExit(1)

    content_disposition = response_head.headers.get('Content-Disposition')

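    # Set the filename from the Content-Disposition header if available;
    # otherwise fall back to the resource ID so the file operations below have a valid path
    filename = params['resource_id']
    if content_disposition and 'filename=' in content_disposition:
        filename = content_disposition.split('filename=')[-1].strip('"')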

    # Get the content length from headers and determine resource size.
    content_length = response_head.headers.get('Content-Length')
    resource_size = int(content_length) if content_length is not None else None

    print("Resource Name:", filename)
    print(f"Resource Size: {resource_size} bytes")

    # Determine if partial file exists
    existing_size = 0
    if os.path.exists(filename):
        existing_size = os.path.getsize(filename)
        print(f"File already exists. The current file size is: {existing_size} bytes.")

        # Nothing to do if the local file is already at least as large as the resource
        if resource_size is not None and existing_size >= resource_size:
            print("File already fully downloaded in current directory.")
            raise SystemExit(0)

        headers['Range'] = f'bytes={existing_size}-'
        print(f"Resuming download from byte: {existing_size}")
    else:
        print(f"Starting download for: {filename}")

    # Begin download stream (the headers are not printed because they contain the API key)
    print(f"Requesting: {url}")
    response = requests.get(url, headers=headers, params=params, stream=True)

    print(f"Download response status code: {response.status_code}")
    if response.status_code in (200, 206):
        # If the server returns a 206 (for partial content), use 'ab' mode to append
        mode = 'ab' if response.status_code == 206 else 'wb'
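        # A 200 response restarts the file from byte 0, so only count existing bytes on a 206
        total_bytes = existing_size if response.status_code == 206 else 0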
        print(f"Saving to: {os.path.abspath(filename)}")
        with open(filename, mode) as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
                    total_bytes += len(chunk)
                    if resource_size:
                        percent = (total_bytes / resource_size) * 100
                        print(f"\rDownloaded: {total_bytes} bytes ({percent:.2f}%)", end='', flush=True)
                    else:
                        # If resource size is unknown, just show bytes downloaded
                        print(f"\rDownloaded: {total_bytes} bytes", end='', flush=True)

        print("\nDownload complete.")
        print(f"Total bytes downloaded: {total_bytes}")
    else:
        print(f"Download Failed. Status code: {response.status_code}")
        try:
            print("Response:", response.json())
        except Exception:
            print("Non-JSON response:", response.text)
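
The ``split('filename=')`` step in the example above assumes that ``filename`` is the last parameter of the ``Content-Disposition`` header. As a hedged alternative, the sketch below parses the header with Python's standard ``email.message.Message`` class, which also handles quoted values and trailing parameters. The helper name ``filename_from_content_disposition`` is hypothetical, and the sample header values are illustrative rather than captured from EDX.

.. code-block:: python

    from email.message import Message


    def filename_from_content_disposition(value, default):
        """Return the filename parameter of a Content-Disposition value.

        Falls back to ``default`` when the header is missing or carries
        no filename parameter.
        """
        if not value:
            return default
        message = Message()
        message["Content-Disposition"] = value
        return message.get_filename() or default


    # Illustrative header values (assumptions, not real EDX responses).
    quoted = filename_from_content_disposition(
        'attachment; filename="data.zip"; size=10', "fallback.bin"
    )
    missing = filename_from_content_disposition(None, "fallback.bin")

In the example above, this helper could replace the manual string split, with the resource ID passed as ``default``.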