Download Resource File(s)

Important

An API key is required for ALL downloads as of EDX 3.07 (09/03/2020).

Send the API key in any one of the following request headers: “X-CKAN-API-Key”, “EDX-API-Key”, or “Authorization”.

Attention

  • Add a "User-Agent" header to every API request

  • Set the "User-Agent" header to the value "EDX-USER"
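A minimal Python sketch of the headers described above. The helper name `edx_headers` is hypothetical, not part of EDX. It sends the key under whichever of the three accepted header names you choose, always together with the required User-Agent. It assumes that every accepted header, including "Authorization", takes the bare key as its value (as in CKAN); verify this if you use that form.

```python
# Accepted header names for the EDX API key (from the documentation above)
API_KEY_HEADERS = ("X-CKAN-API-Key", "EDX-API-Key", "Authorization")


def edx_headers(api_key: str, key_header: str = "EDX-API-Key") -> dict:
    """Build request headers for an EDX download (hypothetical helper).

    Assumption: every accepted header name takes the bare API key as its value.
    """
    if key_header not in API_KEY_HEADERS:
        raise ValueError(f"key_header must be one of {API_KEY_HEADERS}")
    return {
        key_header: api_key,
        # Required on every EDX API request
        "User-Agent": "EDX-USER",
    }


print(edx_headers("YOUR-API-KEY-HERE"))
```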

Example 1: Download using wget

Public Resources:

To obtain the URL for a public resource, navigate to the public submission (e.g. https://edx.netl.doe.gov/dataset/global-oil-gas-features-database), right-click the blue “Download” button, and select “Copy Link Address” (Chrome's wording). As of 08/03/2022, the copied URL for this example looks like this: https://edx.netl.doe.gov/dataset/g27625d9b-4a28-4bdf-bc5c-09834f7a9dfb/resource/34280f73-526f-497b-a672-9f37313acede/download
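In a copied link, the resource ID is the path segment between "resource/" and "/download". The Python sketch below pulls it (and the submission part, if present) out of a copied link so it can be reused in the shorter URL form or as the `resource_id` API parameter. The function `parse_download_url` is hypothetical and only recognizes the two URL forms shown in this guide.

```python
from urllib.parse import urlparse


def parse_download_url(url: str) -> dict:
    """Split a copied EDX download link into submission and resource parts.

    Recognizes the two forms used in this guide:
      /dataset/<submission name or id>/resource/<resource id>/download
      /resource/<resource id>/download
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[-1] != "download" or parts[-3] != "resource":
        raise ValueError(f"not an EDX resource download URL: {url}")
    prefix = parts[:-3]
    if not prefix:
        submission = None  # short form: resource ID only
    elif len(prefix) == 2 and prefix[0] == "dataset":
        submission = prefix[1]
    else:
        raise ValueError(f"unexpected path layout: {url}")
    return {"submission": submission, "resource_id": parts[-2]}


info = parse_download_url(
    "https://edx.netl.doe.gov/dataset/g27625d9b-4a28-4bdf-bc5c-09834f7a9dfb"
    "/resource/34280f73-526f-497b-a672-9f37313acede/download"
)
print(info["resource_id"])  # 34280f73-526f-497b-a672-9f37313acede
```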

wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" --user-agent="EDX-USER" https://edx.netl.doe.gov/dataset/<submission name or id>/resource/<resource id>/download

If you know the resource ID, you can download the file without specifying the submission name or ID:

wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" --user-agent="EDX-USER" https://edx.netl.doe.gov/resource/<resource id>/download
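The two URL forms differ only in whether the submission name or ID is included. A small Python sketch that builds either form; `download_url` and `EDX_BASE` are hypothetical names, and percent-encoding the path segments is a precaution assumed here, not something EDX documents.

```python
from typing import Optional
from urllib.parse import quote

EDX_BASE = "https://edx.netl.doe.gov"


def download_url(resource_id: str, submission: Optional[str] = None) -> str:
    """Return an EDX download URL for a resource (hypothetical helper).

    With a submission name or ID this builds the long form;
    without one it builds the shorter resource-only form.
    """
    rid = quote(resource_id, safe="")
    if submission:
        sub = quote(submission, safe="")
        return f"{EDX_BASE}/dataset/{sub}/resource/{rid}/download"
    return f"{EDX_BASE}/resource/{rid}/download"


print(download_url("34280f73-526f-497b-a672-9f37313acede"))
# https://edx.netl.doe.gov/resource/34280f73-526f-497b-a672-9f37313acede/download
```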

Private Resources:

Private resources are downloaded with the same two URL forms, either with the submission name or ID plus the resource ID, or with the resource ID alone:

wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" --user-agent="EDX-USER" https://edx.netl.doe.gov/dataset/<submission name or id>/resource/<resource id>/download
wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" --user-agent="EDX-USER" https://edx.netl.doe.gov/resource/<resource id>/download
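If wget is not available, the same authenticated request can be made with Python's standard library. This is a minimal sketch assuming either URL form above; `make_request` and `download` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import shutil
import urllib.request


def make_request(url: str, api_key: str) -> urllib.request.Request:
    # Attach the API key and the required User-Agent, as the wget commands do
    return urllib.request.Request(
        url,
        headers={"EDX-API-Key": api_key, "User-Agent": "EDX-USER"},
    )


def download(url: str, api_key: str, dest: str) -> None:
    """Stream a private or public EDX resource to `dest` (hypothetical helper)."""
    req = make_request(url, api_key)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses (e.g. a bad key)
    with urllib.request.urlopen(req, timeout=60) as resp, open(dest, "wb") as out:
        # Copy in chunks instead of reading the whole file into memory
        shutil.copyfileobj(resp, out)


# Example call (replace the placeholders first):
# download("https://edx.netl.doe.gov/resource/<resource id>/download",
#          "YOUR-API-KEY-HERE", "resource.bin")
```

Unlike wget, this sketch writes to a file name you choose rather than to the last URL segment.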

Parameters for the resource_download API endpoint:

Parameter Name   Description                                Required
resource_id      ID of the resource, found in its metadata  Required
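The parameter travels as a query string on the resource_download endpoint; this is what `requests` builds from the `params` dictionary in Example 2. A minimal sketch of the resulting URL (the helper name `resource_download_url` is hypothetical):

```python
from urllib.parse import urlencode

API_URL = "https://edx.netl.doe.gov/api/3/resource_download"


def resource_download_url(resource_id: str) -> str:
    # resource_id is the only parameter listed for this endpoint
    return f"{API_URL}?{urlencode({'resource_id': resource_id})}"


print(resource_download_url("34280f73-526f-497b-a672-9f37313acede"))
# https://edx.netl.doe.gov/api/3/resource_download?resource_id=34280f73-526f-497b-a672-9f37313acede
```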

Example 2: Download using Python 3.8

import os
import sys

import requests


headers = {
    "EDX-API-Key": 'YOUR-API-KEY-HERE',
    "User-Agent": 'EDX-USER',
}

params = {
    'resource_id': 'RESOURCE-ID-HERE',
}

url = 'https://edx.netl.doe.gov/api/3/resource_download'


# Request only the headers first to learn the file name and size
print("Sending request to resource data...")
response_head = requests.head(url, headers=headers, params=params, allow_redirects=True)
if response_head.status_code != 200:
    print(f"Failed to get resource data. Status code: {response_head.status_code}")
    sys.exit(1)

content_disposition = response_head.headers.get('Content-Disposition')

# Take the filename from the Content-Disposition header if available;
# otherwise fall back to the resource ID so the path is never None
filename = params['resource_id']
if content_disposition and 'filename=' in content_disposition:
    name = content_disposition.split('filename=')[-1].split(';')[0].strip().strip('"')
    # Keep only the base name so a server-supplied path cannot escape this directory
    filename = os.path.basename(name) or filename

# Get the content length from headers and determine resource size
content_length = response_head.headers.get('Content-Length')
resource_size = int(content_length) if content_length is not None else None

print("Resource Name:", filename)
print(f"Resource Size: {resource_size} bytes")

# Determine whether a partial file already exists
existing_size = 0
if os.path.exists(filename):
    existing_size = os.path.getsize(filename)
    print(f"File already exists. The current file size is: {existing_size} bytes.")

    if resource_size is not None and existing_size >= resource_size:
        print("File already fully downloaded in current directory.")
        sys.exit(0)

    # Ask the server for the remaining bytes only
    headers['Range'] = f'bytes={existing_size}-'
    print(f"Resuming download from byte: {existing_size}")
else:
    print(f"Starting download for: {filename}")

# Begin download stream (headers are not printed: they contain the API key)
response = requests.get(url, headers=headers, params=params, stream=True)

print(f"Download response status code: {response.status_code}")
if response.status_code in (200, 206):
    # 206 (Partial Content): append to the existing file.
    # 200: the server sent the whole file, so overwrite it and restart the count.
    if response.status_code == 206:
        mode, total_bytes = 'ab', existing_size
    else:
        mode, total_bytes = 'wb', 0
    print(f"Saving to: {os.path.abspath(filename)}")
    with open(filename, mode) as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)
                total_bytes += len(chunk)
                if resource_size:
                    percent = (total_bytes / resource_size) * 100
                    print(f"\rDownloaded: {total_bytes} bytes ({percent:.2f}%)", end='', flush=True)
                else:
                    # If resource size is unknown, just show bytes downloaded
                    print(f"\rDownloaded: {total_bytes} bytes", end='', flush=True)

    print("\nDownload complete.")
    print(f"Total bytes downloaded: {total_bytes}")
else:
    print(f"Download failed. Status code: {response.status_code}")
    try:
        print("Response:", response.json())
    except ValueError:
        # response.json() raises a ValueError subclass when the body is not JSON
        print("Non-JSON response:", response.text)
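The filename parsing and resume decision in Example 2 can also be factored into small, testable functions. This sketch swaps the manual `split('filename=')` for the standard library's `email.message.Message.get_filename()`, which also handles quoted values and RFC 2231 `filename*=` values. The function names are hypothetical, and the script above works without them.

```python
import os
from email.message import Message
from typing import Optional, Tuple


def filename_from_disposition(header: Optional[str], fallback: str) -> str:
    """Read the file name from a Content-Disposition header, else use `fallback`."""
    if not header:
        return fallback
    msg = Message()
    msg["Content-Disposition"] = header
    name = msg.get_filename()
    # Drop any directory components a server might include
    return os.path.basename(name) if name else fallback


def plan_resume(existing_size: int, resource_size: Optional[int]) -> Tuple[bool, Optional[str]]:
    """Return (already_complete, range_header) for a possibly partial local file.

    existing_size is the size of the local file, or 0 if it does not exist.
    """
    if existing_size <= 0:
        return False, None  # nothing on disk: plain full download
    if resource_size is not None and existing_size >= resource_size:
        return True, None  # local copy is already complete
    return False, f"bytes={existing_size}-"  # request only the missing tail
```

With these, the corresponding parts of the script would become `filename = filename_from_disposition(content_disposition, params['resource_id'])` and `done, range_header = plan_resume(existing_size, resource_size)`.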