Download Resource File(s)
Important
An API Key is required for ALL downloads as of EDX 3.07 (09/03/2020).
The API Key can be sent in a request header under any of the following names: “X-CKAN-API-Key”, “EDX-API-Key”, or “Authorization”.
Attention
Add the "User-Agent" parameter within the headers of the API request.
Set "User-Agent" to the value "EDX-USER".
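As a rough illustration of the two notes above, the request headers could be assembled in Python as follows. The helper name `build_edx_headers` is hypothetical and not part of any EDX client; only the three header names and the `EDX-USER` value come from this page.

```python
# Header names this page lists as accepted carriers for the API key.
ACCEPTED_KEY_HEADERS = ("X-CKAN-API-Key", "EDX-API-Key", "Authorization")


def build_edx_headers(api_key, key_header="EDX-API-Key"):
    """Return request headers carrying the API key and the User-Agent value.

    Hypothetical helper for illustration; it is not part of an official client.
    """
    if key_header not in ACCEPTED_KEY_HEADERS:
        raise ValueError(f"Unsupported API key header: {key_header!r}")
    return {key_header: api_key, "User-Agent": "EDX-USER"}
```

The returned dictionary can be passed as the `headers` argument of an HTTP client call such as `requests.get`.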
Example 1: Download using wget
Public Resources:
To obtain the URL needed to download a public resource, navigate to the public submission (e.g. https://edx.netl.doe.gov/dataset/global-oil-gas-features-database), right-click the blue “Download” button, and select “Copy Link Address” (wording from Chrome). The URL copied from this example looks as follows (as of 08/03/2022): https://edx.netl.doe.gov/dataset/g27625d9b-4a28-4bdf-bc5c-09834f7a9dfb/resource/34280f73-526f-497b-a672-9f37313acede/download
wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" https://edx.netl.doe.gov/dataset/<submission name or id>/resource/<resource id>/download
If you know the resource ID, you can download the file without the submission name or ID.
wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" https://edx.netl.doe.gov/resource/<resource id>/download
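The two URL shapes used in the wget commands above can be sketched as a small Python function. The names `BASE_URL` and `download_url` are assumptions made for this example; the path layout is taken from the commands shown on this page.

```python
from urllib.parse import quote

BASE_URL = "https://edx.netl.doe.gov"


def download_url(resource_id, submission=None):
    """Build a download URL in one of the two shapes shown on this page.

    `submission` is the submission name or ID; when omitted, the shorter
    resource-only form is returned. Hypothetical helper for illustration.
    """
    resource_part = f"resource/{quote(resource_id, safe='')}/download"
    if submission is None:
        return f"{BASE_URL}/{resource_part}"
    return f"{BASE_URL}/dataset/{quote(submission, safe='')}/{resource_part}"
```

Either form still needs the API key header when the request is sent.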
Private Resources:
There are two scenarios for obtaining the URL needed to download a private resource:
Private Submission:
Navigate to the private submission (e.g. https://edx.netl.doe.gov/dataset/databook-node-modules-folder-content), right-click the blue “Download” button, and select “Copy Link Address” (wording from Chrome). The URL copied from this example looks as follows (as of 08/03/2022): https://edx.netl.doe.gov/dataset/e807c09e-1a63-4963-9002-1e0c3f0f025d/resource/279eec86-fac8-4504-a3cb-c791bfc98dde/download
wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" https://edx.netl.doe.gov/dataset/<submission name or id>/resource/<resource id>/download
EDX Drive Resource:
Navigate to the EDX Drive of a workspace, right-click a resource, and select the “Copy Download Link” option (as of 08/03/2022), e.g. https://edx.netl.doe.gov/resource/55bf54eb-0e70-49e7-8f00-e7057f7dced8/download
wget --header="EDX-API-Key:<YOUR_EDX_API_KEY>" https://edx.netl.doe.gov/resource/<resource id>/download
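Since every copied link above carries the resource ID as the path segment after `resource`, the ID could be recovered from a link with a few lines of Python. This is a minimal sketch under that assumption; `resource_id_from_link` is a hypothetical name, not an EDX function.

```python
from urllib.parse import urlparse


def resource_id_from_link(link):
    """Return the path segment that follows 'resource' in a copied download link."""
    segments = [s for s in urlparse(link.strip()).path.split("/") if s]
    try:
        return segments[segments.index("resource") + 1]
    except (ValueError, IndexError):
        # Either there is no 'resource' segment or nothing follows it.
        raise ValueError(f"No resource ID found in: {link!r}") from None
```

The extracted value is the same ID that fills the `<resource id>` placeholder in the wget commands.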
Attention
Add the "User-Agent" parameter within the headers of the API request.
Set "User-Agent" to the value "EDX-USER".
Parameter Name | Description | Required Fields |
---|---|---|
resource_id | ID of the resource found in the metadata | Required |
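To show how this single parameter reaches the server, the sketch below builds the query string by hand using only the standard library. It assumes the `resource_id` parameter name and the `resource_download` endpoint used in Example 2; `resource_download_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

API_URL = "https://edx.netl.doe.gov/api/3/resource_download"


def resource_download_url(resource_id):
    """Return the endpoint URL with the required resource_id query parameter."""
    if not resource_id:
        raise ValueError("resource_id is required")
    return f"{API_URL}?{urlencode({'resource_id': resource_id})}"
```

An HTTP client that accepts a `params` dictionary, as `requests` does, produces an equivalent URL without this manual step.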
Example 2: Download using Python 3.8
import os
import sys

import requests

headers = {
    "EDX-API-Key": 'YOUR-API-KEY-HERE',
    "User-Agent": 'EDX-USER',
}

params = {
    'resource_id': 'RESOURCE-ID-HERE',
}

url = 'https://edx.netl.doe.gov/api/3/resource_download'

# Send a HEAD request first to read the filename and size from the response headers
print("Sending request to resource data...")
response_head = requests.head(url, headers=headers, params=params)

if response_head.status_code != 200:
    print(f"Failed to get resource data. Status code: {response_head.status_code}")
    sys.exit(1)

content_disposition = response_head.headers.get('Content-Disposition')

# Set the filename from the Content-Disposition header if available.
# Assumption: fall back to the resource ID when the header carries no filename,
# so the file operations below always receive a valid path.
filename = params['resource_id']
if content_disposition and 'filename=' in content_disposition:
    filename = content_disposition.split('filename=')[-1].strip('"')

# Get the content length from headers and determine resource size.
content_length = response_head.headers.get('Content-Length')
resource_size = int(content_length) if content_length is not None else None
print("Resource Name:", filename)
print(f"Resource Size: {resource_size} bytes")

# Determine if a partial file exists
existing_size = 0
if os.path.exists(filename):
    existing_size = os.path.getsize(filename)
    print(f"File already exists. The current file size is: {existing_size} bytes.")
    if resource_size is not None:
        print(f"Resource file size: {resource_size} bytes")
        if existing_size >= resource_size:
            print("File already fully downloaded in current directory.")
            sys.exit(0)
    # Ask the server for the remaining bytes only
    headers['Range'] = f'bytes={existing_size}-'
    print(f"Resuming download from byte: {existing_size}")
else:
    print(f"Starting download for: {filename}")

# Begin download stream (the headers are not printed because they contain the API key)
print(f"Requesting: {url}")
response = requests.get(url, headers=headers, params=params, stream=True)
print(f"Download response status code: {response.status_code}")

if response.status_code in (200, 206):
    # If the server returns a 206 (for partial content), use 'ab' mode to append.
    # A 200 means the full file was sent, so overwrite and count from zero.
    resuming = response.status_code == 206
    mode = 'ab' if resuming else 'wb'
    total_bytes = existing_size if resuming else 0
    print(f"Saving to: {os.path.abspath(filename)}")
    with open(filename, mode) as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)
                total_bytes += len(chunk)
                if resource_size:
                    percent = (total_bytes / resource_size) * 100
                    print(f"\rDownloaded: {total_bytes} bytes ({percent:.2f}%)", end='', flush=True)
                else:
                    # If resource size is unknown, just show bytes downloaded
                    print(f"\rDownloaded: {total_bytes} bytes", end='', flush=True)
    print("\nDownload complete.")
    print(f"Total bytes downloaded: {total_bytes}")
else:
    print(f"Download Failed. Status code: {response.status_code}")
    try:
        print("Response:", response.json())
    except Exception:
        print("Non-JSON response:", response.text)
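Two steps of the script above, reading the filename from `Content-Disposition` and deciding whether to resume, can be isolated as pure functions that are easy to check without a network call. The sketch below is one possible arrangement, not the script's own method: the function names are hypothetical, and the header parsing swaps the string split for the standard library's `email.message.Message`, which also copes with parameters that follow the filename.

```python
from email.message import Message


def filename_from_content_disposition(header_value, default=None):
    """Extract the filename parameter from a Content-Disposition header value."""
    if not header_value:
        return default
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when the header has no filename parameter.
    return msg.get_filename() or default


def plan_resume(existing_size, resource_size):
    """Decide how to continue a download.

    Returns (done, extra_headers). `done` is True when the local file already
    holds at least `resource_size` bytes. `extra_headers` carries a Range
    header when a non-empty partial file should be resumed.
    """
    if existing_size <= 0:
        return False, {}
    if resource_size is not None and existing_size >= resource_size:
        return True, {}
    return False, {"Range": f"bytes={existing_size}-"}
```

For example, a 40-byte partial file for a 100-byte resource yields `(False, {"Range": "bytes=40-"})`, which matches the `Range` header the script sets before the streaming request.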