Download
Utilities for downloading database source files and detecting releases.
Download and release detection for biological database sources.
- exception CloudflareBlockedError(url: str)[source]
Bases: Exception
Raised when a download is blocked by Cloudflare bot protection.
- Parameters:
url – The URL that was blocked.
Docstring for __init__.
- Parameters:
url (str) – URL requested
- class ReleaseInfo(datasource: str, version: str | None, release_date: datetime | None, is_new: bool, files: dict[str, str])[source]
Bases: object
Information about a datasource release.
- check_release(datasource: str, current_version: str | None = None, current_date: datetime | None = None) ReleaseInfo[source]
Check if a new release is available for a datasource.
- Parameters:
datasource – Name of the datasource.
current_version – Current version string to compare against.
current_date – Current release date to compare against.
- Returns:
ReleaseInfo with is_new indicating if update is available.
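The rule used to decide whether a release is new is not described here. As a rough sketch only, assuming a release counts as new when nothing is known locally, when the version string differs, or when the release date is later, the comparison could look like the following. The ReleaseInfo dataclass is a stand-in that mirrors the documented fields, and is_newer is a hypothetical helper that is not part of this module.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReleaseInfo:
    """Stand-in that mirrors the documented ReleaseInfo fields."""

    datasource: str
    version: str | None
    release_date: datetime | None
    is_new: bool
    files: dict[str, str]


def is_newer(
    latest_version: str | None,
    latest_date: datetime | None,
    current_version: str | None = None,
    current_date: datetime | None = None,
) -> bool:
    """Hypothetical rule, assumed for illustration (not the module's logic)."""
    # With no local version or date, any release is treated as new.
    if current_version is None and current_date is None:
        return True
    # A differing version string signals an update.
    if current_version is not None and latest_version is not None:
        if latest_version != current_version:
            return True
    # A later release date also signals an update.
    if current_date is not None and latest_date is not None:
        if latest_date > current_date:
            return True
    return False


# Illustrative values only; these are not real release identifiers.
info = ReleaseInfo(
    datasource="hgnc",
    version="2024-01-01",
    release_date=datetime(2024, 1, 1),
    is_new=is_newer("2024-01-01", datetime(2024, 1, 1), current_version="2023-10-01"),
    files={},
)
```

Callers would typically branch on is_new and only trigger a download when it is True.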
- download_datasource(datasource: str, output_dir: Path, decompress: bool = True, version: str | None = None, subset: str = '3star', keys: list[str] | None = None) dict[str, Path][source]
Download all files for a datasource.
For datasources with dynamic URLs (like HGNC quarterly archive), this function first checks for the latest release and uses those URLs. If a version is specified, it downloads that specific version.
- Parameters:
datasource – Name of the datasource.
output_dir – Directory to save files.
decompress – Whether to decompress .gz files.
version – Specific version to download. Format depends on datasource.
subset – For ChEBI: “3star” or “complete” (for releases using SDF only).
keys – Optional list of file-key names to download. When given, only URLs whose key is in this list are fetched. Defaults to all keys.
- Returns:
Dictionary mapping file keys to downloaded paths.
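The keys filter described above can be pictured as a simple dictionary selection. The sketch below is an assumption about the behaviour, not the module's code: select_urls is a hypothetical helper, and the URLs are placeholders.

```python
from __future__ import annotations


def select_urls(urls: dict[str, str], keys: list[str] | None = None) -> dict[str, str]:
    """Hypothetical helper: keep only the requested file keys, or all if keys is None."""
    if keys is None:
        return dict(urls)
    wanted = set(keys)
    return {key: url for key, url in urls.items() if key in wanted}


# Placeholder file keys and URLs, for illustration only.
urls = {
    "sdf": "https://example.org/data.sdf.gz",
    "names": "https://example.org/names.tsv.gz",
}
selected = select_urls(urls, keys=["names"])
```

In this sketch, keys that do not appear in the URL mapping are silently ignored; whether the real function ignores or rejects unknown keys is not stated in this reference.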
- download_file(url: str, output_path: Path, decompress_gz: bool = True, timeout: float | None = None, show_progress: bool = True, description: str | None = None) Path[source]
Download a file from URL to the specified path.
- Parameters:
url – URL to download from.
output_path – Where to save the file.
decompress_gz – Whether to decompress .gz files automatically.
timeout – Request timeout in seconds.
show_progress – Whether to show a progress bar.
description – Description for the progress bar.
- Returns:
Path to the downloaded (and optionally decompressed) file.
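To illustrate what the optional decompression step might involve, here is a minimal sketch using only the standard library. It assumes the decompressed file is written next to the archive with the .gz suffix removed; decompress_gz_file is a hypothetical helper, not this module's implementation, and the demo builds a small archive locally instead of downloading one.

```python
from __future__ import annotations

import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_gz_file(gz_path: Path) -> Path:
    """Hypothetical helper: decompress foo.tsv.gz to foo.tsv and return the new path."""
    if gz_path.suffix != ".gz":
        return gz_path  # not a gzip archive by name, nothing to do
    target = gz_path.with_suffix("")  # strips only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks to keep memory use low
    return target


# Demo on a locally created archive (no network access needed).
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example.tsv.gz"
    with gzip.open(archive, "wb") as fh:
        fh.write(b"id\tname\n1\twater\n")
    result = decompress_gz_file(archive)
    result_name = result.name
    result_text = result.read_text()
```

Streaming with shutil.copyfileobj avoids loading large database dumps fully into memory.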
- get_download_urls(datasource: str, version: str | None = None, subset: str = '3star') dict[str, str][source]
Get download URLs for a datasource.
- Parameters:
datasource – Name of the datasource.
version – Specific version to get URLs for.
subset – For ChEBI: “3star” or “complete” (only affects legacy SDF).
- Returns:
Dictionary mapping file keys to URLs.
- get_latest_release_info(datasource: str) ReleaseInfo[source]
Get release information for a datasource.
- Parameters:
datasource – Name of the datasource (chebi, hmdb, hgnc, ncbi, uniprot).
- Returns:
ReleaseInfo with the latest release details.
- Raises:
ValueError – If the datasource is not supported.
- list_versions(datasource: str) Any[source]
List all available archive versions for a datasource.
Delegates to the datasource’s BaseDownloader subclass list_versions() method, which contains all source-specific retrieval logic.
For datasources that publish versioned archives (ChEBI, HGNC, UniProt), returns all available version strings sorted in ascending order.
NCBI and HMDB do not maintain versioned archives; calling this function for those datasources raises ValueError.
- Parameters:
datasource – Datasource name ("chebi", "hgnc", or "uniprot").
- Returns:
Sorted list of version strings. Format depends on the datasource:
chebi: integer release numbers, e.g. ["200", "201", ..., "245"]
hgnc: ISO dates, e.g. ["2023-01-01", ..., "2026-04-07"]
uniprot: release identifiers, e.g. ["2024_01", "2024_02", ...]
- Raises:
ValueError – If the datasource is unknown or has no versioned archive.
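The documented version formats do not all sort the same way. ISO dates and zero-padded identifiers such as "2024_01" order correctly as plain strings, while integer release numbers stored as strings need a numeric comparison once the number of digits changes. The sketch below shows one way to handle both cases; sort_versions is a hypothetical helper and the inputs are illustrative, so this should not be read as the module's own sorting logic.

```python
from __future__ import annotations


def sort_versions(versions: list[str]) -> list[str]:
    """Hypothetical helper: ascending sort that compares all-digit strings as integers."""
    if all(v.isdigit() for v in versions):
        return sorted(versions, key=int)
    # ISO dates and zero-padded identifiers sort correctly as strings.
    return sorted(versions)


# Illustrative inputs only; not actual archive listings.
chebi = sort_versions(["1000", "245", "99"])
hgnc = sort_versions(["2024-01-01", "2023-10-01"])
uniprot = sort_versions(["2024_02", "2023_05", "2024_01"])
```

A plain string sort would place "1000" before "245", which is why the integer case is handled separately.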