Download
Utilities for downloading source files for biological databases and detecting new releases.
- exception CloudflareBlockedError(url: str)
Bases: Exception
Raised when a download is blocked by Cloudflare bot protection.
- Parameters:
url (str) – The URL that was requested and blocked.
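As a minimal sketch of how calling code might handle this exception: the class below is a stand-in that copies only the documented signature, because the import path is not shown here. Storing the URL on a `url` attribute is an assumption, and `fetch_or_skip` and `blocked_fetch` are hypothetical helpers used for illustration.

```python
from __future__ import annotations


class CloudflareBlockedError(Exception):
    """Stand-in with the documented signature; not the library's own class."""

    def __init__(self, url: str) -> None:
        # Assumption: the real exception keeps the URL on the instance.
        self.url = url
        super().__init__(f"Download blocked by Cloudflare bot protection: {url}")


def fetch_or_skip(url: str, fetch) -> str | None:
    """Call fetch(url); return None instead of raising when the request is blocked."""
    try:
        return fetch(url)
    except CloudflareBlockedError as err:
        print(f"skipping blocked URL: {err.url}")
        return None


def blocked_fetch(url: str) -> str:
    """Hypothetical fetcher that always reports a Cloudflare block."""
    raise CloudflareBlockedError(url)


result = fetch_or_skip("https://example.org/data.tsv.gz", blocked_fetch)
```

Catching the specific exception rather than a bare `Exception` lets a batch job skip a blocked source while still surfacing unrelated failures.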
- class ReleaseInfo(datasource: str, version: str | None, release_date: datetime | None, is_new: bool, files: dict[str, str])
Bases: object
Information about a datasource release.
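The constructor signature suggests a plain record type. The sketch below models it as a dataclass, which is an assumption (the page only states that it derives from `object`). Treating `files` as a mapping from file keys to URLs is likewise inferred from the signatures on this page, and the values shown are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReleaseInfo:
    """Stand-in with the documented fields; not the library's own class."""

    datasource: str
    version: str | None
    release_date: datetime | None
    is_new: bool
    files: dict[str, str]  # assumption: file key -> URL


# Illustrative values only.
info = ReleaseInfo(
    datasource="chebi",
    version="example-1",
    release_date=datetime(2024, 1, 1),
    is_new=True,
    files={"sdf": "https://example.org/chebi.sdf.gz"},
)

if info.is_new:
    print(f"{info.datasource} {info.version}: {len(info.files)} file(s) to fetch")
```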
- check_release(datasource: str, current_version: str | None = None, current_date: datetime | None = None) → ReleaseInfo
Check if a new release is available for a datasource.
- Parameters:
datasource – Name of the datasource.
current_version – Current version string to compare against.
current_date – Current release date to compare against.
- Returns:
A ReleaseInfo whose is_new flag indicates whether an update is available.
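The page does not say how `is_new` is computed. One plausible rule, sketched below purely as an assumption, is to compare versions when both are known and to fall back to release dates otherwise. `is_new_release` is a hypothetical helper, not part of the documented API.

```python
from __future__ import annotations

from datetime import datetime


def is_new_release(
    latest_version: str | None,
    latest_date: datetime | None,
    current_version: str | None = None,
    current_date: datetime | None = None,
) -> bool:
    """Decide whether the upstream release differs from the local one."""
    # Nothing recorded locally: treat whatever upstream offers as new.
    if current_version is None and current_date is None:
        return True
    # Prefer a version comparison when both sides have one.
    if latest_version is not None and current_version is not None:
        return latest_version != current_version
    # Otherwise fall back to comparing release dates.
    if latest_date is not None and current_date is not None:
        return latest_date > current_date
    # Not enough information to prove the local copy is current.
    return True
```

Comparing versions for inequality rather than ordering avoids having to parse datasource-specific version formats.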
- download_datasource(datasource: str, output_dir: Path, decompress: bool = True, version: str | None = None, subset: str = '3star', keys: list[str] | None = None) → dict[str, Path]
Download all files for a datasource.
For datasources with dynamic URLs (such as the HGNC quarterly archive), this function first checks for the latest release and uses the URLs from that release. If a version is specified, that version is downloaded instead.
- Parameters:
datasource – Name of the datasource.
output_dir – Directory to save files.
decompress – Whether to decompress .gz files.
version – Specific version to download. Format depends on datasource.
subset – For ChEBI: “3star” or “complete”. Only affects releases that use SDF.
keys – Optional list of file-key names to download. When given, only URLs whose key is in this list are fetched. Defaults to all keys.
- Returns:
Dictionary mapping file keys to downloaded paths.
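The `keys` filter described above can be sketched as a dictionary selection. `select_urls` is a hypothetical helper written for illustration, and the URLs are placeholders. How the real function treats keys that are requested but not available is not stated, so this sketch simply ignores them.

```python
from __future__ import annotations


def select_urls(urls: dict[str, str], keys: list[str] | None = None) -> dict[str, str]:
    """Keep only the entries whose key was requested; None means keep all."""
    if keys is None:
        return dict(urls)
    wanted = set(keys)
    return {key: url for key, url in urls.items() if key in wanted}


# Placeholder URL table for illustration.
all_urls = {
    "main": "https://example.org/main.tsv.gz",
    "xrefs": "https://example.org/xrefs.tsv.gz",
}

only_main = select_urls(all_urls, keys=["main"])
everything = select_urls(all_urls)
```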
- download_file(url: str, output_path: Path, decompress_gz: bool = True, timeout: float | None = None, show_progress: bool = True, description: str | None = None) → Path
Download a file from URL to the specified path.
- Parameters:
url – URL to download from.
output_path – Where to save the file.
decompress_gz – Whether to decompress .gz files automatically.
timeout – Request timeout in seconds.
show_progress – Whether to show a progress bar.
description – Description for the progress bar.
- Returns:
Path to the downloaded (and optionally decompressed) file.
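The decompression step can be illustrated with the standard library alone. This is a sketch of the general technique, not the library's implementation. `decompress_gz` here is a hypothetical standalone helper, and whether the real function keeps or deletes the original `.gz` file is not stated (this sketch keeps it).

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_gz(path: Path) -> Path:
    """Decompress a .gz file next to itself and return the new path."""
    if path.suffix != ".gz":
        return path  # nothing to do for uncompressed files
    target = path.with_suffix("")  # "data.tsv.gz" -> "data.tsv"
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks to limit memory use
    return target


# Small round-trip demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "data.tsv.gz"
    with gzip.open(archive, "wb") as fh:
        fh.write(b"id\tname\n")
    extracted = decompress_gz(archive)
    extracted_name = extracted.name
    extracted_bytes = extracted.read_bytes()
```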
- get_download_urls(datasource: str, version: str | None = None, subset: str = '3star') → dict[str, str]
Get download URLs for a datasource.
- Parameters:
datasource – Name of the datasource.
version – Specific version to get URLs for.
subset – For ChEBI: “3star” or “complete” (only affects legacy SDF).
- Returns:
Dictionary mapping file keys to URLs.
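One way a version-dependent URL table could be organized is with per-key templates, as sketched below. Everything in it is hypothetical: the `exampledb` name, the `example.org` URLs, the `current` default tag, and the `resolve_urls` helper are illustrative and do not describe how the library actually builds its URLs.

```python
from __future__ import annotations

# Hypothetical templates; real datasource URLs are not given on this page.
URL_TEMPLATES: dict[str, dict[str, str]] = {
    "exampledb": {
        "main": "https://example.org/exampledb/{version}/main.tsv.gz",
        "xrefs": "https://example.org/exampledb/{version}/xrefs.tsv.gz",
    }
}


def resolve_urls(datasource: str, version: str | None = None) -> dict[str, str]:
    """Fill in the version for every file key of a datasource."""
    templates = URL_TEMPLATES[datasource]
    # Assumption: an unversioned request points at a "current" directory.
    tag = version if version is not None else "current"
    return {key: template.format(version=tag) for key, template in templates.items()}


urls = resolve_urls("exampledb", version="2024-01")
for key, url in urls.items():
    print(f"{key}: {url}")
```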
- get_latest_release_info(datasource: str) → ReleaseInfo
Get information about the latest release of a datasource.
- Parameters:
datasource – Name of the datasource (chebi, hmdb, hgnc, ncbi, uniprot).
- Returns:
ReleaseInfo with the latest release details.
- Raises:
ValueError – If the datasource is not supported.
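The `ValueError` contract can be sketched as an up-front name check against the datasources listed above. `validate_datasource` is a hypothetical helper, and exact, case-sensitive matching is an assumption, since the page does not say whether the real function normalizes names.

```python
SUPPORTED_DATASOURCES = ("chebi", "hmdb", "hgnc", "ncbi", "uniprot")


def validate_datasource(name: str) -> str:
    """Return the name unchanged, or raise ValueError if it is not supported."""
    if name not in SUPPORTED_DATASOURCES:
        supported = ", ".join(SUPPORTED_DATASOURCES)
        raise ValueError(f"Unsupported datasource {name!r}; expected one of: {supported}")
    return name


try:
    validate_datasource("pubchem")
except ValueError as err:
    message = str(err)
    print(message)
```

Validating before any network request means a typo in the datasource name fails immediately instead of after a timeout.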