Download

Utilities for downloading biological database source files and detecting new releases.

exception CloudflareBlockedError(url: str)

Bases: Exception

Raised when a download is blocked by Cloudflare bot protection.

Parameters:

url (str) – The URL that was blocked.
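This reference does not say how a blocked download is detected. The sketch below is an assumption, not the package's source. It shows a stand-in class that mirrors the documented signature, plus hypothetical helpers (looks_like_cloudflare_block, raise_if_blocked) that treat an HTTP 403 or 503 response with a "Server: cloudflare" header as a block.

```python
from __future__ import annotations


class CloudflareBlockedError(Exception):
    """Stand-in mirroring the documented signature; not the package's own class."""

    def __init__(self, url: str) -> None:
        super().__init__(f"Download blocked by Cloudflare bot protection: {url}")
        self.url = url  # keep the blocked URL available to callers


def looks_like_cloudflare_block(status: int, headers: dict[str, str]) -> bool:
    """Hypothetical heuristic: challenge pages usually answer 403/503 from a Cloudflare server."""
    server = {name.lower(): value for name, value in headers.items()}.get("server", "")
    return status in (403, 503) and server.strip().lower() == "cloudflare"


def raise_if_blocked(url: str, status: int, headers: dict[str, str]) -> None:
    """Hypothetical hook a downloader could call before writing the response body."""
    if looks_like_cloudflare_block(status, headers):
        raise CloudflareBlockedError(url)


try:
    raise_if_blocked("https://example.org/file.gz", 403, {"Server": "cloudflare"})
except CloudflareBlockedError as err:
    print(f"Blocked: {err.url}")  # prints "Blocked: https://example.org/file.gz"
```

Catching the exception lets a caller tell a bot-protection block apart from an ordinary network failure. For example, the caller can then ask the user to fetch the file manually.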

class ReleaseInfo(datasource: str, version: str | None, release_date: datetime | None, is_new: bool, files: dict[str, str])

Bases: object

Information about a datasource release.
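ReleaseInfo is listed with Bases: object and a five-field constructor. One plausible way to model it is a dataclass. The sketch below makes assumptions about its shape, including that files maps file keys to URLs as get_download_urls does. The key and URL in the example are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReleaseInfo:
    """Sketch of the documented fields; not the package's own definition."""

    datasource: str
    version: str | None
    release_date: datetime | None
    is_new: bool
    files: dict[str, str]  # assumption: file key -> download URL


# Illustrative values only; "complete_set" and the URL are hypothetical.
info = ReleaseInfo(
    datasource="hgnc",
    version="2024-01-01",
    release_date=datetime(2024, 1, 1),
    is_new=True,
    files={"complete_set": "https://example.org/hgnc/complete_set.txt"},
)

if info.is_new:
    for key, url in info.files.items():
        print(f"{info.datasource} {info.version}: {key} -> {url}")
```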

check_release(datasource: str, current_version: str | None = None, current_date: datetime | None = None) → ReleaseInfo

Check if a new release is available for a datasource.

Parameters:
  • datasource – Name of the datasource.

  • current_version – Current version string to compare against.

  • current_date – Current release date to compare against.

Returns:

ReleaseInfo whose is_new attribute indicates whether an update is available.
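This reference does not specify how current_version and current_date are compared with the latest release. The following is a minimal sketch that makes two assumptions. First, a later release date counts as new when both dates are known. Second, version strings are otherwise compared only for inequality. is_new_release is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime


def is_new_release(
    latest_version: str | None,
    latest_date: datetime | None,
    current_version: str | None = None,
    current_date: datetime | None = None,
) -> bool:
    """Hypothetical comparison rule for deciding ReleaseInfo.is_new."""
    if current_version is None and current_date is None:
        return True  # assumption: nothing to compare against, so report the release as new
    if latest_date is not None and current_date is not None:
        return latest_date > current_date  # assumption: dates take precedence when both are known
    if latest_version is not None and current_version is not None:
        return latest_version != current_version  # no ordering of version strings is assumed
    return False  # not enough information to claim an update


print(is_new_release("246", None, current_version="245"))  # prints True
```

The sketch compares versions by inequality rather than ordering because version formats differ between datasources (dates, release numbers).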

download_datasource(datasource: str, output_dir: Path, decompress: bool = True, version: str | None = None, subset: str = '3star', keys: list[str] | None = None) → dict[str, Path]

Download all files for a datasource.

For datasources with dynamic URLs (such as the HGNC quarterly archive), this function first looks up the latest release and downloads from its URLs. If version is specified, that specific release is downloaded instead.

Parameters:
  • datasource – Name of the datasource.

  • output_dir – Directory to save files.

  • decompress – Whether to decompress .gz files.

  • version – Specific version to download. Format depends on datasource.

  • subset – For ChEBI: “3star” or “complete” (only affects legacy SDF releases).

  • keys – Optional list of file-key names to download. When given, only URLs whose key is in this list are fetched. If None (the default), all keys are downloaded.

Returns:

Dictionary mapping file keys to downloaded paths.
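The keys filter can be modeled as a selection over the key-to-URL mapping. The sketch below is hypothetical. plan_downloads, the example keys and URLs, and the rule that each file is saved in output_dir under its URL's basename are all assumptions, not documented behavior.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def plan_downloads(
    urls: dict[str, str],
    output_dir: Path,
    keys: list[str] | None = None,
) -> dict[str, Path]:
    """Hypothetical helper: choose which file keys to fetch and where each file lands."""
    wanted = None if keys is None else set(keys)
    plan: dict[str, Path] = {}
    for key, url in urls.items():
        if wanted is not None and key not in wanted:
            continue  # mirrors "only URLs whose key is in this list are fetched"
        # Assumption: the local file name is the last segment of the URL path.
        plan[key] = output_dir / PurePosixPath(urlparse(url).path).name
    return plan


urls = {
    "genes": "https://example.org/hgnc/genes.tsv.gz",
    "ids": "https://example.org/hgnc/ids.txt",
}
print(plan_downloads(urls, Path("data"), keys=["genes"]))
```

In this sketch, keys that do not appear in the URL mapping are silently ignored, which matches the wording of the keys parameter but may differ from the real function.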

download_file(url: str, output_path: Path, decompress_gz: bool = True, timeout: float | None = None, show_progress: bool = True, description: str | None = None) → Path

Download a file from a URL to the specified path.

Parameters:
  • url – URL to download from.

  • output_path – Where to save the file.

  • decompress_gz – Whether to decompress .gz files automatically.

  • timeout – Request timeout in seconds.

  • show_progress – Whether to show a progress bar.

  • description – Description for the progress bar.

Returns:

Path to the downloaded (and optionally decompressed) file.
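The following is a minimal standard-library sketch of the documented behavior. It assumes that a .gz download is decompressed alongside the original with the .gz suffix dropped, and that the compressed copy is then removed. download_file_sketch is a hypothetical name, and the progress bar (show_progress, description) is omitted.

```python
from __future__ import annotations

import gzip
import shutil
import urllib.request
from pathlib import Path


def download_file_sketch(
    url: str,
    output_path: Path,
    decompress_gz: bool = True,
    timeout: float | None = None,
) -> Path:
    """Hypothetical stand-in for download_file without the progress bar."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Only forward timeout when it is set; otherwise urlopen's default applies.
    kwargs = {} if timeout is None else {"timeout": timeout}
    with urllib.request.urlopen(url, **kwargs) as response, open(output_path, "wb") as fh:
        shutil.copyfileobj(response, fh)  # stream to disk instead of reading it all into memory
    if decompress_gz and output_path.suffix == ".gz":
        target = output_path.with_suffix("")  # assumption: "x.tsv.gz" becomes "x.tsv"
        with gzip.open(output_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        output_path.unlink()  # assumption: the compressed copy is not kept
        return target
    return output_path
```

Because urllib.request.urlopen also accepts file:// URLs, the sketch can be tried on a local gzip file without network access.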

get_download_urls(datasource: str, version: str | None = None, subset: str = '3star') → dict[str, str]

Get download URLs for a datasource.

Parameters:
  • datasource – Name of the datasource.

  • version – Specific version to get URLs for.

  • subset – For ChEBI: “3star” or “complete” (only affects legacy SDF).

Returns:

Dictionary mapping file keys to URLs.
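One way to support the version and subset parameters is a table of URL templates filled in with str.format. Everything in this sketch is hypothetical: URL_TEMPLATES, the example.org URLs, the file keys, and the use of "latest" when no version is given.

```python
from __future__ import annotations

# Hypothetical templates; the module's real URLs are not listed in this reference.
URL_TEMPLATES: dict[str, dict[str, str]] = {
    "chebi": {"sdf": "https://example.org/chebi/{version}/chebi_{subset}.sdf.gz"},
    "hgnc": {"complete_set": "https://example.org/hgnc/{version}/complete_set.txt"},
}


def get_download_urls_sketch(
    datasource: str, version: str | None = None, subset: str = "3star"
) -> dict[str, str]:
    """Fill each template for the datasource; str.format ignores unused keyword arguments."""
    if subset not in ("3star", "complete"):
        raise ValueError(f"subset must be '3star' or 'complete', got {subset!r}")
    templates = URL_TEMPLATES[datasource]  # unknown datasources raise KeyError in this sketch
    resolved = version if version is not None else "latest"  # assumption: placeholder for latest
    return {key: t.format(version=resolved, subset=subset) for key, t in templates.items()}


print(get_download_urls_sketch("chebi", version="245", subset="complete"))
```

Only templates that contain a {subset} placeholder are affected by subset, which is one way to read "only affects legacy SDF".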

get_latest_release_info(datasource: str) → ReleaseInfo

Get the latest release information for a datasource.

Parameters:

datasource – Name of the datasource (chebi, hmdb, hgnc, ncbi, uniprot).

Returns:

ReleaseInfo with the latest release details.

Raises:

ValueError – If the datasource is not supported.
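The supported names are the five listed under Parameters. Below is a minimal sketch of the documented ValueError check. validate_datasource is a hypothetical helper, and accepting names case-insensitively is an assumption.

```python
from __future__ import annotations

# The five names listed for get_latest_release_info.
SUPPORTED_DATASOURCES = frozenset({"chebi", "hmdb", "hgnc", "ncbi", "uniprot"})


def validate_datasource(datasource: str) -> str:
    """Hypothetical guard that raises ValueError for unsupported names."""
    name = datasource.strip().lower()  # assumption: matching is case-insensitive
    if name not in SUPPORTED_DATASOURCES:
        supported = ", ".join(sorted(SUPPORTED_DATASOURCES))
        raise ValueError(f"Unsupported datasource {datasource!r}; expected one of: {supported}")
    return name


try:
    validate_datasource("ensembl")
except ValueError as err:
    print(err)
```

Callers of get_latest_release_info can use the same try/except pattern to report an unsupported datasource without aborting a batch of release checks.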