Models
Data models extending sssom-schema for secondary-to-primary mappings.
Mapping Sets
- class Sec2PriMappingSet(*args, _if_missing: Callable[[JsonObj, str], Tuple[bool, Any]] = None, **kwargs)[source]
Bases: MappingSet
A MappingSet for Sec2Pri, with helpers for cardinality and export.
- _primary_ids
Private store for the full authoritative primary ID set. Kept private so that sssom serialisers never include it in any output. Access it only through to_pri_ids().
Initialise the mapping set and the private primary-IDs store.
- to_sssom(output_path: Path | str | None = None) sssom_document.MappingSetDocument[source]
Return an SSSOM MappingSetDocument, optionally writing to TSV.
- Parameters:
output_path – If given, the document is also serialised to an SSSOM TSV file at this path.
- Returns:
sssom.sssom_document.MappingSetDocument for the mapping set.
- to_rdf(output_path: Path | str | None = None, serialisation: str = 'turtle') rdflib.Graph[source]
Return an RDFLib graph, optionally writing it to a file.
When output_path is given (or auto-generated via the save dispatcher), the graph is also serialised to disk. Either way, the rdflib.Graph is returned so callers can query or manipulate it directly.
- Parameters:
output_path – Destination path. Pass a path (or None to auto-generate one) to persist the graph. If you only want the in-memory graph without touching the file system, call to_rdf() with no arguments and ignore the path attribute.
serialisation – RDFLib serialisation format (default: "turtle").
- Returns:
rdflib.Graph containing all mappings as RDF triples.
- to_json(output_path: Path | str | None = None) dict[str, Any][source]
Return the mapping set as a JSON-compatible dict, optionally writing to file.
- Parameters:
output_path – If given, the JSON is also written to this path.
- Returns:
dict representation of the mapping set in SSSOM JSON format.
- to_owl(output_path: Path | str | None = None, serialisation: str = 'turtle') rdflib.Graph[source]
Return an OWL rdflib.Graph, optionally writing to file.
- Parameters:
output_path – If given, the graph is also serialised to this path.
serialisation – RDFLib serialisation format (default: "turtle").
- Returns:
rdflib.Graph containing OWL axioms for the mapping set.
- save(fmt: str, output_path: Path | str | None = None, **kwargs: object) Path[source]
Write to any supported format by name.
Shared formats: "sssom", "rdf", "json", "owl". Subclasses override this to add type-specific formats.
- Parameters:
fmt – Format key (see above).
output_path – Destination path. Auto-generated if None.
**kwargs – Forwarded to the format-specific writer.
- Returns:
Path to the written file.
- Raises:
ValueError – For unknown format keys.
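The dispatch-by-name behaviour described for save() can be pictured with a minimal sketch. Only the four shared format keys and the ValueError for unknown keys come from the description above; the class DispatchSketch, its writer table, the stub writer, and the "<name>.<fmt>" auto-naming scheme are hypothetical and are not the library's implementation.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


class DispatchSketch:
    """Hypothetical stand-in for a mapping set; not the library's class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def _write_stub(self, output_path: Path, **kwargs: object) -> None:
        # A real writer would serialise the mappings; this one writes a marker.
        output_path.write_text(f"{self.name}\n", encoding="utf-8")

    def _writers(self) -> dict[str, Callable[..., None]]:
        # Shared format keys as listed in the documentation above.
        return {key: self._write_stub for key in ("sssom", "rdf", "json", "owl")}

    def save(
        self, fmt: str, output_path: Path | str | None = None, **kwargs: object
    ) -> Path:
        writers = self._writers()
        if fmt not in writers:
            raise ValueError(
                f"Unknown format {fmt!r}; expected one of {sorted(writers)}"
            )
        # Assumed auto-naming scheme when no path is supplied.
        path = Path(output_path) if output_path is not None else Path(f"{self.name}.{fmt}")
        writers[fmt](path, **kwargs)
        return path
```

In a design like this, a subclass can add type-specific formats by overriding the writer table and leaving save() untouched.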
- class IdMappingSet(*args, _if_missing: Callable[[JsonObj, str], Tuple[bool, Any]] = None, **kwargs)[source]
Bases: Sec2PriMappingSet
Mapping set for ID-based (secondary to primary identifier) mappings.
Initialise the mapping set and the private primary-IDs store.
- to_sec2pri(output_path: Path | str | None = None) pd.DataFrame[source]
Return a DataFrame of secondary to primary ID mappings.
Columns: subject_id (secondary), object_id (primary), predicate_id, mapping_cardinality.
- Parameters:
output_path – If given, the DataFrame is also written as a TSV file.
- Returns:
pandas.DataFrame with one row per mapping.
- to_pri_ids(output_path: Path | str | None = None) list[str][source]
Return a sorted list of unique primary IDs, optionally writing to TXT.
When _primary_ids is populated (e.g. from the HGNC complete set), that set is used. Otherwise, primary IDs are derived from the unique object_id values in the mappings.
- Parameters:
output_path – If given, the IDs are also written one per line to a text file.
- Returns:
Sorted list of unique primary ID strings.
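The fallback rule for to_pri_ids() can be illustrated with a small sketch. The function primary_ids, its arguments, and the EX: identifiers are hypothetical; mappings are modelled as plain dicts rather than SSSOM Mapping objects.

```python
from __future__ import annotations


def primary_ids(
    mappings: list[dict[str, str]], authoritative: set[str] | None = None
) -> list[str]:
    """Return sorted unique primary IDs (illustrative helper, not the library API)."""
    if authoritative:
        # A populated authoritative set takes precedence over the mappings.
        return sorted(authoritative)
    # Fallback: derive primaries from the object side of each mapping.
    return sorted({m["object_id"] for m in mappings})


rows = [
    {"subject_id": "EX:old1", "object_id": "EX:2"},
    {"subject_id": "EX:old2", "object_id": "EX:1"},
    {"subject_id": "EX:old3", "object_id": "EX:2"},
]
print(primary_ids(rows))                             # ['EX:1', 'EX:2']
print(primary_ids(rows, {"EX:1", "EX:2", "EX:3"}))   # ['EX:1', 'EX:2', 'EX:3']
```

The second call shows why an authoritative set matters: EX:3 has no secondary identifier pointing at it, so it would be missing from a list derived from the mappings alone.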
- save(fmt: str, output_path: Path | str | None = None, **kwargs: object) Path[source]
Write to any supported format by name.
Formats: "sssom", "rdf", "json", "owl", "sec2pri", "pri_ids".
- Parameters:
fmt – Format key (see above).
output_path – Destination path. Auto-generated if None.
**kwargs – Forwarded to the format-specific writer.
- Returns:
Path to the written file.
- Raises:
ValueError – For unknown format keys.
- class LabelMappingSet(*args, _if_missing: Callable[[JsonObj, str], Tuple[bool, Any]] = None, **kwargs)[source]
Bases: Sec2PriMappingSet
Mapping set for label-based (previous/alias symbol to current symbol) mappings.
Initialise the mapping set and the private primary-IDs store.
- to_symbol_sec2pri(output_path: Path | str | None = None) pd.DataFrame[source]
Return a DataFrame of previous/alias symbol to current symbol mappings.
Columns: subject_id, subject_label (secondary/previous symbol), object_id, object_label (primary/current symbol), predicate_id, mapping_cardinality.
- Parameters:
output_path – If given, the DataFrame is also written as a TSV file.
- Returns:
pandas.DataFrame with one row per symbol mapping.
- to_pri_symbols(output_path: Path | str | None = None) list[str][source]
Return a sorted list of unique current/primary symbols, optionally writing to TXT.
Derived from the unique object_label values in the mappings.
- Parameters:
output_path – If given, the symbols are also written one per line to a text file.
- Returns:
Sorted list of unique primary symbol strings.
- to_name2synonym(output_path: Path | str | None = None) pd.DataFrame[source]
Return a name to synonym DataFrame, optionally writing to TSV.
Columns: subject_id, subject_label (primary name), object_label (synonym/previous name).
- Parameters:
output_path – If given, the DataFrame is also written as a TSV file.
- Returns:
pandas.DataFrame with label mapping rows.
- save(fmt: str, output_path: Path | str | None = None, **kwargs: object) Path[source]
Write to any supported format by name.
Formats: "sssom", "rdf", "json", "owl", "symbol_sec2pri" ("symbol2prev" is a deprecated alias), "pri_symbols", "name2synonym".
- Parameters:
fmt – Format key (see above).
output_path – Destination path. Auto-generated if None.
**kwargs – Forwarded to the format-specific writer.
- Returns:
Path to the written file.
- Raises:
ValueError – For unknown format keys.
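One way a deprecated key such as "symbol2prev" could be resolved to its canonical name is sketched below. The format keys and the alias pair come from the list above; the function resolve_format, the alias table, and the emission of a DeprecationWarning are assumptions, since this page does not say how the library handles the alias internally.

```python
from __future__ import annotations

import warnings

# Hypothetical alias table; only this one pair is documented above.
DEPRECATED_ALIASES = {"symbol2prev": "symbol_sec2pri"}

LABEL_FORMATS = {
    "sssom", "rdf", "json", "owl",
    "symbol_sec2pri", "pri_symbols", "name2synonym",
}


def resolve_format(fmt: str) -> str:
    """Map a possibly deprecated key to its canonical key, or raise ValueError."""
    if fmt in DEPRECATED_ALIASES:
        canonical = DEPRECATED_ALIASES[fmt]
        # Assumption: the caller is warned rather than silently redirected.
        warnings.warn(
            f"Format key {fmt!r} is deprecated; use {canonical!r} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        fmt = canonical
    if fmt not in LABEL_FORMATS:
        raise ValueError(f"Unknown format key: {fmt!r}")
    return fmt
```

Resolving the alias before the membership check keeps the deprecated name out of the list of supported formats while still accepting it from older callers.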
Configuration
- class DatasourceConfig(name: str, prefix: str, curie_base_url: str, default_output_filename: str = '', available_outputs: list[str] = <factory>, download_urls: dict[str, ~typing.Any] = <factory>, primary_file_key: str = '', id_pattern: str = '', archive_url: str = '', input_file_types: list[str] = <factory>, source: str = '', homepage: str = '', data_license: str = '', sparql_endpoint: str = '', queries: dict[str, str] = <factory>, new_format_version: int | None = None, mappingset_metadata: dict[str, ~typing.Any] = <factory>, mapping_metadata: dict[str, ~typing.Any] = <factory>)[source]
Configuration for a biological database datasource loaded from YAML.
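The `<factory>` placeholders in the signature are how a dataclass field declared with a default factory is displayed. The reduced sketch below shows the pattern with five of the fields; ConfigSketch and the placeholder values are hypothetical, and the dict stands in for an already parsed YAML mapping.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConfigSketch:
    """Reduced, hypothetical stand-in for DatasourceConfig (five fields only)."""

    name: str
    prefix: str
    curie_base_url: str
    # Mutable defaults need a factory so instances do not share one object.
    available_outputs: list[str] = field(default_factory=list)
    download_urls: dict[str, Any] = field(default_factory=dict)


# Placeholder values, e.g. as read from a parsed YAML mapping.
raw = {"name": "Example", "prefix": "EX", "curie_base_url": "https://example.org/id/"}
a = ConfigSketch(**raw)
b = ConfigSketch(**raw)
a.available_outputs.append("sssom")
print(b.available_outputs)  # [] because each instance gets its own list
```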
Constants
Pre-loaded datasource configurations.
Constants for supported datasources.
Base Parser
- class BaseParser(version: str | None = None, show_progress: bool = True, config_name: str | None = None)[source]
Abstract base class for all datasource parsers.
Each parser is responsible for reading files from a specific datasource and extracting secondary-to-primary identifier Mapping Sets.
Initialize the parser.
- Parameters:
version – Version/release identifier for the datasource.
show_progress – Whether to show progress bars during parsing.
config_name – Name of config file to load (defaults to class name).
- property config: DatasourceConfig | None
Get the loaded configuration.
- apply_metadata_to_mappingset(mappingset: MappingSet, metadata: dict[str, Any]) None[source]
Apply metadata to a MappingSet and its Mappings.
- abstractmethod parse(input_path: Path | str | None) MappingSet[source]
Parse the input file(s) and return a MappingSet.
- Parameters:
input_path – Path to the input file or directory.
- Returns:
A MappingSet containing all extracted mappings.
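The abstract-method contract can be sketched in miniature. ParserSketch and TwoColumnParser are hypothetical and do not import the library; parse() returns plain (secondary, primary) pairs here instead of a MappingSet, and using the class name verbatim as the default config name is an assumption based on the parameter description above.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class ParserSketch(ABC):
    """Hypothetical miniature of the base-class contract."""

    def __init__(self, version: str | None = None, config_name: str | None = None) -> None:
        self.version = version
        # Assumed fallback: use the concrete class name when none is given.
        self.config_name = config_name or type(self).__name__

    @abstractmethod
    def parse(self, input_path: Path | str | None) -> list[tuple[str, str]]:
        """Return (secondary, primary) pairs; the real method returns a MappingSet."""


class TwoColumnParser(ParserSketch):
    """Reads a two-column, tab-separated file of secondary and primary IDs."""

    def parse(self, input_path: Path | str | None) -> list[tuple[str, str]]:
        if input_path is None:
            raise ValueError("this sketch needs an explicit input path")
        pairs = []
        for line in Path(input_path).read_text(encoding="utf-8").splitlines():
            if line.strip():  # skip blank lines
                secondary, primary = line.split("\t")
                pairs.append((secondary, primary))
        return pairs
```

Instantiating ParserSketch directly raises TypeError, which is what makes parse() mandatory for every concrete parser.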
- static normalize_withdrawn_id(subject_id: str | None) str[source]
Normalize a primary ID, converting empty/null to withdrawn.
- Parameters:
subject_id – The raw primary identifier from the source file.
- Returns:
The normalized primary ID, or WITHDRAWN_ENTRY for empty values.
- static is_withdrawn_primary(subject_id: str) bool[source]
Check if a primary ID represents a withdrawn/deleted entry.
- Parameters:
subject_id – The primary identifier to check.
- Returns:
True if the primary ID indicates a withdrawn entry.
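The two helpers above can be read as a sentinel pattern, sketched here under stated assumptions. The actual value of WITHDRAWN_ENTRY is not shown on this page, so the string below is a placeholder; treating whitespace-only input as empty is also an assumption.

```python
from __future__ import annotations

# Placeholder sentinel: the real WITHDRAWN_ENTRY value is not shown on this page.
WITHDRAWN_ENTRY = "<withdrawn>"


def normalize_withdrawn_id(primary_id: str | None) -> str:
    """Return the stripped ID, or the sentinel when the input is empty or None."""
    cleaned = (primary_id or "").strip()
    return cleaned if cleaned else WITHDRAWN_ENTRY


def is_withdrawn_primary(primary_id: str) -> bool:
    """Return True when the ID is the withdrawn sentinel."""
    return primary_id == WITHDRAWN_ENTRY


for raw in ("EX:1", "", None):
    normalized = normalize_withdrawn_id(raw)
    print(repr(raw), "->", normalized, is_withdrawn_primary(normalized))
```

Normalising first means downstream code only ever has to test for one value instead of handling None, empty strings, and the sentinel separately.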
- create_mapping_set(mappings: list[Mapping], mapping_type: str = 'id') Sec2PriMappingSet[source]
Create an IdMappingSet or LabelMappingSet with config metadata.
Common factory method for creating mapping sets with all SSSOM metadata populated from the YAML config. It also computes cardinalities for mappings.
- Parameters:
mappings – List of SSSOM Mapping objects.
mapping_type – "id" for IdMappingSet (cardinality by ID), "label" for LabelMappingSet (cardinality by label).
- Returns:
MappingSet with computed cardinalities.
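The cardinality step can be sketched as a counting pass over subject/object pairs. This is an illustration of the idea, not the library's implementation: compute_cardinalities and the EX: identifiers are hypothetical, and only the 1:1, 1:n, n:1 and n:n values of SSSOM's mapping_cardinality are covered.

```python
from __future__ import annotations

from collections import defaultdict


def compute_cardinalities(pairs: list[tuple[str, str]]) -> list[str]:
    """Label each (subject, object) pair as 1:1, 1:n, n:1 or n:n."""
    objects_of: dict[str, set[str]] = defaultdict(set)
    subjects_of: dict[str, set[str]] = defaultdict(set)
    for subj, obj in pairs:
        objects_of[subj].add(obj)
        subjects_of[obj].add(subj)

    labels: list[str] = []
    for subj, obj in pairs:
        many_objects = len(objects_of[subj]) > 1    # subject maps to several objects
        many_subjects = len(subjects_of[obj]) > 1   # object is reached from several subjects
        if many_objects and many_subjects:
            labels.append("n:n")
        elif many_objects:
            labels.append("1:n")
        elif many_subjects:
            labels.append("n:1")
        else:
            labels.append("1:1")
    return labels


pairs = [
    ("EX:old1", "EX:1"),   # the only link on both sides
    ("EX:old2", "EX:2"),   # EX:2 is also reached from EX:old3
    ("EX:old3", "EX:2"),
    ("EX:old4", "EX:3"),   # EX:old4 maps to both EX:3 and EX:4
    ("EX:old4", "EX:4"),
]
print(compute_cardinalities(pairs))
# ['1:1', 'n:1', 'n:1', '1:n', '1:n']
```

Passing identifier pairs corresponds to the "id" mapping type and passing label pairs to the "label" mapping type; the counting logic is the same in both cases.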