Command Line Interface

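The pysec2pri CLI provides a command for each supported database, along with commands for bulk export (all), comparing mapping files (diff), and updating identifiers or symbols in your own files (update-ids, update-symbols).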

pysec2pri

pysec2pri: Secondary to Primary ID mapping tool.

Usage

pysec2pri [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

all

Export all formats for specified datasources.

Usage

pysec2pri all [OPTIONS]

Options

-o, --output-dir <output_dir>

Output directory

--datasources <datasources>

Comma-separated list of datasources

chebi

Parse ChEBI data and generate mappings.

Usage

pysec2pri chebi [OPTIONS] COMMAND [ARGS]...

ids

Generate ChEBI ID mappings (secondary to primary ChEBI IDs).

Formats: sec2pri (dict), pri_ids (set of current IDs), sssom/rdf/json/owl/all.
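
To make the format names concrete, the following is a minimal Python sketch of how the two in-memory shapes named above (a secondary-to-primary dict and a set of current IDs) could be used together. The function name resolve and the EX: identifiers are hypothetical placeholders for illustration; they are not real ChEBI records and not part of the pysec2pri API.

```python
from __future__ import annotations

from typing import Optional


def resolve(identifier: str, sec2pri: dict[str, str], pri_ids: set[str]) -> Optional[str]:
    """Return the current ID for an identifier, or None if it is unknown."""
    if identifier in pri_ids:
        # Already a current (primary) ID.
        return identifier
    # Secondary IDs map to their primary; unknown IDs yield None.
    return sec2pri.get(identifier)


# Placeholder data, for illustration only.
sec2pri = {"EX:0002": "EX:0001", "EX:0003": "EX:0001"}
pri_ids = {"EX:0001", "EX:0004"}

print(resolve("EX:0002", sec2pri, pri_ids))
print(resolve("EX:0004", sec2pri, pri_ids))
```

The dict answers "what replaced this ID?", while the set answers "is this ID still current?"; the files written by pysec2pri may carry more detail than this sketch assumes.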

Usage

pysec2pri chebi ids [OPTIONS] [INPUT_FILE]

Options

-o, --output <output>

Output file or directory

--version <data_version>

Datasource release version.

--subset <subset>

Compound subset.

Default:

'3star'

Options:

3star | complete

--format <output_format>

Default:

'sssom'

Options:

sssom | sec2pri | pri_ids | rdf | json | owl | all

Arguments

INPUT_FILE

Optional argument

synonyms

Generate ChEBI synonym mappings (previous/alias name to current name).

Formats: symbol_sec2pri (dict), name2synonym, pri_symbols (set of current names), sssom/rdf/json/owl/all.

Usage

pysec2pri chebi synonyms [OPTIONS] [INPUT_FILE]

Options

-o, --output <output>

Output file or directory

--version <data_version>

Datasource release version.

--subset <subset>

Compound subset.

Default:

'3star'

Options:

3star | complete

--format <output_format>

Default:

'sssom'

Options:

sssom | symbol_sec2pri | name2synonym | pri_symbols | rdf | json | owl | all

Arguments

INPUT_FILE

Optional argument

diff

Compare two SSSOM mapping files and show differences.

Usage

pysec2pri diff [OPTIONS] FILE1 FILE2

Options

-o, --output <output>

Output file for diff results (TSV)

--show-all

Show all differences

--datasource <datasource>

Datasource name for diff summary

Arguments

FILE1

Required argument

FILE2

Required argument
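
As a rough illustration of what comparing two SSSOM mapping files involves, the sketch below diffs two TSV texts on their (subject_id, object_id) pairs. It assumes the standard SSSOM TSV layout (optional metadata lines starting with #, then a header row with subject_id and object_id columns). The helper names read_pairs and diff_pairs and all EX: values are hypothetical; the actual diff command may compare more columns and report results differently.

```python
from __future__ import annotations

import csv


def read_pairs(text: str) -> set[tuple[str, str]]:
    """Collect (subject_id, object_id) pairs, skipping '#' metadata lines."""
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    reader = csv.DictReader(rows, delimiter="\t")
    return {(row["subject_id"], row["object_id"]) for row in reader}


def diff_pairs(old_text: str, new_text: str):
    """Return (added, removed) mapping pairs between two SSSOM TSV texts."""
    old_pairs, new_pairs = read_pairs(old_text), read_pairs(new_text)
    return sorted(new_pairs - old_pairs), sorted(old_pairs - new_pairs)


# Placeholder content, for illustration only.
header = "# placeholder metadata line\nsubject_id\tpredicate_id\tobject_id\n"
old = header + "EX:0002\tEX:rel\tEX:0001\nEX:0003\tEX:rel\tEX:0001\n"
new = header + "EX:0002\tEX:rel\tEX:0001\nEX:0005\tEX:rel\tEX:0004\n"

added, removed = diff_pairs(old, new)
print("added:", added)
print("removed:", removed)
```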

hgnc

Parse HGNC files and generate mappings.

Usage

pysec2pri hgnc [OPTIONS] COMMAND [ARGS]...

ids

Generate HGNC ID mappings (secondary to primary HGNC IDs).

Formats: sec2pri (dict), pri_ids (set of all current IDs), sssom/rdf/json/owl/all.

Usage

pysec2pri hgnc ids [OPTIONS] [INPUT_FILE]

Options

-o, --output <output>

Output file or directory

--version <data_version>

Datasource release version.

--format <output_format>

Default:

'sssom'

Options:

sssom | sec2pri | pri_ids | rdf | json | owl | all

Arguments

INPUT_FILE

Optional argument

symbols

Generate HGNC symbol mappings (previous/alias to current symbol).

Formats: symbol_sec2pri (dict), pri_symbols (set of current symbols), sssom/rdf/json/owl/all.

Usage

pysec2pri hgnc symbols [OPTIONS] [COMPLETE_SET_FILE]

Options

-o, --output <output>

Output file or directory

--version <data_version>

Datasource release version.

--format <output_format>

Default:

'sssom'

Options:

sssom | symbol_sec2pri | pri_symbols | rdf | json | owl | all

Arguments

COMPLETE_SET_FILE

Optional argument

hmdb

Parse HMDB XML files and generate secondary-to-primary mappings.

Usage

pysec2pri hmdb [OPTIONS]

Options

--metabolites-file <metabolites_file>

--proteins-file <proteins_file>

--metabolites-only

--proteins-only

-o, --output <output>

Output file or directory

--version <data_version>

Datasource release version.

--format <output_format>

Default:

'sssom'

Options:

sssom | sec2pri | pri_ids | rdf | json | owl | all

ncbi

Parse NCBI Gene files and generate mappings.

Usage

pysec2pri ncbi [OPTIONS] COMMAND [ARGS]...

ids

Generate NCBI Gene ID mappings (discontinued to current Gene IDs).

Formats: sec2pri (dict), pri_ids (set of all current IDs), sssom/rdf/json/owl/all.

Usage

pysec2pri ncbi ids [OPTIONS] [INPUT_FILE]

Options

-o, --output <output>

Output file or directory

--tax-id <tax_id>

Taxonomy ID.

Default:

'9606'

--version <data_version>

Datasource release version.

--format <output_format>

Default:

'sssom'

Options:

sssom | sec2pri | pri_ids | rdf | json | owl | all

Arguments

INPUT_FILE

Optional argument

symbols

Generate NCBI Gene symbol mappings (previous to current gene symbols).

Formats: symbol_sec2pri (dict), pri_symbols (set of current symbols), sssom/rdf/json/owl/all.

Usage

pysec2pri ncbi symbols [OPTIONS] [INPUT_FILE]

Options

-o, --output <output>

Output file or directory

--tax-id <tax_id>

Taxonomy ID.

Default:

'9606'

--version <data_version>

Datasource release version.

--format <output_format>

Default:

'sssom'

Options:

sssom | symbol_sec2pri | pri_symbols | rdf | json | owl | all

Arguments

INPUT_FILE

Optional argument

uniprot

Parse UniProt secondary accessions and generate mappings.

Usage

pysec2pri uniprot [OPTIONS] [INPUT_FILE]

Options

-o, --output <output>

Output file or directory

--version <data_version>

Datasource release version.

--format <output_format>

Default:

'sssom'

Options:

sssom | sec2pri | pri_ids | rdf | json | owl | all

--delac-file <delac_file>

Arguments

INPUT_FILE

Optional argument

update-ids

Resolve secondary IDs in INPUT_FILE to primary IDs using DATASOURCE mappings.

For each column specified with --at, a new column <col><suffix> is added to the output containing the resolved primary identifiers. Identifiers not found in the mapping are kept unchanged.

Pass --mapping to skip downloading or regenerating the mapping set and use an existing sec2pri TSV file instead.

Example:

pysec2pri update-ids my_genes.tsv hgnc --at gene_id -o my_genes_updated.tsv
pysec2pri update-ids my_genes.tsv hgnc --at gene_id --mapping hgnc_sec2pri.tsv
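
The column behaviour described above can be sketched in a few lines of Python. This is only an illustration of the documented semantics (one new <col><suffix> column per selected column, unmapped identifiers kept unchanged), not the tool's implementation; the function name add_primary_columns and the EX: identifiers are hypothetical.

```python
from __future__ import annotations


def add_primary_columns(
    rows: list[dict[str, str]],
    columns: list[str],
    sec2pri: dict[str, str],
    suffix: str = "_primary",
) -> list[dict[str, str]]:
    """Return copies of rows with a <col><suffix> column per selected column."""
    updated = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col in columns:
            value = row[col]
            # Fall back to the original value when no mapping exists.
            new_row[col + suffix] = sec2pri.get(value, value)
        updated.append(new_row)
    return updated


# Placeholder data, for illustration only.
rows = [{"gene_id": "EX:0002"}, {"gene_id": "EX:9999"}]
result = add_primary_columns(rows, ["gene_id"], {"EX:0002": "EX:0001"})
print(result)
```

The original column is preserved alongside the new one, so the resolved values can be checked against the input.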

Usage

pysec2pri update-ids [OPTIONS] INPUT_FILE
                     {chebi|hgnc|hmdb|ncbi|uniprot|wikidata}

Options

--at <COLUMN>

Required. Column name(s) containing identifiers to resolve. Repeat the option for multiple columns.

-o, --output <output_path>

Output file path (TSV or CSV).

--suffix <suffix>

Suffix for new columns.

Default:

'_primary'

--sep <sep>

Delimiter (inferred from extension if omitted).

--mapping <mapping_file>

Pre-built sec2pri TSV mapping file to use instead of regenerating.

--version <data_version>

Datasource release version.

--no-progress

Suppress progress bars.

Arguments

INPUT_FILE

Required argument

DATASOURCE

Required argument
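
The --sep option above is described as inferred from the file extension when omitted. One plausible reading, given that TSV and CSV are the documented file types, is sketched below. The function name infer_delimiter and the tab fallback for other extensions are assumptions made for this example, not confirmed behaviour of pysec2pri.

```python
from pathlib import Path
from typing import Optional


def infer_delimiter(path: str, sep: Optional[str] = None) -> str:
    """Pick a delimiter: an explicit sep wins, otherwise use the file extension."""
    if sep is not None:
        return sep
    if Path(path).suffix.lower() == ".csv":
        return ","
    # Assumption: treat .tsv and any other extension as tab-separated.
    return "\t"


print(repr(infer_delimiter("my_genes.csv")))
print(repr(infer_delimiter("my_genes.tsv")))
print(repr(infer_delimiter("my_genes.txt", sep=";")))
```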

update-symbols

Resolve previous/alias labels in INPUT_FILE to current labels using DATASOURCE.

For each column specified with --at, a new column <col><suffix> is added containing the resolved current labels. Labels not found in the mapping are kept unchanged.

Pass --mapping to skip downloading or regenerating the mapping set and use an existing symbol2prev TSV file instead.

Example:

pysec2pri update-symbols my_genes.tsv hgnc --at symbol -o my_genes_updated.tsv
pysec2pri update-symbols my_genes.tsv hgnc --at symbol --mapping hgnc_symbol2prev.tsv

Usage

pysec2pri update-symbols [OPTIONS] INPUT_FILE {chebi|hgnc|ncbi|wikidata}

Options

--at <COLUMN>

Required. Column name(s) containing symbols to resolve. Repeat the option for multiple columns.

-o, --output <output_path>

Output file path (TSV or CSV).

--suffix <suffix>

Suffix for new columns.

Default:

'_current'

--sep <sep>

Delimiter (inferred from extension if omitted).

--mapping <mapping_file>

Pre-built symbol2prev TSV mapping file to use instead of regenerating.

--tax-id <tax_id>

Taxonomy ID.

Default:

'9606'

--entity-type <entity_type>

Entity type to query. Queries all if omitted.

Options:

metabolites | chemicals | genes | proteins

--subset <subset>

Compound subset.

Default:

'3star'

Options:

3star | complete

--version <data_version>

Datasource release version.

--no-progress

Suppress progress bars.

Arguments

INPUT_FILE

Required argument

DATASOURCE

Required argument

wikidata

Query Wikidata SPARQL for redirect mappings.

Usage

pysec2pri wikidata [OPTIONS] COMMAND [ARGS]...

ids

Generate Wikidata ID mappings (redirected to current Wikidata QIDs).

Formats: sec2pri (dict), pri_ids (set of current QIDs), sssom/rdf/json/owl/all.

Usage

pysec2pri wikidata ids [OPTIONS] [INPUT_FILE]

Options

-o, --output <output>

Output file or directory

--format <output_format>

Default:

'sssom'

Options:

sssom | sec2pri | pri_ids | rdf | json | owl | all

--entity-type <entity_type>

Entity type to query. Queries all if omitted.

Options:

metabolites | chemicals | genes | proteins

--test-subset

Use test queries (LIMIT 10)

Arguments

INPUT_FILE

Optional argument

symbols

Generate Wikidata label mappings (previous label to current label).

Formats: symbol_sec2pri (dict), pri_symbols (set of current labels), sssom/rdf/json/owl/all.

Usage

pysec2pri wikidata symbols [OPTIONS] [INPUT_FILE]

Options

-o, --output <output>

Output file or directory

--format <output_format>

Default:

'sssom'

Options:

sssom | symbol_sec2pri | pri_symbols | rdf | json | owl | all

--entity-type <entity_type>

Entity type to query. Queries all if omitted.

Options:

metabolites | chemicals | genes | proteins

--test-subset

Use test queries (LIMIT 10)

Arguments

INPUT_FILE

Optional argument