SSSOM-CLI examples
This page is intended to illustrate some use cases for the sssom-cli
command-line tool.
- 1. Validating sets
- 2. File conversions
- 3. Merging sets
- 4. Reconciliating IRI prefixes
- 5. Filtering mapping records
- 6. Editing mapping records
1. Validating sets
SSSOM-CLI automatically validates any mapping set it reads, so if you need to validate a SSSOM file, all that is needed is to give it as input to the tool:
$ sssom-cli my-set.sssom.tsv
If the file is valid, the command will print out the set and terminate with an exit code of zero. Conversely, if the set is not valid, the command will print an appropriate error message on the standard error stream and terminate with a non-zero exit code.
If you are only interested in knowing whether the set is valid, you might want to use the --no-stdout option to disable printing the set on the standard output stream:
$ sssom-cli --no-stdout my-set.sssom.tsv
If several input sets are specified, each set is validated before the next set is read, and the command terminates as soon as one set is found to be invalid.
Use the --lax option to relax the validation rules and force the command to silently accept some sets that would normally be rejected.
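Because the command stops at the first invalid set when given several inputs, checking each file in its own invocation is one way to get a full report. The following is a minimal sketch that relies only on the exit code behaviour described above; the SSSOM_CLI variable and the validate_each function are hypothetical names introduced for this illustration, not part of SSSOM-CLI.

```shell
# SSSOM_CLI is a hypothetical variable (not part of SSSOM-CLI) naming the command to run.
: "${SSSOM_CLI:=sssom-cli}"

# validate_each FILE...: validate every file in a separate invocation,
# so that one invalid set does not hide problems in the following ones.
validate_each() {
    failed=0
    for set_file in "$@"; do
        # --no-stdout suppresses the set itself; error messages are discarded here.
        if "$SSSOM_CLI" --no-stdout "$set_file" 2>/dev/null; then
            echo "OK      $set_file"
        else
            echo "INVALID $set_file"
            failed=1
        fi
    done
    # Non-zero if at least one set was rejected.
    return "$failed"
}
```

It could be called as validate_each *.sssom.tsv; drop the 2>/dev/null redirection to see the error message for each rejected set.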
2. File conversions
2.1. Conversion between formats
SSSOM-CLI can read a SSSOM file in any of the following formats: TSV, CSV, JSON, and RDF/Turtle. It can likewise write a file in any of those formats.
If the recommended file extensions (.sssom.tsv, .sssom.csv, .sssom.json, or .ttl) are used, then converting from one format to another is simply a matter of using the --output option. For example, to convert from TSV to JSON:
$ sssom-cli my-set.sssom.tsv --output my-set.sssom.json
Use the --output-format option to explicitly specify the format of the output file if that file uses an extension other than the one recommended for the intended format:
$ sssom-cli my-set.sssom.tsv --output my-set.dat --output-format json
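When many files need converting, the same command can be wrapped in a loop. This is only a sketch: it assumes all inputs use the recommended .sssom.tsv extension, and the SSSOM_CLI variable and convert_all_to_json function are hypothetical names chosen for this example.

```shell
# SSSOM_CLI is a hypothetical variable (not part of SSSOM-CLI) naming the command to run.
: "${SSSOM_CLI:=sssom-cli}"

# convert_all_to_json DIR: convert every *.sssom.tsv file in DIR to SSSOM/JSON,
# writing each result next to its source file.
convert_all_to_json() {
    for tsv_file in "$1"/*.sssom.tsv; do
        # If nothing matches, the glob stays literal; skip that non-existent name.
        [ -e "$tsv_file" ] || continue
        # Swap the recommended TSV extension for the recommended JSON one.
        json_file="${tsv_file%.sssom.tsv}.sssom.json"
        "$SSSOM_CLI" "$tsv_file" --output "$json_file" || return 1
    done
}
```

Since every input is validated as it is read, the loop stops at the first file that SSSOM-CLI rejects.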
Note that conversion to OWL is not supported by SSSOM-CLI. Instead, it is handled by the ROBOT plugin and its sssom:inject command:
$ robot sssom:inject --sssom my-set.sssom.tsv \
    --create --direct \
    --output my-set.ofn
2.2. Conversion to the latest version of the specification
SSSOM-CLI will always write the output set so that it is compliant with the latest version of the specification. Therefore, if you have a set compliant with some prior version, simply running it through SSSOM-CLI once will automatically convert it to the latest version:
$ sssom-cli my-old-set.sssom.tsv --output my-new-set.sssom.tsv
2.3. Conversion between embedded and external metadata
By default, when the output format is SSSOM/TSV, the mapping set metadata are written in embedded mode.
Assuming the file my-set.sssom.tsv does not contain any mapping set metadata, but that said metadata are instead contained in a file named my-set.sssom.yml, then with the following command:
$ sssom-cli my-set.sssom.tsv --output embedded.sssom.tsv
the newly created embedded.sssom.tsv file will be an embedded version of the original file.
To perform the opposite operation (converting a file that contains embedded metadata into two files, one containing the TSV section and one containing the metadata), use the --metadata-output option:
$ sssom-cli embedded.sssom.tsv --output myset.sssom.tsv \
    --metadata-output myset.sssom.yml
2.4. Conversion between “condensed” and “propagated” forms
By default, SSSOM-CLI always writes the output set in “condensed” form whenever possible, following the rules set forth in the SSSOM specification.
That is, when all mapping records in a set have the same value for a slot that is marked as “propagatable” (for example, mapping_tool or subject_source), that value is moved into the set-level metadata, so that it is written only once. This process is called “condensation”.
Condensation normally never results in loss of information, since SSSOM consumers should automatically perform the reverse operation (“propagation”) whenever they read a SSSOM set that is in condensed form.
However, there may be cases where you’d want to convert a set in condensed form into its equivalent non-condensed (or “propagated”) form, for example if the set is to be used by a tool that does not support propagation. This can be done with the --no-condensation option:
$ sssom-cli condensed.sssom.tsv --no-condensation --output non-condensed.sssom.tsv
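To get a feel for the condition that triggers condensation, one can check whether a given column of a non-condensed SSSOM/TSV file holds the same value in every record. The sketch below only illustrates that condition; it is not how SSSOM-CLI implements condensation, it does not know which slots the specification marks as propagatable, and the is_uniform function is a hypothetical helper.

```shell
# is_uniform FILE COLUMN: succeed (exit 0) if every record in the SSSOM/TSV FILE
# has the same, non-empty value in COLUMN. Exit 1 if values differ or there are
# no records, exit 2 if the column does not exist.
is_uniform() {
    awk -F '\t' -v col="$2" '
        /^#/ { next }                     # skip the embedded metadata block
        !idx {                            # first non-comment line is the header
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) exit 2
            next
        }
        !seen { first = $idx; seen = 1 }  # remember the value of the first record
        $idx != first || $idx == "" { differ = 1 }
        END {
            if (!idx) exit 2
            if (!seen || differ) exit 1
            exit 0
        }
    ' "$1"
}
```

For example, is_uniform non-condensed.sssom.tsv mapping_tool would succeed only if all records were produced by the same tool.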
3. Merging sets
If SSSOM-CLI is provided with several input sets, they are all merged into a single set:
$ sssom-cli my-first-set.sssom.tsv my-second-set.sssom.tsv \
    --output my-merged-set.sssom.tsv
The input sets need not be in the same format:
$ sssom-cli my-first-set.sssom.json my-second-set.ttl \
    --output my-merged-set.sssom.tsv
4. Reconciliating IRI prefixes
You can use an Extended Prefix Map (EPM) to tweak the prefix map embedded within a SSSOM file and ensure that all declared prefix names are expanded to a “preferred” or “canonical” IRI prefix.
For example, suppose you have the following set:
#curie_map:
#  FBbt: "http://flybase.org/cgi-bin/fbcvq.html?query=FBbt:"
#  UBERON: http://purl.obolibrary.org/obo/UBERON_
#mapping_set_id: https://example.org/sets/fbbt-uncanonical-iris
#mapping_set_title: FBbt to Uberon mappings with non-canonical FBbt IRIs
#license: https://creativecommons.org/licenses/by/4.0/
subject_id	predicate_id	object_id	mapping_justification
FBbt:00000001	semapv:crossSpeciesExactMatch	UBERON:0000468	semapv:ManualMappingCuration
FBbt:00000002	semapv:crossSpeciesExactMatch	UBERON:6000002	semapv:ManualMappingCuration
FBbt:00000003	semapv:crossSpeciesExactMatch	UBERON:0000914	semapv:ManualMappingCuration
FBbt:00000004	semapv:crossSpeciesExactMatch	UBERON:0000033	semapv:ManualMappingCuration
FBbt:00000004	semapv:crossSpeciesExactMatch	UBERON:6000004	semapv:ManualMappingCuration
FBbt:00000005	semapv:crossSpeciesExactMatch	UBERON:6000005	semapv:ManualMappingCuration
FBbt:00000006	semapv:crossSpeciesExactMatch	UBERON:6000006	semapv:ManualMappingCuration
FBbt:00000007	semapv:crossSpeciesExactMatch	UBERON:6000007	semapv:ManualMappingCuration
and that you’d rather have the FBbt: prefix name expanded to the http://purl.obolibrary.org/obo/FBbt_ prefix.
If you have an Extended Prefix Map that (1) associates the FBbt: prefix name to the “canonical” prefix http://purl.obolibrary.org/obo/FBbt_ and (2) declares http://flybase.org/cgi-bin/fbcvq.html?query=FBbt: to be a “synonym” for that prefix, as in:
[
  {
    "pattern": "^\\d{8}$",
    "prefix": "FBbt",
    "uri_prefix": "http://purl.obolibrary.org/obo/FBbt_",
    "uri_prefix_synonyms": [
      "http://bio2rdf.org/fbbt:",
      "http://bioregistry.io/FBbt:",
      "http://bioregistry.io/FBbt_root:",
      "http://bioregistry.io/fbbt:",
      "http://flybase.org/cgi-bin/fbcvq.html?query=FBbt:",
      "http://www.ebi.ac.uk/ols/ontologies/fbbt/terms?iri=http://purl.obolibrary.org/obo/FBbt_",
      "https://bio2rdf.org/fbbt:",
      "https://bioregistry.io/FBbt:",
      "https://bioregistry.io/FBbt_root:",
      "https://bioregistry.io/fbbt:",
      "https://flybase.org/cgi-bin/fbcvq.html?query=FBbt:",
      "https://purl.obolibrary.org/obo/FBbt_",
      "https://www.ebi.ac.uk/ols/ontologies/fbbt/terms?iri=http://purl.obolibrary.org/obo/FBbt_"
    ]
  }
]
then you can use SSSOM-CLI’s --epm option to apply the EPM to the mapping set and make sure that the prefix map always uses only the canonical prefix for any prefix name:
$ sssom-cli uncanonical-iris.sssom.tsv \
    --epm my-extended-prefix-map.json \
    --output canonical-iris.sssom.tsv
(Of course, in this particularly simplistic example, using an extended prefix map might seem an overly complicated solution, since it would be quite easy to just “manually” edit the incorrect prefix declaration in the set’s own prefix map. But an extended prefix map comes in handy if you don’t know beforehand which prefix declarations need to be corrected, if any: you can just apply a blanket EPM covering all the prefixes you care about, and after that you can be sure that your mapping set is only using “canonical” prefixes.)
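A quick way to see the effect is to compare the IRI prefix declared for a given prefix name before and after the reconciliation. The helper below is a rough sketch under two assumptions: the file is SSSOM/TSV with embedded metadata, and the prefix name is not also the name of a metadata slot (the script does not restrict itself to the curie_map block). The declared_prefix function is a hypothetical name, not something provided by SSSOM-CLI.

```shell
# declared_prefix FILE NAME: print the IRI prefix that the embedded metadata
# block of FILE declares for the prefix name NAME.
declared_prefix() {
    awk -v name="$2" '
        /^#/ {
            line = substr($0, 2)               # drop the leading "#"
            sub(/^[ \t]+/, "", line)           # drop the YAML indentation
            if (index(line, name ":") == 1) {  # line starts with "NAME:"
                value = substr(line, length(name) + 2)
                gsub(/^[ \t"]+|[ \t"]+$/, "", value)   # trim blanks and quotes
                print value
            }
        }
    ' "$1"
}
```

Running declared_prefix uncanonical-iris.sssom.tsv FBbt and then declared_prefix canonical-iris.sssom.tsv FBbt should show whether the declaration changed.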
5. Filtering mapping records
SSSOM-CLI lets you use the SSSOM/Transform language to precisely compose the output set by excluding and/or including specific mapping records.
5.1. Keeping only records with an object in a given namespace
Say we only want to keep mapping records where the object side is a UBERON entity. This could be done with:
$ sssom-cli -p my-set.sssom.tsv \
    --include "object==UBERON:*" \
    --output uberon.sssom.tsv
This is exactly equivalent to
$ sssom-cli -p my-set.sssom.tsv \
    --rule "object==UBERON:* -> include()" \
    --output uberon.sssom.tsv
That is, --include "EXPRESSION" is a syntactic shortcut for --rule "EXPRESSION -> include()".
The -p option (also called --prefix-map-from-input) lets you automatically use the prefix names declared in the set’s own prefix map. Here, this assumes that the set declares a UBERON: prefix name.
If you are not sure of the prefix name used within the set (for example, maybe the set is using uberon: instead), you might prefer to include your own declaration, just to be sure:
$ sssom-cli my-set.sssom.tsv \
    --prefix=UBERON=http://purl.obolibrary.org/obo/UBERON_ \
    --include "object==UBERON:*" \
    --output uberon.sssom.tsv
(In the remaining examples, for brevity’s sake we will assume that all the required prefix names are declared exactly as we expect in the input set, so that we can rely solely on the -p option.)
Instead of including all mapping records with a UBERON object, you can also exclude all records with a non-UBERON object:
$ sssom-cli -p my-set.sssom.tsv \
    --exclude "!object==UBERON:*" \
    --include-all \
    --output uberon.sssom.tsv
The --exclude "!object==UBERON:*" rule causes all records with a non-UBERON object to be dropped, then the --include-all rule causes all the remaining records (which, at this point, can only be records with a UBERON object) to be included in the final set. The --include-all option is absolutely necessary here: a SSSOM/Transform ruleset must always contain at least one “include” rule, otherwise the final set would be completely empty.
This is equivalent to
$ sssom-cli -p my-set.sssom.tsv \
    --rule "!object==UBERON:* -> stop()" \
    --rule "predicate==* -> include()" \
    --output uberon.sssom.tsv
That is, --exclude "EXPRESSION" is a syntactic shortcut for --rule "EXPRESSION -> stop()" (which causes the records matching the expression to be dropped) and --include-all is a syntactic shortcut for a rule that unconditionally includes all remaining records.
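Before writing a filter, and again after applying it, it can help to see which namespaces actually appear in a column. The following sketch tallies the CURIE prefixes found in one column of a SSSOM/TSV file; it works on the file directly, independently of SSSOM-CLI, and tally_prefixes is a hypothetical helper name.

```shell
# tally_prefixes FILE COLUMN: print each CURIE prefix found in COLUMN of the
# SSSOM/TSV FILE, followed by a tab and the number of records using it.
tally_prefixes() {
    awk -F '\t' -v col="$2" '
        /^#/ { next }                     # skip the embedded metadata block
        !idx {                            # first non-comment line is the header
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) exit 2              # unknown column: print nothing
            next
        }
        {
            prefix = $idx
            sub(/:.*/, "", prefix)        # keep only the part before the first colon
            count[prefix]++
        }
        END { for (p in count) print p "\t" count[p] }
    ' "$1" | sort
}
```

After either of the filters above, tally_prefixes uberon.sssom.tsv object_id would be expected to list UBERON only.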
5.2. Keeping only records with either subject or object in a given namespace
Say that we want to keep all records where either the subject or the object is a UBERON entity. This could be done with:
$ sssom-cli -p my-set.sssom.tsv \
    --include "subject==UBERON:*" \
    --include "object==UBERON:*" \
    --output uberon.sssom.tsv
or, using a single rule:
$ sssom-cli -p my-set.sssom.tsv \
    --include "subject==UBERON:* || object==UBERON:*" \
    --output uberon.sssom.tsv
Say that, in addition to only keeping records with a UBERON subject or object, we also want to force the “orientation” of those records so that the UBERON entities are always on the object side. Here is one way to do it:
$ sssom-cli -p my-set.sssom.tsv \
    --rule "subject==UBERON:* -> invert()" \
    --rule "object==UBERON:* -> include()" \
    --output uberon.sssom.tsv
The first rule (subject==UBERON:* -> invert()) takes all records with a UBERON subject and inverts them, so that they become records with a UBERON object. Then the second rule takes all records with a UBERON object (which means both the records that already had a UBERON object to begin with, and the records that we have just inverted) and includes them in the output set.
6. Editing mapping records
The SSSOM/Transform language used for filtering (as shown in the previous section) may also be used to modify the mapping records.
6.1. Adding an object_source based on the object prefix
Say that we have a mapping set with records whose object is either a CL entity or a UBERON entity. We want to enrich those records so that they contain an object_source slot pointing to either http://purl.obolibrary.org/obo/cl.owl or http://purl.obolibrary.org/obo/uberon.owl, respectively.
This can be done as follows:
$ sssom-cli -p my-set.sssom.tsv \
    --prefix=obo=http://purl.obolibrary.org/obo/ \
    --rule "object==UBERON:* -> assign('object_source', obo:uberon.owl)" \
    --rule "object==CL:* -> assign('object_source', obo:cl.owl)" \
    --include-all \
    --output uberon-cl.sssom.tsv
Note that we explicitly declare an obo: prefix name. This is for two reasons. First, it saves us from having to write the full IRI in the two rules. Second, it satisfies an idiosyncrasy of the SSSOM/TSV format, which is that semantic identifiers MUST always be written in their shortened, “CURIE” form, which implies that we must have a suitable prefix to shorten http://purl.obolibrary.org/obo/uberon.owl into a CURIE form.
Note also the --include-all rule, which is again needed to ensure the records end up in the output set: the two “assign” rules modify the records that they are applied to but do not select them for inclusion in the output set.