Command-line tool
SSSOM-Java provides a command-line tool called sssom-cli
to manipulate mapping sets from the command line.
SSSOM-CLI acts as a filter that can read one or more mapping sets, apply some processing to the mappings, and then write out the resulting mapping set.
This page attempts to provide a comprehensive list of the features and options of the sssom-cli
tool. To get started, you might want to look at the installation instructions or the examples.
- 1. Input options
- 2. Output options
- 3. Checking a set against an ontology
- 4. Transformations
- 5. Extracting data from a mapping set
1. Input options
Input mapping sets are specified as positional arguments on the command line. Use as many arguments as needed to read more than one mapping set. All mapping sets will be merged into a single set. If no positional argument is specified, SSSOM-CLI will attempt to read a set from its standard input.
The SSSOM/TSV, SSSOM/CSV, SSSOM/JSON, and RDF/Turtle formats are supported. SSSOM-CLI will attempt to automatically detect the format, based on the filename’s extension if possible, or by peeking at the first byte of input (if the extension is not recognized or when reading from standard input). Use the --input-format
option to explicitly specify the expected format instead of relying on automatic detection. Note that the option applies to all input files – it is not possible to explicitly specify a different format for each file.
To read both from the standard input and from one or several file(s), use the special value -
as a positional argument. For example, the following command reads a set from file1.sssom.tsv
and from the standard input:
sssom-cli file1.sssom.tsv -
The positional arguments are processed in the order in which they appear in the command line. In the example above, file1.sssom.tsv
is read first, followed by the standard input.
If an input file does not contain embedded metadata, SSSOM-CLI will automatically find and read the required external metadata file if it follows the naming convention recommended by the SSSOM specification (that is, if the TSV file is named set.sssom.tsv
, SSSOM-CLI will look for a metadata file named set.sssom.yml
). To use another metadata file, specify its name after the TSV file, separated by a colon, as in:
sssom-cli set.sssom.tsv:metadata.yaml
When a metadata file is so specified, the input file is automatically implied to be in the SSSOM/TSV or SSSOM/CSV format, since other formats do not allow the use of an external metadata file.
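As a minimal sketch of how the conventions above can be combined, the following helper merges a TSV set that uses an explicitly named metadata file with a second set arriving on standard input. The function name and all file names are hypothetical placeholders; only the `file:metadata` syntax, the `-` argument, and the `--output` option come from this page.

```shell
# merge_with_stdin OUTPUT
# Merge first.sssom.tsv (whose metadata is in first-meta.yml) with the set
# read from standard input, and write the merged set to OUTPUT.
# The file names are placeholders chosen for this illustration.
merge_with_stdin() {
    sssom-cli --output "$1" first.sssom.tsv:first-meta.yml -
}
```

It could then be called as `merge_with_stdin merged.sssom.tsv < second.sssom.tsv`. Since positional arguments are processed in order, the file is read before the standard input.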
In previous versions of SSSOM-CLI, input files were specified using --input
options instead of positional arguments. Such options are still accepted for backwards compatibility.
1.1. Merging metadata
When merging several sets, by default multi-valued metadata slots are merged together. For example, if the first set has a creator_label
slot set to “Alice” and the second set has for the same slot the value “Bob”, the resulting set will contain both values. Use the --no-metadata-merge
option to disable merging and force the result set to contain only the metadata from the first input set.
1.2. Extended Prefix Map support
Use the --epm
option to pass an optional Extended Prefix Map (EPM) to SSSOM-CLI. How the extended prefix map will be used is determined by the value of the --epm-mode
option:
- PRE
- The extended prefix map is used before the input set is read, to complement the input set’s own prefix map. This makes it possible to read a set even if its prefix map is incomplete, provided the EPM declares the prefixes that are missing from the set's prefix map. Note that declarations from the prefix map of the input set will always take precedence; declarations from the EPM will only be used for undeclared prefixes.
- POST
- The extended prefix map is used after the input set is read, to “reconcile” all IRIs from the set to make sure they use the “canonical” IRI prefixes set forth by the EPM for any namespace.
- BOTH
- This is a combination of the PRE and POST modes. The extended prefix map will be used both for complementing the input prefix map and for reconciling the IRIs. This is the default mode.
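For illustration, here is a small sketch of a helper that uses an EPM only to reconcile IRIs after reading, leaving the input set's own prefix map as the sole source of prefix declarations at read time. The function name and its arguments are hypothetical; the mode value is written as listed above.

```shell
# reconcile_prefixes EPM_FILE INPUT OUTPUT
# Read INPUT using only its own prefix map, then rewrite its IRIs so that
# they use the canonical prefixes declared in EPM_FILE.
reconcile_prefixes() {
    sssom-cli --epm "$1" --epm-mode POST --output "$3" "$2"
}
```

With `PRE` instead of `POST`, the same helper would fill in missing prefix declarations at read time but would not rewrite any IRI.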
1.3. Assumed default SSSOM version
By default, if a mapping set does not explicitly declare the version of the SSSOM specification it is compliant with (with the sssom_version
slot), it is assumed to be compliant with version 1.0 – this is the behaviour mandated by the specification.
Use the --assume-version
option to specify another version the input set(s) should be assumed to be compliant with. For example, with --assume-version=1.1
the set(s) will be assumed to be compliant with version 1.1 instead of 1.0, which will allow the recognition of any slot that has been introduced in that version – without that option, such slots would be ignored (or treated as extension slots, depending on the --accept-extra-metadata
option). Use the special value latest
to specify the highest version of the specification currently supported.
This option is intended to help process a set correctly even if its authors omitted the sssom_version
slot. It has no effect if the input set does have a sssom_version
slot: it is only used in the absence of such a slot and cannot override the value of that slot when present.
1.4. Input validation
By default, SSSOM-CLI will check whether the input sets are fully valid according to the SSSOM specification, and error out if they are not. This means for example that a set that does not have a mapping_set_id
slot or a license
slot will be rejected, because those slots are required by the SSSOM specification.
Use the --lax
option to force SSSOM-CLI to silently accept sets that would otherwise be rejected.
Note that even with the --lax
option, SSSOM-CLI will always check that all mappings in the input set(s) have the required subject_id
, predicate_id
, object_id
, and mapping_justification
slots. This minimal level of correctness checking cannot be disabled.
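The two validation levels can be wrapped in small helpers, sketched below. The function names are hypothetical, and the first helper rests on an assumption that is not stated on this page: that SSSOM-CLI exits with a non-zero status when it errors out on an invalid set.

```shell
# is_valid_set FILE
# Succeed if FILE passes the default (strict) validation.
# Assumption: sssom-cli returns a non-zero exit status when it rejects a set.
# --no-stdout skips writing the result, since only the status matters here.
is_valid_set() {
    sssom-cli --no-stdout "$1" >/dev/null 2>&1
}

# read_lax INPUT OUTPUT
# Read INPUT even if it lacks required set-level slots such as license,
# and write the result to OUTPUT.
read_lax() {
    sssom-cli --lax --output "$2" "$1"
}
```

A possible use is `is_valid_set my.sssom.tsv || read_lax my.sssom.tsv rescued.sssom.tsv`, which falls back to lax reading only when strict validation fails.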
1.5. Slot propagation
By default, when reading a set, “propagatable slots” (as defined by the SSSOM specification) are automatically propagated down to each individual mapping. Use the --no-propagation
option to disable that behaviour.
1.6. Non-standard metadata
The --accept-extra-metadata
option controls how non-standard metadata slots are handled, when they are found in the input set(s). There are three possible policies:
- NONE
- Non-standard metadata slots are completely ignored. This is the default policy.
- DEFINED
- Non-standard metadata slots that are defined as extension slots are accepted; other slots are ignored.
- UNDEFINED
- All non-standard metadata slots, whether defined as extensions or not, are accepted.
2. Output options
By default, SSSOM-CLI writes the resulting mapping set to the standard output. Use the --output
option to specify an output file instead.
Writing the result set to standard output can be completely disabled by using the --no-stdout
option. This is roughly equivalent to using --output /dev/null
in that the result set is not saved anywhere, but is more efficient as the entire process of writing the result set is skipped. This option has no effect if the result set is written to a file (as a result of --output FILE
) rather than the standard output.
Also by default, the resulting mapping set is written in “embedded mode”, with the metadata block in the same file as the TSV section. Use the --metadata-output
option to specify the name of a separate file where the metadata block should be written instead. If that option is used without the --output
option, SSSOM-CLI will write the TSV section to its standard output and the metadata block to the file specified by --metadata-output
.
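As a sketch, the helper below writes a set in “external metadata” mode, with file names that follow the set.sssom.tsv / set.sssom.yml naming convention mentioned in the input section. The function name and the BASENAME argument are hypothetical.

```shell
# write_with_external_metadata INPUT BASENAME
# Write the TSV section to BASENAME.sssom.tsv and the metadata block to
# BASENAME.sssom.yml, so that the pair follows the recommended naming
# convention and can be read back without naming the metadata file.
write_with_external_metadata() {
    sssom-cli --output "$2.sssom.tsv" --metadata-output "$2.sssom.yml" "$1"
}
```

For example, `write_with_external_metadata input.sssom.tsv result` would produce result.sssom.tsv and result.sssom.yml.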
2.1. Output prefix map
The --output-prefix-map
option controls which prefix map is used to shorten IRIs when writing the result set. That option accepts three values:
- INPUT
- The output prefix map is the same as the one used in the input set (when there are several input sets, their prefix maps are merged into one).
- SSSOMT
- The output prefix map is the one used for SSSOM/T processing (see the “Transformations” section below).
- BOTH
- The output prefix map is the combination of the prefix map from the input set(s) and the prefix map used for SSSOM/T, with the latter taking precedence over the former. This is the default behaviour.
2.2. Metadata of the output set
By default, the output set will contain the same set-level metadata as the first input set, except for multi-valued slots which will contain values coming from all the input sets (unless the --no-metadata-merge
option is used, see above).
Use the --output-metadata
option to read the metadata to use for the output set from a specific metadata file. Single-valued slots from the first input set will then no longer be carried over to the result set. Multi-valued slots from all sets will still be carried over, unless again the --no-metadata-merge
option is also used – in which case the output metadata will only come from the --output-metadata
option.
2.3. Splitting the result set
Instead of writing a single mapping set, it is possible to split the result set along the subject and object prefixes with the --split
option, which accepts the name of a directory where the split sets will be written.
2.4. Output format
SSSOM-CLI can write the output set in the SSSOM/TSV format, the SSSOM/CSV format, the SSSOM/JSON format, or the RDF Turtle format.
By default, it uses the extension of the output filename, if specified, to automatically determine the output format. Without an output filename (when writing to standard output), or if the extension is not recognised, the default output format is SSSOM/TSV.
The output format can always be explicitly specified with the --output-format
option, which accepts the following values:
- tsv
- Write output in SSSOM/TSV format (this is already the default).
- csv
- Write output in SSSOM/CSV format.
- json
- Write output in SSSOM/JSON format.
- ttl
- Write output in RDF Turtle format.
Note that both the JSON and RDF formats are currently poorly specified, so the output produced by those options may change in the future. The SSSOM/CSV format is not officially part of the SSSOM specification at all. It is supported here merely as a convenience, but its use is best avoided.
In both JSON and RDF modes, the --metadata-output
option above is ignored, since those formats do not allow the use of an external metadata file.
Two more options control the behaviour of the JSON mode:
- --json-short-iris
- Use this option to write all identifiers in short form (CURIE form). The default is to only write full-length identifiers.
- --json-write-ld-context
- Use this option to write the CURIE map in a JSON-LD-like @context key, rather than in a curie_map key. This is for compatibility with SSSOM-Py.
For convenience, another option, --json-sssompy
, can be used to trigger JSON output tailored for SSSOM-Py compatibility. Using that option is equivalent to using --output-format json
, --json-short-iris
, and --json-write-ld-context
combined.
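The equivalence described above can be sketched as two hypothetical helpers that should, according to this page, produce the same SSSOM-Py-compatible JSON output. The function names are placeholders.

```shell
# to_sssompy_json INPUT OUTPUT
# Short form: one option triggers SSSOM-Py-compatible JSON output.
to_sssompy_json() {
    sssom-cli --json-sssompy --output "$2" "$1"
}

# to_sssompy_json_explicit INPUT OUTPUT
# Long form: the three options that --json-sssompy stands for.
to_sssompy_json_explicit() {
    sssom-cli --output-format json --json-short-iris --json-write-ld-context \
              --output "$2" "$1"
}
```

The explicit form is mainly useful when only some of the three behaviours are wanted, for example short IRIs without the JSON-LD-like context.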
2.5. Slot condensation
By default, when writing the result set, “propagatable slots” are condensed up to the mapping set whenever possible. That is, if all mappings in the set have the same value for a propagatable slot, then the value is written only once in the set metadata, rather than for each mapping. Use the --no-condensation
option to disable that behaviour.
2.6. Non-standard metadata
The --write-extra-metadata
option controls how non-standard metadata slots are written in the result set. This option is only meaningful if the --accept-extra-metadata
option is not set to NONE
, because otherwise the set cannot contain any non-standard slot to begin with.
The option accepts the same values as --accept-extra-metadata
, with the following meanings:
- NONE
- Non-standard slots are not written to the result set at all.
- DEFINED
- Non-standard slots are written as defined extension slots.
- UNDEFINED
- Non-standard slots are written as undefined extensions.
The default policy is DEFINED
, except when writing in RDF/Turtle, where the policy is UNDEFINED
(the rationale for that is that extension definitions are presumably not useful in RDF output: the main interest of an extension definition is to provide the property that gives meaning to the extension, but in RDF extension slots are always represented by their property anyway).
2.7. Other options
By default, the mapping_cardinality
slot is excluded from the result set. This is because that slot is considered (at least by the author of this program) as not very useful, since its value can always be computed on the fly whenever needed. In fact, it should always be computed on the fly whenever needed, because pre-computed values found in a set may not be reliable (if the composition of the set has changed since the values were computed). Use the --cardinality keep
option to include the mapping_cardinality
slot in the result set, or --cardinality force
to not only include the slot, but also compute the effective cardinality for each mapping (computed at the last moment before writing the set – in particular, after any transformation of the set).
The result set is always sorted, so that the mappings are written in a completely deterministic order. Use the --no-sorting
option to disable sorting and write the mappings in the original order in which they were read. This can speed up the processing, as sorting can be a time-consuming operation on very large sets.
3. Checking a set against an ontology
The --update-from-ontology
option allows checking and updating the mapping set against an OWL ontology. It expects the filename of an ontology in any format supported by the OWL API.
The filename may be followed by a colon and a list of comma-separated flags (:flag1,flag2,...
) which will control the exact behaviour of the option.
Available flags are:
- label
- If the subject (respectively the object) of a mapping exists in the ontology, the mapping’s subject_label (resp. object_label) will be updated to match the rdfs:label of the corresponding entity in the ontology.
- source
- If the subject (respectively the object) of a mapping exists in the ontology, the mapping’s subject_source (resp. object_source) will be set to the ontology’s IRI.
- existence
- If the subject or the object of a mapping does not exist in the ontology or is deprecated, the mapping is removed from the set.
- subject
- Only consider the subject side of mappings when updating the labels, the sources, and/or checking for existence.
- object
- Only consider the object side of mappings when updating the labels, the sources, and/or checking for existence.
If no flags are specified, the default flags are label,source
. If only a subject
or object
flag is specified, it is added to the default flags (so, :subject
is equivalent to :subject,label,source
). Any other flag resets the default flags; so to check for existence in addition to updating the labels and the sources, all corresponding flags must be explicitly specified (:existence,label,source
).
The --update-from-ontology
option may be specified several times to check a mapping set against several ontologies consecutively.
If the ontology uses imports, SSSOM-CLI will try to resolve them using a default catalog file named catalog-v001.xml
, if such a file exists in the current directory. Use the --catalog
option to explicitly specify another catalog file (that option accepts a special value none
to disable using the default catalog-v001.xml
file).
If an imported ontology cannot be found (with or without the help of a catalog), by default SSSOM-CLI will error out. To silently ignore a missing import, use the --ignore-missing-imports
option.
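As an illustration of the flag syntax, the following sketch checks a set against an ontology, dropping mappings whose subject or object is missing or deprecated while still updating labels and sources. The function name and its arguments are hypothetical; the flags are spelled out in full because any flag other than subject or object resets the defaults.

```shell
# check_against_ontology INPUT ONTOLOGY OUTPUT
# Remove mappings that refer to missing or deprecated entities in ONTOLOGY,
# and update labels and sources for the mappings that remain.
# Missing imports are silently ignored rather than treated as errors.
check_against_ontology() {
    sssom-cli --update-from-ontology "$2:existence,label,source" \
              --ignore-missing-imports \
              --output "$3" "$1"
}
```

A call might look like `check_against_ontology input.sssom.tsv uberon.owl checked.sssom.tsv`, where uberon.owl stands for any ontology file readable by the OWL API.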
4. Transformations
The SSSOM/T-Mapping dialect of the SSSOM/Transform language can be used to apply arbitrary transformations to the mapping set before it is written to output.
The ruleset must contain at least one rule that uses the include()
function, otherwise the resulting set will be completely empty. Use the --include-all
option to automatically add a default rule at the end of the ruleset that includes any mapping that has not been dropped.
A full ruleset can be specified with the --ruleset
option. Single rules can also be specified on the command line with the --rule
option. If both a --ruleset
option and one or several --rule
option(s) are used, the rules defined by the --rule
options are added after the rules from the ruleset file.
For convenience, rules that are intended to exclude or include mappings can be specified with the --exclude
or --include
options, respectively. With these options, only the filter part of the rule needs to be specified. For example, --exclude subject==UBERON:*
is equivalent to --rule "subject==UBERON:* -> stop()"
, and --include subject==UBERON:*
is equivalent to --rule "subject==UBERON:* -> include()"
.
Also for convenience, rules that are intended to be applied to all mappings (using a catch-all filter such as predicate==*)
can be specified with the --blanket-rule
option, which only needs the action part of the rule. For example, --blanket-rule invert()
is equivalent to --rule "predicate==* -> invert()"
, and would invert all mappings.
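Because a ruleset without any include() rule yields an empty set, a blanket rule is typically paired with --include-all. The sketch below, with a hypothetical function name, inverts every mapping and keeps all of them; the single quotes keep the shell from interpreting the parentheses.

```shell
# invert_all INPUT OUTPUT
# Invert every mapping in INPUT, then include all mappings that have not
# been dropped (the default rule added by --include-all).
invert_all() {
    sssom-cli --blanket-rule 'invert()' --include-all --output "$2" "$1"
}
```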
4.1. Prefix map for SSSOM/T rules
All prefixes used in SSSOM/T rules must be declared. There are four different ways of declaring them. In order of precedence, they are:
- prefix declarations in the header of the SSSOM/T ruleset file;
- prefixes declared on the command line with the --prefix option (as in --prefix "PFX=http://example.org/prefix");
- prefixes declared in the curie_map slot of the metadata file specified with the --prefix-map option;
- prefixes declared in the prefix map of the input set(s), if the --prefix-map-from-input option is used.
Regardless of where a prefix declaration comes from, once it is declared, a prefix can be used in any SSSOM/T rule. For example, a rule in the SSSOM/T file can use a prefix declared with a --prefix
declaration on the command line or (if the --prefix-map-from-input
option is used) a prefix declared in the prefix map from the input set. Conversely, a rule defined on the command line (with a --rule
option) can use a prefix declared in the header of the SSSOM/T file.
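As a sketch of a command-line-only setup, the helper below declares the UBERON prefix with --prefix and uses it in an --include filter, so no ruleset file is needed. The function name is hypothetical; the quoting protects the * from shell globbing.

```shell
# keep_uberon_subjects INPUT OUTPUT
# Keep only the mappings whose subject is in the UBERON namespace.
# The prefix is declared on the command line rather than in a ruleset file.
keep_uberon_subjects() {
    sssom-cli --prefix "UBERON=http://purl.obolibrary.org/obo/UBERON_" \
              --include 'subject==UBERON:*' \
              --output "$2" "$1"
}
```

If the input set already declares the UBERON prefix, using --prefix-map-from-input in place of the --prefix option should have the same effect.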
4.2. Examples
The following example shows how to filter out any mapping that does not have a subject in the http://purl.obolibrary.org/obo/UBERON_
namespace:
prefix UBERON: <http://purl.obolibrary.org/obo/UBERON_>
subject==UBERON:* -> include();
Here is a slightly more complex example (prefix declarations have been omitted for the sake of brevity):
# Ensure CL and UBERON are on the object side
subject==CL:* || subject==UBERON:* -> invert();
# Filter out any mapping to something else than CL or UBERON
!(object==CL:* || object==UBERON:*) -> stop();
# Forcibly set the object source to CL or UBERON
object==CL:* -> assign("object_source", "http://purl.obolibrary.org/obo/cl.owl");
object==UBERON:* -> assign("object_source", "http://purl.obolibrary.org/obo/uberon.owl");
# Include all remaining mappings
subject==* -> include();
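To show how such a ruleset might be used end to end, the sketch below writes a shortened variant of the ruleset above to a file, this time with its prefix declarations, and wraps the invocation in a helper. The file name cl-uberon.rules and the function name are arbitrary choices made for this illustration.

```shell
# Write the ruleset to a file; the quoted 'EOF' prevents the shell from
# expanding anything inside the rules.
cat > cl-uberon.rules <<'EOF'
prefix CL: <http://purl.obolibrary.org/obo/CL_>
prefix UBERON: <http://purl.obolibrary.org/obo/UBERON_>
# Ensure CL and UBERON are on the object side
subject==CL:* || subject==UBERON:* -> invert();
# Filter out any mapping to something else than CL or UBERON
!(object==CL:* || object==UBERON:*) -> stop();
# Include all remaining mappings
subject==* -> include();
EOF

# apply_cl_uberon_rules INPUT OUTPUT
# Apply the ruleset written above to INPUT and write the result to OUTPUT.
apply_cl_uberon_rules() {
    sssom-cli --ruleset cl-uberon.rules --output "$2" "$1"
}
```

Because the ruleset ends with its own include() rule, the --include-all option is not needed here.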
5. Extracting data from a mapping set
SSSOM-CLI can be used to extract individual pieces of data from a mapping set. Note that this is an experimental feature that may change abruptly in the future.
To do so, use the --extract
option, followed by an expression indicating which piece of data you want to retrieve. If the mapping set does contain a value matching the expression, the value will be printed to standard output.
The --extract
option will typically be used jointly with the --no-stdout
option, so that the normal output of SSSOM-CLI is not also sent to the standard output.
5.1. Extractor expression syntax
The expression expected by the --extract
option is made of two components, separated by a dot.
The first component indicates from which object to retrieve a value, and must be either:
- set
- To retrieve a value from the metadata of the entire set.
- mapping(N)
- To retrieve a value from the Nth mapping of the set. If N is negative, the Nth mapping counting from the last mapping of the set will be used.
- mapping
- Equivalent to
mapping(1)
, to retrieve a value from the first mapping of the set.
The second component indicates which data to retrieve from the selected object, and must follow one of the following forms:
- slot(NAME)
- To retrieve the value of the metadata slot NAME. If NAME is the name of a multi-valued slot (e.g. creator_id), the first value will be retrieved.
- slot(NAME, N)
- To retrieve the Nth value from the metadata slot NAME. This is only meaningful for a multi-valued slot. If N is negative, the Nth value counting from the last will be retrieved.
- extension(PROPERTY)
- To retrieve the value of the extension slot associated to the indicated PROPERTY, according to the set’s extension definitions.
- special(sexpr)
- To retrieve the canonical S-expression representing a mapping. This is only available if the selected object is a mapping.
- special(hash)
- To retrieve the standard SSSOM hash of a mapping. This is only available if the selected object is a mapping.
5.2. Examples of extractor expressions
- set.slot(license)
- Retrieves the license of the mapping set.
- set.slot(creator_id, 2)
- Retrieves the ID of the second creator of the set.
- set.slot(see_also)
- Retrieves the first see_also link of the set (equivalent to set.slot(see_also, 1)).
- set.extension(https://example.org/property)
- Retrieves the value of the set extension slot associated with the https://example.org/property property.
- mapping(2).slot(subject_id)
- Retrieves the subject of the second mapping.
- mapping.slot(author_id, 2)
- Retrieves the second author of the first mapping.
- mapping(-1).extension(https://example.org/property)
- Retrieves the value of the extension slot associated with the https://example.org/property property, for the last mapping of the set.
- mapping(2).special(hash)
- Retrieves the hash of the second mapping.
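In a script, extractor expressions are usually combined with --no-stdout so that only the extracted value is printed. The helpers below are a minimal sketch with hypothetical function names; the expressions are single-quoted because parentheses are special characters in the shell.

```shell
# get_license FILE
# Print the license of the mapping set in FILE.
get_license() {
    sssom-cli --no-stdout --extract 'set.slot(license)' "$1"
}

# get_last_subject FILE
# Print the subject ID of the last mapping in FILE.
get_last_subject() {
    sssom-cli --no-stdout --extract 'mapping(-1).slot(subject_id)' "$1"
}
```

The value can then be captured with command substitution, as in `license=$(get_license my.sssom.tsv)`.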