Command-line tool

SSSOM-Java provides a command-line tool called sssom-cli to manipulate mapping sets from the command line.

SSSOM-CLI acts as a filter that can read one or more mapping set(s), perform some treatments on the mappings, and then write the resulting mapping set out.

This page attempts to provide a comprehensive list of the features and options of the sssom-cli tool. To get started, you might want to look at the installation instructions or the examples.

1. Input options

Input mapping sets are specified as positional arguments on the command line. Use as many arguments as needed to read more than one mapping sets. All mapping sets will be merged into a single set. If no positional argument is specified, SSSOM-CLI will attempt to read a set from its standard input.

The SSSOM/TSV, SSSOM/CSV, SSSOM/JSON, and RDF/Turtle formats are supported. SSSOM-CLI will attempt to automatically detect the format, based on the filename’s extension if possible, or by peaking at the first byte of input (if the extension is not recognized or when reading from standard input). Use the --input-format option to explicitly specify the expected format instead of relying on automatic detection. Note that the option applies to all input files – it is not possible to explicitly specify a different format for each file.

To read both from the standard input and from one or several file(s), use the special value - as a positional argument. For example, the following command reads a set from file1.sssom.tsv and from the standard input:

sssom-cli file1.sssom.tsv  -

The positional arguments are processed in the order in which they appear in the command line. In the example above, file1.sssom.tsv is read first, followed by the standard input.

If an input file does not contain embedded metadata, SSSOM-CLI will automatically find and read the required external metadata file if it follows the naming convention recommended by the SSSOM specification (that is, if the TSV file is named set.sssom.tsv, SSSOM-CLI will lookup for a metadata file named set.sssom.yml). To use another metadata file, specify its name after the TSV file, separated by a colon, as in:

sssom-cli set.sssom.tsv:metadata.yaml

When a metadata file is so specified, the input file is automatically implied to be in the SSSOM/TSV or SSSOM/CSV format, since other formats do not allow the use of an external metadata file.

In previous versions of SSSOM-CLI, input files were specified using --input options instead of positional arguments. Such options are still accepted for backwards compatibility.

1.1. Merging metadata

When merging several sets, by default multi-valued metadata slots are merged together. For example, if the first set has a creator_label slot set to “Alice” and the second set has for the same slot the value “Bob”, the resulting set will contain both values. Use the --no-metadata-merge option to disable merging and force the result set to contain only the metadata from the first input set.

1.2. Extended Prefix Map support

Use the --epm option to pass an optional Extended Prefix Map (EPM) to SSSOM-CLI. How the extended prefix map will be used is determined by the value of the --epm-mode option:

PRE
The extended prefix map is used before the input set is read, to complement the input set’s own prefix map. This allows to read a set even if its prefix map is incomplete, provided the EPM declares the prefixes that are missing from the set's prefix map. Note that declarations from the prefix map of the input set will always take precedence, declarations from the EPM will only be used for undeclared prefixes.
POST
The extended prefix map is used after the input set is read, to “reconcile” all IRIs from the set to make sure they use the “canonical” IRI prefixes set forth by the EPM for any namespace.
BOTH
This is a combination of the PRE and POST modes. The extended prefix map will be used both for complementing the input prefix map and for reconciliating the IRIs. This is the default mode.

1.3. Assumed default SSSOM version

By default, if a mapping set does not explicitly declare the version of the SSSOM specification it is compliant with (with the sssom_version slot), it is assumed to be compliant with version 1.0 – this is the behaviour mandated by the specification.

Use the --assume-version option to specify another version the input set(s) should be assumed to be compliant with. For example, with --assume-version=1.1 the set(s) will be assumed to be compliant with version 1.1 instead of 1.0, which will allow the recognition of any slot that has been introduced in that version – without that option, such slots would be ignored (or treated as extension slots, depending on the --accept-extra-metadata option). Use the special value latest to specify the highest version of the specification currently supported.

This option is intended to attempt to correctly process a set even if its authors had omitted to explicitly specify the sssom_version slot. It has no effect if the input set does have a sssom_version slot; it is only used in the absence of such a slot, it does not allow to override the value of that slot when present.

1.4. Input validation

By default, SSSOM-CLI will check whether the input sets are fully valid according to the SSSOM specification, and error out if they are not. This means for example that a set that does not have a mapping_set_id slot or a license slot will be rejected, because those slots are required by the SSSOM specification.

Use the --lax option to force SSSOM-CLI to silently accept sets that would otherwise be rejected.

Note that even with the --lax option, SSSOM-CLI will always check that all mappings in the input set(s) have the required subject_id, predicate_id, object_id, and mapping_justification slots. This minimal level of correctness checking cannot be disabled.

1.5. Slot propagation

By default, when reading a set, “propagatable slots” (as defined by the SSSOM specification) are automatically propagated down to each individual mappings. Use the --no-propagation option to disable that behaviour.

1.6. Non-standard metadata.

The --accept-extra-metadata option controls how non-standard metadata slots are handled, when they are found in the input set(s). There are three possible policies:

NONE
Non-standard metadata slots are completely ignored. This is the default policy.
DEFINED
Non-standard metadata slots that are defined as extension slots are accepted; other slots are ignored.
UNDEFINED
All non-standard metadata slots, whether defined as extensions or not, are accepted.

2. Output options

By default, SSSOM-CLI writes the resulting mapping set to the standard output. Use the --output option to specify an output file instead.

Writing the result set to standard output can be completely disabled by using the --no-stdout option. This is roughly equivalent to using --output /dev/null in that the result set is not saved anywhere, but is more efficient as the entire process of writing the result step is skipped. This option has no effect if the result set is written to a file (as a result of --output FILE) rather than the standard output.

Also by default, the resulting mapping set is written in “embedded mode”, with the metadata block in the same file as the TSV section. Use the --metadata-output option to specify the name of a separate file where the metadata block should be written instead. If that option is used without the --output option, SSSOM-CLI will write the TSV section to its standard output and the metadata block to the file specified by --metadata-output.

2.1. Output prefix map

The --output-prefix-map option allows to control which prefix map is used to shorten IRIs when writing the result set. That option accepts three values:

INPUT
The output prefix map is the same as the one used in the input set (when there are several input sets, their prefix maps are merged into one).
SSSOMT
The output prefix map is the one used for SSSOM/T processing (see the “Transformations” section below).
BOTH
The output prefix map is the combination of the prefix map from the input set(s) and the prefix map used for SSSOM/T, with the latter taking precedence over the former. This is the default behaviour.

2.2. Metadata of the output set

By default, the output set will contain the same set-level metadata as the first input set, except for multi-valued slots which will contain values coming from all the input sets (unless the --no-metadata-merge option is used, see above).

Use the --output-metadata option to read the metadata to use for the output set from a specific metadata file. Single-valued slots from the first input set will then no longer be carried over to the result sets. Multi-valued slots from all sets will still be carried over, unless again the -no-metadata-merge option is also used – in which case the output metadata will only come from the --output-metadata option.

2.3. Splitting the result set

Instead of writing a single mapping set, it is possible to split the result set along the subject and object prefixes with the --split option, which accepts the name of a directory where the split sets will be written.

2.4. Output format

SSSOM-CLI can write the output set in the SSSOM/TSV format, the SSSOM/CSV format, the SSSOM/JSON format, or the RDF Turtle format.

By default, it uses the extension of the output filename, if specified, to automatically determine the output format. Without an output filename (when writing to standard output), or if the extension is not recognised, the default output format is SSSOM/TSV.

The output format can always be explicitly specified with the --output-format option, which accepts the following values:

tsv
Write output in SSSOM/TSV format (this is already the default).
csv
Write output in SSSOM/CSV format.
json
Write output in SSSOM/JSON format.
ttl
Write output in RDF Turtle format.

Note that both the JSON and RDF formats are currently poorly specified, so the output produced by those options may change in the future. The SSSOM/CSV format is not officially part of the SSSOM specification at all. It is supported here merely as a convenience, but its use is best avoided.

In both JSON and RDF modes, the --metadata-output option above is ignored, since those formats do not allow the use of an external metadata file.

Two more options control the behaviour of the JSON mode:

--json-short-iris
Use this option to write all identifiers in short form (CURIE form). The default is to only write full-length identifiers.
--json-write-ld-context
Use this option to write the CURIE map in a JSON-LD-like @context key, rather than in a curie_map key. This is for compatibility with SSSOM-Py.

For convenience, another option, --json-sssompy, can be used to trigger JSON output tailored for SSSOM-Py compatibility. Using that option is equivalent to using --output-format json, --json-short-iris, and --json-write-ld-context combined.

2.5. Slot condensation

By default, when writing the result set, “propagatable slots” are condensed up to the mapping set whenever possible. That is, if all mappings in the set have the same value for a propagatable slot, then the value is written only once in the set metadata, rather than for each mapping. Use the --no-condensation option to disable that behaviour.

2.6. Non-standard metadata

The --write-extra-metadata option controls how non-standard metadata slots are written in the result set. This option is only meaningful if the --accept-extra-metadata option is not set to NONE, because otherwise the set cannot contain any non-standard slot to begin with.

The option accepts the same values as --accept-extra-metadata, with the following meanings:

NONE
Non-standard slots are not written to the result set at all.
DEFINED
Non-standard slots are written as defined extension slots.
UNDEFINED
Non-standard slots are written as undefined extensions.

The default policy is DEFINED, except when writing in RDF/Turtle, where the policy is UNDEFINED (the rationale for that is that extension definitions are presumably not useful in RDF output: the main interest of an extension definition is to provide the property that gives meaning to the extension, but in RDF extension slots are always represented by their property anyway).

2.7. Other options

By default, the mapping_cardinality slot is excluded from the result set. This is because that slot is considered (at least by the author of this program) as not very useful, since its value can always be computed on the fly whenever needed. In fact, it should always be computed on the fly whenever needed, because pre-computed values found in a set may not be reliable (if the composition of the set has changed since the values were computed). Use the --cardinality keep option to include the mapping_cardinality slot in the result set, or --cardinality force to not only include the slot, but also compute the effective cardinality for each mapping (computed at the last moment before writing the set – in particular, after any transformation of the set).

The result set is always sorted, so that the mappings are written in a completely deterministic order. Use the --no-sorting option to disable sorting and write the mappings in the original order in which they were read. This can speed up the processing, as sorting can be a time-consuming operation on very large sets.

3. Checking a set against an ontology

The --update-from-ontology option allows checking and updating the mapping set against an OWL ontology. It expects the filename of an ontology in any format supported by the OWL API.

The filename may be followed by a semi-colon and a list of comma-separated flags (:flag1,flag2,...) which will control the exact behaviour of the option.

Available flags are:

label
If the subject (respectively the object) of a mapping exists in the ontology, the mapping’s subject_label (resp. object_label) will be updated to match the rdfs:label of the corresponding entity in the ontology.
source
If the subject (respectively the object) of a mapping exists in the ontology, the mapping’s subject_source (resp. object_source) will be set to the ontology’s IRI.
existence
If the subject or the object of a mapping does not exist in the ontology or is deprecated, the mapping is removed from the set.
subject
Only consider the subject side of mappings when updating the labels, the sources, and/or checking for existence.
object
Only consider the object side of mappings when updating the labels, the sources, and/or checking for existence.

If no flags are specified, the default flags are label,source. If only a subject or object flag is specified, it is added to the default flags (so, :subject is equivalent to :subject,label,source). Any other flag resets the default flags; so to check for existence in addition to updating the labels and the sources, all corresponding flags must be explicitly specified (:existence,label,source).

The --update-from-ontology option may be specified several times to check a mapping set against several ontologies consecutively.

If the ontology uses imports, SSSOM-CLI will try to resolve them using a default catalog file named catalog-v001.xml, if such a file exists in the current directory. Use the --catalog option to explicitly specify another catalog file (that option accepts a special value none to disable using the default catalog-v001.xml file).

If an imported ontology cannot be found (with or without the help of a catalog), by default SSSOM-CLI will error out. To silently ignore a missing import, use the --ignore-missing-imports option.

4. Transformations

The SSSOM/T-Mapping dialect of the SSSOM/Transform language can be used to apply arbitrary transformations to the mapping set before it is written to output.

The ruleset must contain at least one rule that uses the include() function, otherwise the resulting set will be completely empty. Use the --include-all option to automatically add a default rule at the end of the ruleset that includes any mapping that has not been dropped.

A full ruleset can be specified with the --ruleset option. Single rules can also be specified on the command line with the --rule option. If both a --ruleset option and one or several --rule option(s) are used, the rules defined by the --rule options are added after the rules from the ruleset file.

For convenience, rules that are intended to exclude or include mappings can be specified with the --exclude or --include options, respectively. With these options, only the filter part of the rule needs to be specified. For example, --exclude subject==UBERON:* is equivalent to --rule "subject==UBERON:* -> stop()", and --include subject==UBERON:* is equivalent to --rule "subject==UBERON:* -> include()".

Also for convenience, rules that are intended to be applied to all mappings (using a catch-all filter such as predicate==* can be specified with the --blanket-rule option, which only needs the action part of the rule. For example, --blanket-rule invert() is equivalent to --rule predicate==* -> invert(), and would invert all mappings.

4.1. Prefix map for SSSOM/T rules

All prefixes used in SSSOM/T rules must be declared. There are four different ways of declaring them. By order of precedence, they are:

  • prefix declarations in the header of the SSSOM/T ruleset file;
  • prefixes declared on the command line with the --prefix option (as in --prefix "PFX=http://example.org/prefix");
  • prefixes declared in the curie_map slot of the metadata file specified with the --prefix-map option;
  • prefixes declared in the prefix map of the input set(s), if the --prefix-map-from-input option is used.

Regardless of where a prefix declaration comes from, once it is declared, a prefix can be used in any SSSOM/T rule. For example, a rule in the SSSOM/T file can use a prefix declared with a --prefix declaration on the command line or (if the --prefix-map-from-input option is used) a prefix declared in the prefix map from the input set. Conversely, a rule defined on the command line (with a --rule option) can use a prefix declared in the header of the SSSOM/T file.

4.2. Examples

The following example shows how to filter out any mapping that does not have a subject in the http://purl.obolibrary.org/obo/UBERON_ namespace:

prefix UBERON: <http://purl.obolibrary.org/obo/UBERON_>

subject==UBERON:* -> include();

Here is a slightly more complex example (prefix declarations have been omitted for the sake of brevity):

# Ensure CL and UBERON are on the object side
subject==CL:* || subject==UBERON:* -> invert();

# Filter out any mapping to something else than CL or UBERON
!(object==CL:* || object==UBERON:*) -> stop();

# Forcibly set the object source to CL or UBERON
object==CL:* -> assign("object_source", "http://purl.obolibrary.org/obo/cl.owl");
object==UBERON:* -> assign("object_source", "http://purl.obolibrary.org/obo/uberon.owl");

# Include all remaining mappings
subject==* -> include();

5. Extracting data from a mapping set

SSSOM-CLI can be used to extract individual piece of data from a mapping set. Of note, this is an experimental feature, that may change abruptly in the future.

To do so, use the --extract option, followed by an expression indicating which piece of data you want to retrieve. If the mapping set does contain a value matching the expression, the value will be printed to standard output.

The --extract option will typically be used jointly with the --no-stdout option, so that the normal output of SSSOM-CLI is not also sent to the standard output.

5.1. Extractor expression syntax.

The expression expected by the --extract option is made of two components, separated by a dot.

The first component indicates from which object to retrieve a value, and must be either:

set
To retrieve a value from a metadata of the entire set.
mapping(N)
To retrieve a value from the Nth mapping of the set. If N is negative, the Nth mapping counting from the last mapping of the set will be used.
mapping
Equivalent to mapping(1), to retrieve a value from the first mapping of the set.

The second component indicates which data to retrieve from the selected object, and must follow one of the following forms:

slot(NAME)
To retrieve the value of the metadata slot NAME. If NAME is the name of the multi-valued slot (e.g. creator_id), the first value will be retrieved.
slot(NAME, N)
To retrieve the Nth value from the metadata slot NAME. This is only meaningful for a multi-valued slot. If N is negative, the Nth value counting from the last will be retrieved.
extension(PROPERTY)
To retrieve the value of the extension slot associated to the indicated PROPERTY, according to the set’s extension definitions.
special(sexpr)
To retrieve the canonical S-expression representing a mapping. This is only available if the selected object is a mapping.
special(hash)
To retrieve the standard SSSOM hash of a mapping. This is only available if the selected object is a mapping.

5.2. Examples of extractor expressions

set.slot(license)
Retrieves the license of the mapping set.
set.slot(creator_id, 2)
Retrieves the ID of the second creator of the set.
set.slot(see_also)
Retrievse the first see_also link of the set (equivalent to set.slot(see_also, 1)).
set.extension(https://example.org/property)
Retrieves the value of set extension slot associated with the https://example.org/property property.
mapping(2).slot(subject_id)
Retrieves the subject of the second mapping.
mapping.slot(author_id, 2)
Retrieves the second author of the first mapping.
mapping(-1).extension(https://example.org/property)
Retrieves the value of the extension slot associated with the https://example.org/property property, for the last mapping of the set.
mapping(2).special(hash)
Retrieves the hash of the second mapping.