ROBOT xref-extract command

The xref-extract command is intended to extract mappings from the cross-reference annotations found in an ontology.

1. Usage

In its simplest form, the command just needs (in addition to an input ontology, which may be specified with the -i option or be taken from any previous command in the ROBOT pipeline) the --mapping-file option, which indicates where the SSSOM mapping set derived from the annotations should be written:

robot xref-extract -i uberon.owl --mapping-file uberon-mappings.sssom.tsv

2. How extraction works

The default behaviour is to honour the oboInOwl:treat-xrefs-as- ontology annotations found in the ontology. Those annotations are used to:

  • decide which cross-references should be turned into mappings: only the cross-references that use a prefix declared in a oboInOwl:treat-xref-as-... annotation are considered, all other cross-references are ignored;
  • decide which predicate to use for the mappings.

The following table indicate which predicate is used depending on the annotation in which the prefix is declared:

treat-xrefs-as-equivalent
skos:exactMatch
treat-xrefs-as-is_a
skos:broadMatch
treat-xrefs-as-has-subclass
skos:narrowMatch
treat-xrefs-as-genus-differentia
semapv:crossSpeciesExactMatch
treat-xrefs-as-reverse-genus-differentia
semapv:crossSpeciesExactMatch

Several options allow to alter the default behaviour, they are described in the next subsections.

2.1. Custom prefix-to-predicate mappings

Use the --map-prefix-to-predicate option to force the use of a given predicate for cross-references with a specific prefix. For example:

robot xref-extract -i uberon.owl --mapping-file uberon.mappings.ssom.tsv \
  --map-prefix-to-predicate "FBbt http://w3id.org/semapv/vocab/crossSpeciesCloseMatch"

will turn any cross-reference to a FBbt term to a semapv:crossSpeciesCloseMatch mapping. This overrides any oboInOwl:treat-xrefs-as-... annotation found in the ontology.

To completely ignore the oboInOwl:treat-xrefs-as-... annotations, and only use explicitly specified prefix-to-predicate mappings, use the --ignore-treat-xrefs option.

2.2. Extracting cross-references using unmapped prefixes

By default, only cross-references with a prefix for which there is a prefix-to-predicate mapping are used (whether the prefix-to-predicate mapping comes from oboInOwl:treat-xref-as-... annotations or from a --map-prefix-to-predicate option). To generate mappings from all cross-references, use the --all-xrefs option. Cross-references with a prefix for which there is no associated predicate will be turned into mappings with a oboInOwl:hasDbXref predicate.

2.3. Extracting cross-references using unknown prefixes

Lastly, cross-references with a prefix that cannot be resolved are completely ignored, regardless of any other options. To force those cross-references to be turned into mappings, use the --permissive option. Be aware that using this option may result in an invalid SSSOM file if the ontology does contain cross-references with non-resolvable prefix names.

2.4. Extracting cross-references from obsolete classes

By default, classes that are marked as obsolete (owl:deprecated) are completely ignored. To extract cross-references from obsolete classes as well, use --include-obsoletes.

2.5. Pruning "duplicated" cross-references

When more than one cross-reference in the source ontology point to the same foreign term, the default behaviour is to generate a mapping from all such cross-references.

With the option --drop-duplicates, no mapping at all will be generated for cross-references to any foreign term if there are more than one cross-reference to that term. In addition, a warning will be emitted for each such foreign term, giving the list of all subject terms that have a cross-reference to it.

3. Re-using an existing mapping set

By default, the file indicated by the --mapping-file options is not required to exist – and if it does, it will be overwritten.

If any of the --append or --replace options is used, then the file indicated by --mapping-file is also an input file, which must exist and contain a valid SSSOM set, in addition to being an output file.

With --append, the mappings extracted from the ontology will be appended to the mappings already present in that file.

With --replace, the mappings already present in that file will be discarded and replaced by those extracted from the ontology, but the metadata of the original set will be preserved. This option allows to reuse the metadata of an existing set while ignoring the existing mappings themselves.

The existing mapping set may have its metadata in a separate YAML file (instead of being embedded in a commented header). If so, the command will try to automatically locate the metadata file. The --metadata option may also be used to explicitly specify the location of the metadata file. Note that even if the original set uses a separate metadata file, the set produced by the command will be in the “embedded” format.

The --metadata option may also be used without an existing mapping set (that is, when neither --append nor --replace are used), to reuse an existing metadata file that is not associated to a set.

4. Manually specifying some set metadata

Some of the metadata for the extracted mapping set can be explicitly specified on the command line.

Use the --set-id option to explicitly specify the mapping set ID (metadata slot mapping_set_id).

Use the --set-license option to explicitly specify the license of the mapping set (metadata slot license).

The values specified with those options override any value found in the existing metadata, when a pre-existing set is reused (see previous section).

Both the ID and the license should take the form of an URI, “ideally resolvable” as per the SSSOM specification. However xref-extract does not enforce that and will use whatever value is provided on the command line without checking that the value is a URI (even less that it is resolvable).