Xdna2 - DNA Strider file format utility

This is the homepage for Xdna2, a set of tools to convert biological sequence data to and from the binary file format used by the popular Mac OS X program “DNA Strider”.

Xdna2 is distributed under the terms of the GNU General Public License.

This project is no longer under active development. It has been superseded by a new project that aims to bring support for the DNA Strider format (and some other formats as well) to the SeqIO framework of Biopython.

Usage

Xdna2 provides two command-line programs to convert DNA Strider “.xdna” files. The xdna2 program converts a DNA Strider file to a FASTA or GenBank file; the 2xdna program does the opposite and converts a FASTA file to a DNA Strider file.

The DNA Strider binary format

As far as I can tell from the “.xdna” files that I have seen, the DNA Strider binary format is organized in three or four sections:

a “header” containing information such as the length and the type of the sequence;
the sequence itself;
a “comment”, a free-form text associated with the sequence;
optionally, an “annotations” section.

The header has a fixed size of 112 bytes; see the xdna_header_t C structure in the src/xdna.h file for details. Important information includes the sequence type (DNA, RNA, or protein; byte 2), the sequence topology (linear or circular; byte 3), the length of the sequence (bytes 29–32), and the length of the comment (bytes 97–100). Byte positions are counted from 1.

The sequence and the comment are plain text data. The sequence starts immediately after the header (byte 113); the comment starts immediately after the sequence (byte 113 + length of sequence).
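
The layout described above can be sketched in a few lines of Python. This is only an illustration based on the byte positions given on this page, not the code used by Xdna2; the names parse_xdna, HEADER_FORMAT and SEQUENCE_TYPES are made up for the example, and treating the two length fields as unsigned integers is an assumption.

```python
import struct

HEADER_SIZE = 112
# version, sequence type, topology, 25 padding bytes, sequence length,
# 64 padding bytes, comment length, 12 padding bytes (big endian throughout).
# Assumption: both lengths are unsigned 32-bit integers.
HEADER_FORMAT = ">BBB25xI64xI12x"

SEQUENCE_TYPES = {1: "DNA", 2: "degenerate DNA", 3: "RNA", 4: "protein"}


def parse_xdna(data: bytes) -> dict:
    """Split the raw bytes of an .xdna file into header fields, sequence and comment."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than the 112-byte header")
    version, seq_type, topology, seq_len, comment_len = struct.unpack_from(
        HEADER_FORMAT, data, 0
    )
    seq_start = HEADER_SIZE                # byte 113 when counting from 1
    comment_start = seq_start + seq_len    # the comment follows the sequence
    comment_end = comment_start + comment_len
    if len(data) < comment_end:
        raise ValueError("file is shorter than the lengths in its header")
    return {
        "version": version,
        "sequence_type": SEQUENCE_TYPES.get(seq_type, "unknown"),
        "circular": topology == 1,
        "sequence": data[seq_start:comment_start].decode("ascii"),
        "comment": data[comment_start:comment_end].decode("ascii", errors="replace"),
        # Whatever follows the comment (possibly an annotations section).
        "rest": data[comment_end:],
    }
```

A file would be read with open(path, "rb") and its contents passed to parse_xdna; an empty "rest" value means the file has no annotations section.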

Some .xdna files, such as those generated by Serial Cloner, contain an additional annotations section. This section starts with four fields describing the optional right and left overhangs (thanks to Cory Li, from the Benchling project, for details about the meaning of those fields). The next byte stores the number of annotations, and the annotations themselves are described in the remainder of the file.

Each annotation contains six fields; each field is a Pascal string, that is, one byte indicating the length of the string, followed by the string itself. The fields are:

the annotation’s name;
a comment about the annotation;
the annotation’s type (CDS, promoter, rep_origin, and so on);
the start position;
the end position;
an unknown field, containing three dot-separated numbers (I strongly suspect this is an RGB triplet indicating the color to use for displaying the annotation).

In addition, there are four bytes between the fifth and sixth fields. Three of them are boolean flags indicating the DNA strand, whether the annotation should be displayed, and whether it should be displayed with an arrow; the meaning of the remaining byte is unknown.
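
As a rough sketch, the annotations section could be read as follows in Python (3.9 or later). All function names are hypothetical, and two points are assumptions rather than established facts: that each overhang length is stored as decimal text inside its Pascal string (for example "-4"), and that the four bytes come in the order strand, show, unknown, arrow, as listed in the summary table on this page.

```python
def read_pstring(data: bytes, pos: int) -> tuple[bytes, int]:
    """Read one Pascal string: a length byte followed by that many bytes."""
    length = data[pos]
    end = pos + 1 + length
    if end > len(data):
        raise ValueError("truncated Pascal string")
    return data[pos + 1:end], end


def read_overhang(data: bytes, pos: int) -> tuple[int, bytes, int]:
    """Read a signed overhang length, then |length| bytes of overhang."""
    raw, pos = read_pstring(data, pos)
    # Assumption: the length is written as decimal text; empty means zero.
    length = int(raw.decode("ascii")) if raw else 0
    end = pos + abs(length)
    return length, data[pos:end], end


def read_annotation(data: bytes, pos: int) -> tuple[dict, int]:
    """Read five Pascal strings, four single bytes, and a last Pascal string."""
    fields = {}
    for key in ("name", "description", "type", "start", "end"):
        raw, pos = read_pstring(data, pos)
        fields[key] = raw.decode("ascii", errors="replace")
    # Assumed byte order: strand, show, unknown, arrow.
    strand, show, _unknown, arrow = data[pos:pos + 4]
    pos += 4
    raw, pos = read_pstring(data, pos)
    fields["forward"] = bool(strand)
    fields["show"] = bool(show)
    fields["arrow"] = bool(arrow)
    fields["extra"] = raw.decode("ascii", errors="replace")  # possibly "R.G.B"
    return fields, pos


def read_annotations_section(data: bytes, pos: int = 0) -> dict:
    """Parse the optional section that follows the comment."""
    pos += 1  # skip the leading byte of unknown meaning
    roh_length, roh, pos = read_overhang(data, pos)
    loh_length, loh, pos = read_overhang(data, pos)
    count = data[pos]
    pos += 1
    annotations = []
    for _ in range(count):
        annotation, pos = read_annotation(data, pos)
        annotations.append(annotation)
    return {
        "right_overhang": (roh_length, roh),
        "left_overhang": (loh_length, loh),
        "annotations": annotations,
    }
```

The start and end positions are kept as text here, since they are stored as strings in the file; converting them with int() is left to the caller.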

The table below summarizes the structure of a DNA Strider file.

Header section
Contents	Type	Size in bytes	Remarks
version	integer	1	always zero?
sequence type	integer	1	1 = DNA, 2 = degenerate DNA, 3 = RNA, 4 = protein
topology	integer	1	0 = linear, 1 = circular
padding		25
sequence length	integer	4	big endian
padding		64
comment length	integer	4	big endian
padding		12
Sequence section
sequence	ASCII chars	sequence length
Comment section
comment	ASCII chars	comment length
Annotations section (optional)
unknown byte		1
ROH length	Pascal string	1–256	0 = no overhang, < 0 = 3’ overhang, > 0 = 5’ overhang
right overhang	ASCII chars	|ROH length|
LOH length	Pascal string	1–256	0 = no overhang, < 0 = 3’ overhang, > 0 = 5’ overhang
left overhang	ASCII chars	|LOH length|
annotations count	integer	1
Annotation subsection (one for each annotation)
feature name	Pascal string	1–256
feature description	Pascal string	1–256
feature type	Pascal string	1–256
start position	Pascal string	1–256
end position	Pascal string	1–256
strand flag	boolean	1	0 = reverse, 1 = forward
show flag	boolean	1	0 = don’t show the feature on map, 1 = show
unknown byte		1
arrow flag	boolean	1	0 = no arrow, 1 = arrow
extra field	Pascal string	1–256
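
Going the other way, the first three sections of the table are enough to assemble a minimal file without annotations. The Python sketch below is an illustration only, not the 2xdna implementation: the name build_xdna is hypothetical, the padding is filled with zero bytes, and whether DNA Strider accepts zero-filled padding has not been verified.

```python
import struct

# Same field order as the header rows of the table, big endian.
HEADER_FORMAT = ">BBB25xI64xI12x"

SEQUENCE_TYPE_CODES = {"DNA": 1, "degenerate DNA": 2, "RNA": 3, "protein": 4}


def build_xdna(sequence: str, comment: str = "",
               seq_type: str = "DNA", circular: bool = False) -> bytes:
    """Assemble header, sequence and comment sections (no annotations)."""
    seq_bytes = sequence.encode("ascii")
    comment_bytes = comment.encode("ascii")
    header = struct.pack(
        HEADER_FORMAT,
        0,                              # version, zero in the files seen so far
        SEQUENCE_TYPE_CODES[seq_type],  # 1 = DNA, 2 = degenerate DNA, ...
        1 if circular else 0,           # topology
        len(seq_bytes),
        len(comment_bytes),
    )  # the "x" codes make struct.pack emit zero bytes for the padding
    return header + seq_bytes + comment_bytes
```

For example, build_xdna("ACGT", "demo", circular=True) yields 120 bytes: the 112-byte header, four bytes of sequence and four bytes of comment.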

Download

Download the latest release tarball:

xdna2-0.2.4.tar.gz (application/gzip, 197K, signature)