SeqVault - Biological sequence data repository

This is the home page of SeqVault, a command line tool to manage a repository of biological sequence data. It is mainly a thin wrapper around the BioSQL database schema and some of the Biopython modules.

SeqVault is distributed under the terms of the GNU General Public License, version 3 or later.

Setup

You need a PostgreSQL server; it does not have to run on your machine, but it must be reachable from it. Create a dedicated PostgreSQL user account and a dedicated database for SeqVault, then initialize the database by loading the provided biosql/biosqldb-pg.sql script:

$ psql -h hostname -U username database < biosql/biosqldb-pg.sql

Create a file named $XDG_CONFIG_HOME/seqvault/seqvault.rc with the following contents:

[Server]
host: hostname
user: username
password: password
database: database
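
The key: value layout under a [Server] header is the INI-style format that Python's standard configparser module understands. As a minimal sketch of how such a file could be located and read (the function names are hypothetical, and the fallback to ~/.config when XDG_CONFIG_HOME is unset is the usual XDG convention, assumed here rather than documented SeqVault behavior):

```python
import configparser
import os
from pathlib import Path


def default_config_path():
    """Return the expected location of seqvault.rc.

    Assumption: when XDG_CONFIG_HOME is unset or empty, fall back to
    ~/.config, as the XDG Base Directory convention suggests.
    """
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "seqvault" / "seqvault.rc"


def read_server_settings(path):
    """Read the [Server] section and return its four settings as a dict."""
    # interpolation=None keeps '%' characters in a password literal.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    server = parser["Server"]
    return {key: server[key] for key in ("host", "user", "password", "database")}
```

Calling read_server_settings(default_config_path()) is a quick way to check that the file is where it should be and that no key is missing; a missing key raises KeyError.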

Install the following Python modules: Psycopg2, Biopython, and IPython (the last one is optional).

SeqVault may now be used directly from the source directory, or it may be installed as a Python package using the provided setup.py script.

Usage

Calling the svc program alone will list the available subcommands. Calling svc help command will display the help for the specified command.

Here are the basic steps for using SeqVault:

$ svc db new vectors "Vectors database" VEC

This will create a new subdatabase called vectors; sequences stored in this database will have an accession number starting with the “VEC” prefix (so VEC_000000, VEC_000001, and so on).
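
The accession numbers shown above follow a simple pattern: the prefix, an underscore, and a zero-padded six-digit counter. A small sketch of that pattern, inferred only from the VEC_000000 and VEC_000001 examples (the helper names are hypothetical, and SeqVault's own code may differ):

```python
import re


def format_accession(prefix, number):
    """Build an accession such as VEC_000003 from a prefix and a counter."""
    if not 0 <= number <= 999999:
        # Assumption: the counter always fits in six digits.
        raise ValueError("counter does not fit in six digits")
    return f"{prefix}_{number:06d}"


def split_accession(accession):
    """Split an accession into (prefix, counter); raise ValueError if malformed."""
    match = re.fullmatch(r"(.+)_([0-9]{6})", accession)
    if match is None:
        raise ValueError(f"not a valid accession: {accession!r}")
    return match.group(1), int(match.group(2))
```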

$ svc add vectors sequence.gb

This will add the sequence(s) contained in the sequence.gb file to the previously created “vectors” subdatabase. The command will also print the sequence(s), as stored, on standard output.

Note that only GenBank-formatted files are accepted.
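
GenBank flat files have a recognizable shape: each record begins with a LOCUS line and ends with a line containing only //. As a rough pre-check before running svc add, the sketch below counts the records in a file using nothing but that rule. It is not how SeqVault validates its input (the function name is hypothetical), and real parsing is better left to Biopython.

```python
def count_genbank_records(lines):
    """Count GenBank records in an iterable of text lines.

    Returns 0 when no LOCUS line is found, which suggests the input is not
    GenBank-formatted. Raises ValueError when records are not well delimited.
    """
    records = 0
    in_record = False
    for line in lines:
        if line.startswith("LOCUS"):
            if in_record:
                raise ValueError("LOCUS line found inside an unterminated record")
            in_record = True
        elif line.rstrip() == "//":
            if not in_record:
                raise ValueError("record terminator found without a LOCUS line")
            in_record = False
            records += 1
    if in_record:
        raise ValueError("last record is not terminated by '//'")
    return records
```

A file object can be passed directly, for example count_genbank_records(open("sequence.gb")).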

$ svc list vectors

This will list the sequences stored in the “vectors” subdatabase.

$ svc get VEC_000003

This retrieves the sequence(s) identified by the specified accession number(s). Sequences are printed on standard output.

$ svc edit VEC_000003

This will launch a text editor (currently gVim; the editor cannot be changed at runtime for now) loaded with the text of the specified sequence. Once the editor is closed, the sequence is stored back in the database if it has been modified.
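
The edit cycle described above (dump the sequence to a file, open an editor, keep the result only if it changed) can be sketched as follows. This is an illustration rather than SeqVault's actual code: edit_text is a hypothetical helper, and the editor is a parameter here even though SeqVault currently hard-codes gVim. The -f flag keeps gVim in the foreground, so the call blocks until the window is closed.

```python
import os
import subprocess
import tempfile


def edit_text(text, editor=("gvim", "-f")):
    """Open text in an editor; return the edited text, or None if unchanged."""
    fd, path = tempfile.mkstemp(suffix=".gb", text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # Blocks until the editor process exits.
        subprocess.run([*editor, path], check=True)
        with open(path, encoding="utf-8") as handle:
            edited = handle.read()
    finally:
        # Always remove the temporary file, even if the editor fails.
        os.unlink(path)
    return edited if edited != text else None
```

Returning None for an unchanged buffer lets the caller skip the database write entirely, which matches the "if it has been modified" condition.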

Download

The source code is available in a Git repository; clone it with the following command:

$ git clone https://git.incenp.org/damien/seqvault.git

No tarball distribution has been released yet, but you may download a snapshot of the “develop” branch.