Tango
Usage
Collect, store, and retrieve records from NCBI using just the GI number. Tango uses NCBI's E-Utilities interface to fetch records and MongoDB to store the most relevant information locally. Please check that the dependencies are installed locally before running.
The program connects to NCBI, downloads the record corresponding to each GI number provided, and extracts the following fields into a MongoDB database: GI, accession, sequence, version, locus, organism, sequence length, gene, protein ID, translation.
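As a rough illustration of the download step, the sketch below builds a request URL for NCBI's EFetch endpoint, which is part of E-Utilities. The function name `build_efetch_url` and its default values are assumptions made for this example; how Tango itself constructs its requests is not shown here.

```python
from urllib.parse import urlencode

# Base URL of NCBI's EFetch endpoint (part of E-Utilities).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(ids, db="nucleotide", rettype="gb"):
    """Return an EFetch URL for one or more GI numbers.

    The defaults are illustrative assumptions, not Tango's documented defaults.
    """
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # EFetch accepts comma-separated IDs
        "rettype": rettype,
        "retmode": "text",
    }
    # Keep commas literal so the ID list stays readable in the URL.
    return f"{EFETCH_URL}?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    print(build_efetch_url([74960989, 4165050]))
```

The `db` and `rettype` parameters play roughly the same role as the -db and -type options listed under Options.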
This creates a local database that can be accessed downstream for many applications. Documents can be inserted, updated, read, and removed to build the database you need.
Options
-ids ID(s)
-file File with ID(s) [CSV or TXT]
-db Database (nucleotide, protein, etc.) [optional]
-type File type (gb, fasta, etc.) [optional]
-force Force download? [optional]
-mongo MongoDB database name
-collection Collection name in MongoDB database
-insert Insert into database [optional/default]
-update Update database
-read Read from database
-remove Remove from database
-help Shows help message
Database Operations
Insert
To insert new data (documents) into the database, provide the GI number(s), optionally with the -insert flag. Each of the following performs an insert:
tango.pl -file gis.csv
tango.pl -file gis.csv -insert
tango.pl -ids 74960989 4165050 -insert
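The layout of the ID file is not described beyond "CSV or TXT". The sketch below shows one way such a file could be read, assuming the GI numbers are separated by commas, whitespace, or newlines. The helper `parse_gi_numbers` is hypothetical and is not part of Tango.

```python
import re


def parse_gi_numbers(text):
    """Extract GI numbers from CSV or plain-text content.

    Assumes IDs are separated by commas, whitespace, or newlines;
    the exact layout Tango expects is not specified.
    """
    tokens = re.split(r"[,\s]+", text.strip())
    # Keep purely numeric tokens (dropping blanks and header words) and
    # remove duplicates while preserving first-seen order.
    return list(dict.fromkeys(t for t in tokens if t.isdigit()))


if __name__ == "__main__":
    print(parse_gi_numbers("74960989,4165050\n34577062\n74960989\n"))
    # ['74960989', '4165050', '34577062']
```

Dropping duplicates before downloading avoids requesting the same record from NCBI twice.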
Update
To update data (documents) stored in the database, provide the -update flag followed by a query in the format field:value that identifies the document to update. You will then be asked which field of that document you wish to change.
The following looks for the document whose _id field matches 34577062.
tango.pl -update _id:34577062
It will then tell you which document you are about to update and ask which field you would like to update.
UPDATING _id record [34577062] in database...
Available fields are: _id accession sequence version locus organism seqLength gene proteinID translation
What field do you want? sequence
What is the NEW value for sequence field? NEWSEQUENCE
Document 34577062 updated, sequence field changed to NEWSEQUENCE.
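In MongoDB terms, the session above corresponds to a filter document plus a `$set` update. The sketch below builds both from a field:value query. The helper names are hypothetical, and treating every value as a string is an assumption, since the type Tango uses for _id is not stated.

```python
def parse_query(query):
    """Split a 'field:value' query into a (field, value) pair."""
    field, sep, value = query.partition(":")
    if not sep or not field or not value:
        raise ValueError(f"expected field:value, got {query!r}")
    return field, value


def build_update(query, field, new_value):
    """Return (filter, update) documents suitable for a MongoDB update."""
    key, value = parse_query(query)
    # Values are kept as strings; whether Tango stores _id as a string
    # or as an integer is an assumption.
    return {key: value}, {"$set": {field: new_value}}


if __name__ == "__main__":
    flt, upd = build_update("_id:34577062", "sequence", "NEWSEQUENCE")
    print(flt)  # {'_id': '34577062'}
    print(upd)  # {'$set': {'sequence': 'NEWSEQUENCE'}}
```

With PyMongo, the two documents could be passed to `collection.update_one(flt, upd)`. Using `$set` changes only the named field and leaves the rest of the document untouched.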
Read
To read data (documents) stored in the database, provide the -read flag followed by one or more queries in the format field:value. You will be asked which field of each document to report back.
The following reads the documents whose _id fields match 34577062 and 74960989.
tango.pl -read _id:34577062 _id:74960989
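A read with several queries can be expressed as a single MongoDB filter plus a projection that limits the output to the requested field. The sketch below assumes that a document matching any of the queries should be returned. That interpretation, and the helper names, are assumptions rather than Tango's documented behavior.

```python
from collections import defaultdict


def build_read_filter(queries):
    """Combine 'field:value' queries into one MongoDB filter.

    Values for the same field are grouped under $in, and different
    fields are joined with $or, so a document matching any query is
    returned (an assumed interpretation).
    """
    if not queries:
        raise ValueError("at least one field:value query is required")
    grouped = defaultdict(list)
    for query in queries:
        field, sep, value = query.partition(":")
        if not sep or not field or not value:
            raise ValueError(f"expected field:value, got {query!r}")
        grouped[field].append(value)
    clauses = [{field: {"$in": values}} for field, values in grouped.items()]
    return clauses[0] if len(clauses) == 1 else {"$or": clauses}


def build_projection(field):
    """Limit returned documents to _id plus the requested field."""
    return {"_id": 1, field: 1}


if __name__ == "__main__":
    print(build_read_filter(["_id:34577062", "_id:74960989"]))
    # {'_id': {'$in': ['34577062', '74960989']}}
    print(build_projection("sequence"))
    # {'_id': 1, 'sequence': 1}
```

With PyMongo, the results could be passed to `collection.find(filter, projection)`. The same filter shape would also suit a removal with several queries.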
Remove
To remove data (documents) stored in the database, provide the -remove flag followed by one or more queries in the format field:value identifying the documents you want removed.
The following removes the documents whose _id fields match 34577062 and 74960989.
tango.pl -remove _id:34577062 _id:74960989
Dependencies
You need to have the following installed:
-
Modules:
- Eutilities
- GenBank
- SeqFeatureI
License
See LICENSE