Extractseq

Extractseq is an EMBOSS tool that reads a sequence and writes subsequences from it to a file. The regions to extract are specified through the interface or in a file as pairs of start and end positions, and they are written in the order in which they are specified. For example, if the sequence AAAGGGTTT is the input and the regions 7-9, 3-4 are specified, the output sequence is TTTAG (positions are counted from 1 and both ends of a region are included). Optionally, each region may be written out as a separate sequence.
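The extraction rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the EMBOSS implementation: the function name extract_regions, its separate argument, and the error raised for out-of-range regions are assumptions made for this example.

```python
def extract_regions(sequence, regions, separate=False):
    """Return the requested regions of `sequence`.

    `regions` is a list of (start, end) pairs, counted from 1 and
    inclusive at both ends. Regions are taken in the order given.
    Hypothetical helper for illustration only.
    """
    pieces = []
    for start, end in regions:
        # Assumption: reject regions that fall outside the sequence.
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"region {start}-{end} is outside 1-{len(sequence)}")
        # Convert the 1-based inclusive pair to a Python slice.
        pieces.append(sequence[start - 1:end])
    # Either keep each region apart or join them into one sequence.
    return pieces if separate else "".join(pieces)


print(extract_regions("AAAGGGTTT", [(7, 9), (3, 4)]))                 # TTTAG
print(extract_regions("AAAGGGTTT", [(7, 9), (3, 4)], separate=True))  # ['TTT', 'AG']
```

The first call reproduces the TTTAG example; the second shows what "each region as a separate sequence" means for the same input.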

If you use EMBOSS, please cite: Rice, P., Longden, I., and Bleasby, A. (2000) "EMBOSS: The European Molecular Biology Open Software Suite" Trends in Genetics 16(6): 276-277.

Manual: http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/extractseq.html

INPUT = DNA sequences, protein sequences, and an optional file containing subsequence specifiers. We do not currently support direct sequence database access; please let us know if this would be helpful to you.

Example Input: extractseqin.txt

INPUT Format: many of the sequence formats that EMBOSS can read through a USA (Uniform Sequence Address) are accepted natively.

Optional Subsequence Region file syntax:

Example optional file: regionsfile.txt

OUTPUT = subsequences

OUTPUT Format: all subsequences are written to a single file, which contains either one sequence (the regions joined in the order specified) or multiple sequences (one per region).
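As a rough picture of the two output layouts, the sketch below formats the same regions either as one joined record or as one record per region. FASTA is used only as an assumed output format, and the helper name to_fasta and the name_start_end record names are illustrative assumptions; check the manual for the names the tool actually writes.

```python
def to_fasta(name, sequence, regions, separate=False):
    """Format extracted regions as FASTA text.

    Hypothetical helper: the FASTA layout and the `name_start_end`
    record names are assumptions made for this illustration.
    Regions are (start, end) pairs, 1-based and inclusive.
    """
    parts = [(start, end, sequence[start - 1:end]) for start, end in regions]
    if separate:
        # One record per region, in the order the regions were given.
        records = [(f"{name}_{start}_{end}", sub) for start, end, sub in parts]
    else:
        # A single record holding all regions joined together.
        records = [(name, "".join(sub for _, _, sub in parts))]
    return "".join(f">{rid}\n{sub}\n" for rid, sub in records)


regions = [(7, 9), (3, 4)]
print(to_fasta("demo", "AAAGGGTTT", regions), end="")
print(to_fasta("demo", "AAAGGGTTT", regions, separate=True), end="")
```

In both cases everything ends up in one text stream, which mirrors the single output file: it holds one record in the joined case and several records in the separate case.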

Example Output: extractseqout.txt