Extractseq
Extractseq is an EMBOSS tool that reads a sequence and writes sub-sequences from it to a file. The set of regions to extract is specified through the interface or in a file as pairs of start and end positions. Positions are counted from 1 and both ends of a region are included. The regions are written in the order in which they are specified. Thus, if the sequence AAAGGGTTT has been input and the regions 7-9 and 3-4 have been specified, then the output sequence will be TTTAG (positions 7-9 give TTT, positions 3-4 give AG). Optionally, each region may be written out as a separate sequence.
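The region semantics described above can be illustrated with a short Python sketch. This is not the EMBOSS implementation; the function name extract_regions and its separate parameter are hypothetical, and rejecting out-of-range positions is a choice made for this sketch rather than documented extractseq behaviour.

```python
def extract_regions(sequence, regions, separate=False):
    """Return the sub-sequences of `sequence` named by `regions`.

    `regions` is a list of (start, end) pairs, 1-based and inclusive,
    processed in the order given. Hypothetical helper for illustration.
    """
    pieces = []
    for start, end in regions:
        # Assumption of this sketch: positions outside the sequence are an error.
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(f"invalid region {start}-{end}")
        # Convert the 1-based inclusive range to a Python slice.
        pieces.append(sequence[start - 1:end])
    # Either keep each region as its own sequence or join them into one.
    return pieces if separate else "".join(pieces)


if __name__ == "__main__":
    print(extract_regions("AAAGGGTTT", [(7, 9), (3, 4)]))                 # TTTAG
    print(extract_regions("AAAGGGTTT", [(7, 9), (3, 4)], separate=True))  # ['TTT', 'AG']
```

With separate=False the pieces are concatenated in the order listed, which mirrors the default single-sequence output; separate=True mirrors the option to write each region as its own sequence.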
If you use EMBOSS, please cite: Rice, P, Longden, I, and Bleasby, A (2000) "EMBOSS: The European Molecular Biology Open Software Suite" Trends in Genetics 16 (6), 276-277.
Manual: http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/extractseq.html
INPUT = DNA Sequences, Protein Sequences, and an optional file containing subsequence specifiers. We do not currently support direct sequence database access; please let us know if this would be helpful to you.
Example Input: extractseqin.txt
INPUT Format: many of the sequence formats supported by EMBOSS USA (Uniform Sequence Address) are accepted natively.
Optional Subsequence Region file syntax:
- Comment lines start with '#' in the first column.
- Comment lines and blank lines are ignored.
- The line may start with white-space.
- Each data line contains two positive integers separated by one or more space or TAB characters.
- The second number must be greater than or equal to the first number.
- There can be optional text after the two numbers to annotate the line.
- White-space before or after the text is removed.
Example optional file: regionsfile.txt
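As a rough illustration of the region file rules listed above, the following Python sketch parses such a file into (start, end, annotation) tuples. The function name parse_regions is hypothetical, and raising an error on malformed lines is an assumption of this sketch; the actual EMBOSS parser may handle bad input differently.

```python
def parse_regions(text):
    """Parse region-file text into a list of (start, end, annotation) tuples.

    Hypothetical helper that follows the syntax rules listed above.
    """
    regions = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and lines with '#' in the first column are ignored.
        if not line or raw.startswith("#"):
            continue
        # Split into at most three fields: start, end, optional annotation.
        fields = line.split(None, 2)
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected two numbers")
        try:
            start, end = int(fields[0]), int(fields[1])
        except ValueError:
            raise ValueError(f"line {lineno}: positions must be integers") from None
        if start < 1 or end < start:
            raise ValueError(f"line {lineno}: need 1 <= start <= end")
        # Surrounding white-space of the annotation is already removed.
        annotation = fields[2] if len(fields) > 2 else ""
        regions.append((start, end, annotation))
    return regions


if __name__ == "__main__":
    example = "# regions of interest\n\n  10 20  first exon  \n30\t30\n"
    for region in parse_regions(example):
        print(region)
```

Because only a '#' in the first column marks a comment, an indented '#' line is treated as a data line here and rejected as malformed.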
OUTPUT = subsequences
OUTPUT Format: all subsequences are written to a single file, which contains either one concatenated sequence or, if regions are written separately, multiple sequences.
Example Output: extractseqout.txt