PROSEARCH
PROSEARCH is used to search for protein motifs in a query sequence. It compares the query sequence against all patterns stored in the PROSITE database of protein families and domains. Motifs in PROSITE are encoded as regular expressions called patterns.
These patterns are derived by multiple alignment of known homologues, followed by expert human identification of conserved regions. Each conserved region is reduced to a single consensus expression written in one-letter amino acid codes.
For example:
- Actin pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
- Nuclear receptor: C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)-F-F-x-R
- cAMP phosphorylation site: [RK](2)-x-[ST].
The 'x' stands for any of the 20 amino acids. When amino acids appear in square brackets, any one of them can be matched at that position. A number in parentheses indicates how many times the preceding element (whether a specified amino acid or x) occurs in the pattern. ProSearch identifies regions of the query sequence that match patterns in the PROSITE database. A query sequence may match zero, one, or many patterns, since many protein families have more than one conserved region (i.e. more than one characteristic motif). The matches found in the query sequence help determine which family the protein belongs to and which domains it contains.
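To make the pattern syntax described above concrete, the following Python sketch translates a pattern into an ordinary regular expression and scans a sequence with it. It is an illustration only, not how ProSearch itself is implemented: the function names prosite_to_regex and find_motifs are hypothetical, the query sequence is made up, and only the subset of the syntax shown in the examples is handled (single residues, square-bracket sets, x, and a fixed repeat count in parentheses).

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (subset only) into a regex string."""
    parts = []
    # Elements are separated by '-'; a trailing '.' ends the pattern.
    for element in pattern.rstrip(".").split("-"):
        # One residue, a bracketed set, or 'x', with an optional count like (2).
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = m.groups()
        if residue == "x":
            # Assumption: the sequence holds only amino acid letters,
            # so '.' (any character) stands in for "any amino acid".
            residue = "."
        parts.append(residue + (f"{{{count}}}" if count else ""))
    return "".join(parts)

def find_motifs(sequence, pattern):
    """Return (1-based start, matched text) for every match, overlaps included."""
    # A lookahead with a capture group reports overlapping matches too.
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

if __name__ == "__main__":
    # Made-up sequence scanned with the cAMP phosphorylation site pattern.
    for start, text in find_motifs("MKRRASLKKGSA", "[RK](2)-x-[ST]."):
        print(start, text)
```

A real search would repeat this scan for every pattern in the database and report each hit together with the name of the matching PROSITE entry.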
If you use ProSearch, please cite: Kolakowski LF Jr, Leunissen JA, Smith JE. (1992) "ProSearch: fast searching of protein sequences with regular expression patterns related to protein structure and function". Biotechniques 13(6):919-21.
INPUT = Protein Sequences.
TEST FILES
Input file: prosearch_in.txt
Output file: prosearch_out.txt