PROTDIST
PROTDIST computes a distance measure for protein sequences, using maximum likelihood estimates based on the Dayhoff PAM matrix, the JTT matrix model, the PMB model, Kimura's 1983 approximation to these, or a model based on the genetic code plus a constraint on changing to a different category of amino acid. The distances can also be corrected for gamma-distributed and gamma-plus-invariant-sites-distributed rates of change at different sites. Rates of evolution can vary among sites in a prespecified way, and also according to a hidden Markov model. The program can also produce a table of percentage similarity among sequences. The distances can then be used in the distance matrix programs. Part of PHYLIP.
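As a point of orientation, Kimura's 1983 approximation is commonly written as D = -ln(1 - p - 0.2 p^2), where p is the fraction of aligned sites at which two sequences differ. The sketch below illustrates that formula only. The function name is hypothetical, and the code is not taken from PROTDIST, which may handle gaps and ambiguous residues differently.

```python
import math


def kimura_protein_distance(p: float) -> float:
    """Kimura's (1983) approximation: D = -ln(1 - p - 0.2 * p**2).

    p is the fraction of aligned sites at which the two sequences differ.
    Illustrative helper only; not part of PHYLIP.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        # The logarithm is undefined once p reaches roughly 0.854.
        raise ValueError("sequences too divergent for this approximation")
    return -math.log(inner)
```

The formula depends only on the observed fraction of differences, so it ignores which amino acids are involved; the matrix-based models listed above do take that into account.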
© Copyright 1991-2006 by the University of Washington. Written by Joseph Felsenstein. Edited by NGBW team.
Manual: http://evolution.genetics.washington.edu/phylip/doc/protdist.html
INPUT: Aligned Protein Sequences (Character Matrix)
TEST DATA SET
(Note that although these may look like DNA sequences, they are being treated as protein sequences consisting entirely of alanine, cysteine, glycine, and threonine).
5 13
Alpha AACGTGGCCACAT
Beta AAGGTCGCCACAC
Gamma CAGTTCGCCACAA
Delta GAGATTTCCGCCT
Epsilon GAGATCTCCGCCC
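The test data set above can be read with a few lines of Python, which also makes the idea of a percentage similarity table concrete. This is a minimal sketch under stated assumptions: names and sequences are taken to be separated by whitespace (strict PHYLIP files reserve a fixed-width name field instead), and `parse_alignment` and `percent_identity` are hypothetical helpers, not PROTDIST functions. PROTDIST's own similarity table may treat gaps and ambiguity codes differently.

```python
TEST_DATA = """\
5 13
Alpha AACGTGGCCACAT
Beta AAGGTCGCCACAC
Gamma CAGTTCGCCACAA
Delta GAGATTTCCGCCT
Epsilon GAGATCTCCGCCC
"""


def parse_alignment(text):
    """Return {name: sequence} from a header line plus name/sequence lines.

    Assumption: whitespace separates the name from the sequence.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seqs, n_sites = (int(x) for x in lines[0].split())
    records = {}
    for line in lines[1:]:
        name, seq = line.split(None, 1)
        seq = "".join(seq.split())  # drop any spacing inside the sequence
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
        records[name] = seq
    if len(records) != n_seqs:
        raise ValueError(f"expected {n_seqs} sequences, got {len(records)}")
    return records


def percent_identity(a, b):
    """Percentage of aligned positions at which a and b carry the same residue."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    seqs = parse_alignment(TEST_DATA)
    names = list(seqs)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            print(first, second, round(percent_identity(seqs[first], seqs[second]), 1))
```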
CONTENTS OF OUTPUT FILE (with all numerical options on)
(Note that when the numerical options are not on, the output file produced is in the correct format to be used as an input file in the distance matrix programs).
Jones-Taylor-Thornton model distance
Name            Sequences
----            ---------
Alpha        AACGTGGCCA CAT
Beta         ..G..C.... ..C
Gamma        C.GT.C.... ..A
Delta        G.GA.TT..G .C.
Epsilon      G.GA.CT..G .CC
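In the sequence listing above, a dot appears to mark a residue identical to the one at the same site in the first sequence (Alpha), and sites are printed in blocks of ten. The following sketch reproduces that display for a pair of sequences. The function name and the `block` parameter are illustrative assumptions, not part of PROTDIST.

```python
def dot_display(reference, other, block=10):
    """Show `other` relative to `reference`, with '.' for matching sites.

    Sites are grouped into space-separated blocks of `block` characters.
    """
    marked = "".join("." if o == r else o for r, o in zip(reference, other))
    return " ".join(marked[i:i + block] for i in range(0, len(marked), block))


if __name__ == "__main__":
    alpha = "AACGTGGCCACAT"
    print(dot_display(alpha, "AAGGTCGCCACAC"))  # the Beta row
```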
Alpha 0.000000 0.330447 0.625670 1.032032 1.354086
Beta 0.330447 0.000000 0.375578 1.096290 0.677616
Gamma 0.625670 0.375578 0.000000 0.975798 0.861634
Delta 1.032032 1.096290 0.975798 0.000000 0.226703
Epsilon 1.354086 0.677616 0.861634 0.226703 0.000000
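Before passing a distance table like the one above to a downstream program, it can be useful to check that it is square and symmetric and to see which pair of sequences is closest. The sketch below does this for the values shown. The helper names are hypothetical, and the parser assumes each row is a name followed by whitespace-separated numbers, which is a simplification of the real output layout.

```python
MATRIX_TEXT = """\
Alpha 0.000000 0.330447 0.625670 1.032032 1.354086
Beta 0.330447 0.000000 0.375578 1.096290 0.677616
Gamma 0.625670 0.375578 0.000000 0.975798 0.861634
Delta 1.032032 1.096290 0.975798 0.000000 0.226703
Epsilon 1.354086 0.677616 0.861634 0.226703 0.000000
"""


def parse_square_matrix(text):
    """Return (names, rows) from lines of the form 'Name d1 d2 ... dn'."""
    names, rows = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        names.append(fields[0])
        rows.append([float(v) for v in fields[1:]])
    if any(len(row) != len(names) for row in rows):
        raise ValueError("matrix is not square")
    return names, rows


def is_symmetric(rows, tol=1e-9):
    """True if rows[i][j] equals rows[j][i] within tol for every pair."""
    n = len(rows)
    return all(abs(rows[i][j] - rows[j][i]) <= tol
               for i in range(n) for j in range(i))


def closest_pair(names, rows):
    """Return (name_i, name_j, distance) for the smallest off-diagonal entry."""
    n = len(names)
    d, i, j = min((rows[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    return names[i], names[j], d


if __name__ == "__main__":
    names, rows = parse_square_matrix(MATRIX_TEXT)
    print(is_symmetric(rows), closest_pair(names, rows))
```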