Generate a profile from a multiple sequence alignment
and align the profile with a new sequence


OVERVIEW

Two types of profile data can be calculated from a multiple sequence alignment: a profile in PROSITE format (i.e. a generalized profile), built with pfmake [1, 2, 3, 4], or a profile in Pfam format (a profile HMM), built with hmmbuild [5].

In the first step, a profile is calculated from a multiple sequence alignment that you supply either by copy and paste or by uploading a file. If you choose the HMMER programs, you have two alternatives: run hmmbuild only, or run hmmbuild followed by hmmcalibrate. According to the HMMER user manual the latter step is optional, but it increases the sensitivity of database searches. The hmmcalibrate step does, however, require additional computing time.
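The two HMMER alternatives can be sketched as a small Python wrapper. This is only an illustration under assumptions: the server's actual invocation is not shown on this page, the argument order follows the HMMER 2 command-line usage (hmmbuild hmmfile alignfile; hmmcalibrate hmmfile), and the function names and file names are hypothetical.

```python
import subprocess


def hmmer_commands(alignfile, hmmfile, calibrate=True):
    """Return the command lines for the profile HMM step (hypothetical helper).

    Assumes the HMMER 2 programs hmmbuild and hmmcalibrate are on PATH.
    """
    # hmmbuild reads the alignment and writes the new profile HMM to hmmfile.
    commands = [["hmmbuild", hmmfile, alignfile]]
    if calibrate:
        # Optional: re-save hmmfile with fitted EVD parameters.
        commands.append(["hmmcalibrate", hmmfile])
    return commands


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling run_commands(hmmer_commands("family.aln", "family.hmm")) would build and then calibrate the profile; passing calibrate=False skips the slower calibration step.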

hmmbuild
hmmbuild reads a multiple sequence alignment file alignfile, builds a new profile HMM, and saves the HMM in hmmfile.
hmmcalibrate
hmmcalibrate reads an HMM file from hmmfile, scores a large number of synthesized random sequences with it, fits an extreme value distribution (EVD) to the histogram of those scores, and re-saves hmmfile now including the EVD parameters.
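The idea behind the calibration step can be illustrated with a short Python sketch that fits an EVD (Gumbel distribution) to a list of random-sequence scores and converts a score into a P-value. It is not the hmmcalibrate implementation: the fit below uses the simple method of moments as a stand-in, and the function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_evd_moments(scores):
    """Estimate the EVD location (mu) and scale (lam) by the method of moments.

    For a Gumbel distribution: mean = mu + gamma / lam, variance = pi^2 / (6 lam^2).
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def evd_pvalue(score, mu, lam):
    """Probability that a random sequence scores at least `score` under the fitted EVD."""
    return 1.0 - math.exp(-math.exp(-lam * (score - mu)))
```

With the two fitted parameters stored alongside the profile, a raw alignment score can be reported together with an estimate of how often random sequences would reach it.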
On the other hand, when the PROSITE format is chosen as the output, a weight is first calculated for each sequence in the multiple alignment using the program pfw [6], so that the weights sum to 1. The number of shuffles per sequence (parameter N) is 1000, which gives an average relative precision of approx. 3% [6]. The profile matrix is then calculated by the program pfmake using the BLOSUM45 scoring matrix with the maximum gap penalty multiplier set to M=0.0.
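The weighting step can be illustrated with a Monte Carlo sketch in Python, written in the spirit of the method in [6]. It is not the pfw implementation. It assumes that random probe sequences are drawn column by column from the residues observed in the alignment and that each probe is credited to the nearest aligned sequence (ties are shared); the function name, the Hamming distance, and the toy alignment are hypothetical. The quoted precision of approx. 3% is consistent with the usual 1/sqrt(N) Monte Carlo scaling, since 1/sqrt(1000) is about 0.032.

```python
import random


def voronoi_weights(alignment, shuffles_per_sequence=1000, seed=0):
    """Return one weight per aligned sequence; the weights sum to 1."""
    rng = random.Random(seed)
    n_seqs = len(alignment)
    length = len(alignment[0])
    # Residues observed in each alignment column (sorted for reproducibility).
    columns = [sorted({seq[i] for seq in alignment}) for i in range(length)]
    counts = [0.0] * n_seqs
    n_random = shuffles_per_sequence * n_seqs
    for _ in range(n_random):
        # Draw a random probe sequence from the observed residues.
        probe = [rng.choice(col) for col in columns]
        # Hamming distance from the probe to every aligned sequence.
        dists = [sum(a != b for a, b in zip(probe, seq)) for seq in alignment]
        best = min(dists)
        nearest = [k for k, d in enumerate(dists) if d == best]
        # Share the probe equally among all nearest sequences.
        for k in nearest:
            counts[k] += 1.0 / len(nearest)
    return [c / n_random for c in counts]


weights = voronoi_weights(["ACDE", "ACDE", "WYFH"])
print(weights)
```

In this toy alignment the two identical sequences share their weight, so the divergent third sequence receives the largest weight; this is the correction for unequal representation that the weights are meant to provide.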

At the end of the first step, the resulting profile is shown in the browser, and you can save (download) it to your local disk by clicking the button. You can then go on to the second step and align a new sequence against the generated profile; only a single sequence is aligned in this step. The profile can also be used to search sequence databases.


OUTPUT

Here is a sample output of the search:
  1. The generated profile data
  2. List of motifs found
  3. Locations of motifs found

REFERENCES

1. Bucher P., Karplus K., Moeri N., Hofmann K.
A flexible motif search technique based on generalized profiles.
Comput. Chem. 20:3-24 (1996)
PubMed: 8867839
2. Gribskov M., Luethy R., Eisenberg D.
Profile analysis.
Meth. Enzymol. 183:146-159 (1990)
3. Luethy R., Xenarios I., Bucher P.
Improving the sensitivity of the sequence profile method.
Prot. Sci. 3:139-146 (1994)
PubMed: 7511453
4. Thompson J.D., Higgins D.G., Gibson T.J.
Improved sensitivity of profile searches through the use of sequence weights and gap excision.
CABIOS 10:19-29 (1994)
PubMed: 8193951
5. Durbin R., Eddy S., Krogh A., Mitchison G.
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.
Cambridge University Press (1998) 350 pages.
5b. Eddy S.R.
Profile hidden Markov models.
Bioinformatics 14:755-763 (1998)
PubMed: 9918945
6. Sibbald P.R., Argos P.
Weighting aligned protein or nucleic acid sequences to correct for unequal representation.
J. Mol. Biol. 216:813-818 (1990)
PubMed: 2176240