Generate a profile from multiple sequence alignment
and align the profile with a new sequence
OVERVIEW
Two types of profile data, in either PROSITE (generalized profile) or
Pfam (profile HMM) format, are
calculated from a multiple sequence alignment
using pfmake [1, 2, 3, 4] or
hmmbuild [5], respectively.
In the first step, a profile is calculated from multiple sequence alignment data that you supply either by
copy and paste or by uploading a file. If you choose the HMMER programs, you have two
alternatives: run hmmbuild only, or run hmmbuild followed by hmmcalibrate.
According to the HMMER user manual, the latter step is optional, but it increases the sensitivity of database searches. However, the hmmcalibrate step requires additional computing time.
- hmmbuild
- hmmbuild reads a multiple sequence alignment file alignfile, builds a new profile HMM, and saves the
HMM in hmmfile.
- hmmcalibrate
- hmmcalibrate reads an HMM file from hmmfile, scores a large number of synthesized random
sequences with it, fits an extreme value distribution (EVD) to the histogram of those scores, and
re-saves hmmfile now including the EVD parameters.
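The calibration idea described above (score random sequences, then fit an extreme value distribution to the scores) can be illustrated with a minimal Python sketch. This is not hmmcalibrate's own code: the function names are hypothetical, the random scores are drawn directly from a Gumbel distribution instead of being produced by scoring sequences against an HMM, and the fit uses the simple method of moments rather than whatever fitting procedure HMMER applies.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate the Gumbel (EVD) location mu and decay lambda.

    Method of moments (an assumption of this sketch):
    variance = pi^2 / (6 * lambda^2), mean = mu + gamma / lambda.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def evd_pvalue(score, mu, lam):
    """Probability that a random sequence scores at least `score`."""
    return 1.0 - math.exp(-math.exp(-lam * (score - mu)))


def sample_gumbel(mu, lam, n, seed=0):
    """Stand-in for 'scores of n random sequences' (inverse-CDF sampling)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        # Clamp away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        scores.append(mu - math.log(-math.log(u)) / lam)
    return scores


# Illustrative values only; they do not come from a real HMM.
random_scores = sample_gumbel(mu=-10.0, lam=0.7, n=5000, seed=42)
mu_hat, lam_hat = fit_gumbel_moments(random_scores)
print("fitted mu, lambda:", mu_hat, lam_hat)
print("P-value of score 0.0:", evd_pvalue(0.0, mu_hat, lam_hat))
```

Once the two EVD parameters are stored with the profile, a raw score from a database search can be turned into a P-value without rescoring random sequences, which is why calibration costs extra time once but helps every later search.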
When the PROSITE format is chosen as the output, on the other hand, a weight is first calculated
for each sequence in the multiple alignment using the program pfw [6], so that the
weights sum to 1. The number of shuffles per sequence
(parameter N) is 1000, which gives an average relative precision of
approx. 3% [6]. The profile matrix is then calculated by the program
pfmake
using the BLOSUM45 scoring matrix with a maximum gap penalty multiplier of M=0.0.
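The weighting step can be sketched in Python as a Monte Carlo procedure in the spirit of reference [6]: random probe sequences are drawn, each probe is credited to the aligned sequence it is closest to, and the credits are normalized so that the weights sum to 1. This is an illustration under stated assumptions and not the pfw implementation: the function name voronoi_weights is hypothetical, probes are built by picking uniformly among the residues observed in each column, and closeness is measured by the number of mismatched positions.

```python
import random


def voronoi_weights(alignment, shuffles_per_seq=1000, seed=0):
    """Monte Carlo sequence weights that sum to 1 (illustrative sketch).

    alignment: list of equal-length aligned sequences (strings).
    shuffles_per_seq: number of random probes per sequence (N in the text).
    """
    rng = random.Random(seed)
    n_seqs = len(alignment)
    # Distinct residues seen in each alignment column.
    columns = [sorted(set(col)) for col in zip(*alignment)]
    credit = [0.0] * n_seqs
    total = shuffles_per_seq * n_seqs
    for _ in range(total):
        probe = [rng.choice(residues) for residues in columns]
        # Mismatch count between the probe and every aligned sequence.
        dists = [sum(a != b for a, b in zip(probe, seq)) for seq in alignment]
        best = min(dists)
        nearest = [i for i, d in enumerate(dists) if d == best]
        # Ties share the probe equally.
        for i in nearest:
            credit[i] += 1.0 / len(nearest)
    return [c / total for c in credit]


# Two identical sequences and one divergent sequence (toy data).
weights = voronoi_weights(["ACDE", "ACDE", "WYFH"], shuffles_per_seq=1000)
print(weights, sum(weights))
```

In this toy alignment the two identical sequences split the credit for their shared neighbourhood, so each receives a smaller weight than the divergent third sequence; this is the correction for unequal representation that the weights are meant to provide. If the sampling error shrinks as 1/sqrt(N), which is an assumption here, N=1000 gives about 0.032, in line with the roughly 3% precision quoted above.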
At the end of the first step, the resulting profile is shown in the browser, and
you can save (download) it to your local disk by clicking the button. You can
then go on to the second step and align your new sequence against the generated profile.
The profile can also be used to search sequence databases. In the second step,
only a single sequence is aligned.
OUTPUT
Here is a sample output of the search:
- The generated profile data
- List of motifs found
- Location of motifs found
References
-
1. Bucher P., Karplus K., Moeri N., Hofmann K.
A flexible motif search technique based on generalized profiles.
Comput. Chem. 20:3-24 (1996)
-
2. Gribskov M., Luethy R., Eisenberg D.
Profile analysis.
Meth. Enzymol. 183:146-159 (1990)
-
3. Luethy R., Xenarios I., Bucher P.
Improving the sensitivity of the sequence profile method.
Prot. Sci. 3:139-146 (1994)
-
4. Thompson J.D., Higgins D.G., Gibson T.J.
Improved sensitivity of profile searches through the use of sequence
weights and gap excision.
CABIOS 10:19-29 (1994)
-
5. Durbin R., Eddy S., Krogh A., Mitchison G.
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.
Cambridge University Press (1998), 350 pages
-
5b. Eddy S.R.
Profile hidden Markov models
Bioinformatics 14:755-763 (1998)
-
6. Sibbald P.R., Argos P.
Weighting aligned protein or nucleic acid sequences to correct for unequal representation.
J. Mol. Biol. 216:813-818 (1990)