Help for Motif Search

Search with a protein sequence against the PROSITE profile library

OVERVIEW

PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.

The generalized profile method developed by Bucher, P. and Bairoch, A. is a very sensitive way to find motifs in a query sequence. Profiles created from various protein domains are incorporated into the PROSITE library. Here a program called Findprofile, written at ICR, Kyoto University, is used to search a sequence against the profile database. The program uses a dynamic programming algorithm to find the best alignment between the query sequence and each profile entry in the PROSITE database. Only profiles whose scores are greater than the given threshold are shown.
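
The dynamic programming step can be illustrated with a minimal sketch. It is not the code of the search program itself: it assumes a toy profile given as a list of per-position score tables, a local alignment, and a single linear gap penalty, whereas real generalized profiles carry position-specific insertion, deletion and transition scores. The function name best_local_score, the parameters gap and default, and the toy profile values are all hypothetical.

```python
def best_local_score(sequence, profile, gap=-4, default=-2):
    """Best local alignment score of `sequence` against `profile`.

    `profile` is a list of dicts mapping residue -> match score, one dict
    per profile position (a strong simplification of a MATRIX entry).
    Residues missing from a dict receive the `default` score.
    """
    n, m = len(sequence), len(profile)
    # prev[j]: best score of an alignment ending at the previous
    # sequence residue and profile position j.
    prev = [0] * (m + 1)
    best = 0
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            match = prev[j - 1] + profile[j - 1].get(sequence[i - 1], default)
            insert = prev[j] + gap      # extra residue in the query
            delete = curr[j - 1] + gap  # profile position skipped
            # Flooring at 0 lets an alignment start anywhere (local alignment).
            curr[j] = max(0, match, insert, delete)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy three-position profile (hypothetical values, not taken from PROSITE).
toy_profile = [{"A": 5}, {"C": 5}, {"D": 5}]
print(best_local_score("GGACDGG", toy_profile))  # 15
```

A real search would compare the resulting raw score with the cut-off of each profile entry and report only the entries that exceed it.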

When a profile is found, any three-dimensional structures that share the motif are retrieved from the PDB database where available. You can examine them with the RasMol program, in the same way as for the results of a pattern database search.

SCORE

As described in the PROSITE documentation, a threshold value (or a set of them) is defined for each PROSITE profile entry in the "CUT_OFF" and "NORMALIZATION" records. Here is an example of the database records.


ID   ACP_DOMAIN; MATRIX.
AC   PS50075;
DT   NOV-1997 (CREATED); NOV-1997 (DATA UPDATE); JUL-1998 (INF UPDATE).
DE   Acyl carrier protein phosphopantetheine domain profile.
MA   /GENERAL_SPEC: ALPHABET='ABCDEFGHIKLMNPQRSTVWYZ'; LENGTH=71;
MA   /DISJOINT: DEFINITION=PROTECT; N1=6; N2=66;
MA   /NORMALIZATION: MODE=1; FUNCTION=LINEAR; R1=2.3; R2=.02281121; TEXT='NScore';
MA   /CUT_OFF: LEVEL=0; SCORE=271; N_SCORE=8.5; MODE=1;
MA   /CUT_OFF: LEVEL=-1; SCORE=184; N_SCORE=6.5; MODE=1;

In this example two "levels" of cut-off score are annotated.
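
As a rough illustration of how these records can be read, the sketch below extracts the two cut-off levels and the normalization parameters from the MA lines of the example. It assumes the linear normalization of the generalized profile syntax, N_SCORE = R1 + R2 * raw score, which reproduces the N_SCORE values of this record to one decimal place. The helper names parse_fields, read_cutoffs and read_linear_normalization are hypothetical.

```python
import re

# The three MA lines of interest, copied from the example record.
record = """\
MA   /NORMALIZATION: MODE=1; FUNCTION=LINEAR; R1=2.3; R2=.02281121; TEXT='NScore';
MA   /CUT_OFF: LEVEL=0; SCORE=271; N_SCORE=8.5; MODE=1;
MA   /CUT_OFF: LEVEL=-1; SCORE=184; N_SCORE=6.5; MODE=1;
"""


def parse_fields(line):
    """Return the KEY=VALUE; pairs of one MA line as a dict of strings."""
    return dict(re.findall(r"(\w+)=([^;]+);", line))


def read_cutoffs(text):
    """Map each CUT_OFF level to its raw cut-off score."""
    cutoffs = {}
    for line in text.splitlines():
        if line.startswith("MA   /CUT_OFF:"):
            fields = parse_fields(line)
            cutoffs[int(fields["LEVEL"])] = int(fields["SCORE"])
    return cutoffs


def read_linear_normalization(text):
    """Return (R1, R2) of the first LINEAR normalization, or None."""
    for line in text.splitlines():
        if line.startswith("MA   /NORMALIZATION:"):
            fields = parse_fields(line)
            if fields["FUNCTION"] == "LINEAR":
                return float(fields["R1"]), float(fields["R2"])
    return None


cutoffs = read_cutoffs(record)
r1, r2 = read_linear_normalization(record)
for level, raw in sorted(cutoffs.items(), reverse=True):
    # Assumed linear relation between raw and normalized score.
    print(level, raw, round(r1 + r2 * raw, 1))
```
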
By default (i.e., when no value is given in the box), the program uses the "LEVEL=0" cut-off score taken from the database entry itself, and reports a hit if the raw score calculated for the combination of the query sequence and a PROSITE entry is greater than that threshold. If you give an integer value (which may be greater or less than 0), the following adjustment is applied:

Cut-off_src = Cut-off_db * (100 + Score_ad) / 100

where

Cut-off_src : cut-off score used in the database search
Cut-off_db  : cut-off score given in the database entry
Score_ad    : parameter given in the box (Score_ad > -100)

A positive Score_ad raises the cut-off threshold, and a negative Score_ad lowers it.
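
The adjustment can be checked with a few lines of code. This is only a sketch of the formula as stated here. The function names adjusted_cutoff and is_hit are hypothetical, and whether the server rounds the adjusted value is not documented, so no rounding is applied.

```python
def adjusted_cutoff(cutoff_db, score_ad=0):
    """Cut-off used in the search, given the database cut-off and the
    percentage adjustment entered in the box (must be greater than -100)."""
    if score_ad <= -100:
        raise ValueError("Score_ad must be greater than -100")
    return cutoff_db * (100 + score_ad) / 100


def is_hit(raw_score, cutoff_db, score_ad=0):
    """A profile is reported when the raw score exceeds the adjusted cut-off."""
    return raw_score > adjusted_cutoff(cutoff_db, score_ad)


# LEVEL=0 cut-off of the ACP_DOMAIN example (SCORE=271).
print(adjusted_cutoff(271))       # 271.0 (default, no adjustment)
print(adjusted_cutoff(271, 10))   # 298.1 (stricter)
print(adjusted_cutoff(271, -10))  # 243.9 (more permissive)
```

With these numbers, a raw score of 250 is not a hit at the default setting but becomes one when -10 is entered in the box.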

USER'S PROFILE LIBRARY

You can search your query sequence against a user-defined profile library containing one or more profiles in PROSITE format. Check the "User-defined Profile Library" box in the motif library list and provide the name of the file containing the profiles.
The user-defined profile library may be a subset of the original PROSITE database or one generated from multiple sequence alignment data.

RESULTS

Common sequence patterns such as "C-kinase phosphorylation site" or "N-glycosylation site" (which are flagged with "/SKIP-FLAG=TRUE" on the CC line in the database) are ignored automatically.
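
A filter of this kind might look like the sketch below. It only illustrates the idea and is not the server's code: the function name has_skip_flag is hypothetical, and the two entries are abbreviated, illustrative fragments rather than complete PROSITE records.

```python
def has_skip_flag(entry_lines):
    """True if any CC line of the entry carries /SKIP-FLAG=TRUE."""
    return any(
        line.startswith("CC") and "/SKIP-FLAG=TRUE" in line
        for line in entry_lines
    )


# Abbreviated, illustrative entries (not complete PROSITE records).
entries = {
    "PKC_PHOSPHO_SITE": [
        "ID   PKC_PHOSPHO_SITE; PATTERN.",
        "CC   /SKIP-FLAG=TRUE;",
    ],
    "ACP_DOMAIN": [
        "ID   ACP_DOMAIN; MATRIX.",
        "DE   Acyl carrier protein phosphopantetheine domain profile.",
    ],
}

# Keep only the entries that are not flagged as common patterns.
reported = [name for name, lines in entries.items() if not has_skip_flag(lines)]
print(reported)  # ['ACP_DOMAIN']
```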

If any profiles are found in the database, ProfileFinder then looks up the Protein Data Bank entries that share the same motif, as annotated on the 3D lines of the PROSITE database.
The Related Structures column shows the number of those structures. Clicking the cell shows a list of those entries, giving the PDB ID numbers together with a brief description of each protein. From the list you can either jump to DBGET to look at the PDB entry in more detail, or see the position and the structure of the profile motif on the 3D structure.

The Position (Score) column of the table lists the positions (start and end sequence numbers) and the raw scores of the motifs found. Click the Detail button to see the actual positions of the motif along the query sequence. A consensus sequence is taken from each database entry. A fragment of the query sequence (the upper one) is shown aligned with this consensus sequence, together with the score. The actual position on the query sequence is also shown (red residue letters).

OUTPUT

A sample output table of the search is shown below. Click the images to get the detailed results described above.

References

Falquet L., Pagni M., Bucher P., Hulo N., Sigrist C.J., Hofmann K. and Bairoch A.
"The PROSITE database, its status in 2002"
Nucl. Acids Res. 30(1):235-238, 2002.

PubMed: 11752303

Gribskov M., McLachlan A.D., Eisenberg D.
Profile analysis: detection of distantly related proteins.
Proc. Natl. Acad. Sci. USA 84:4355-4358 (1987)

PubMed: 87260806

Gribskov M., Luethy R., Eisenberg D.
Profile analysis.
Methods Enzymol. 183:146-159 (1990)

PubMed: 90190364

Bucher P., Bairoch A.
A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation.
In "ISMB-94; Proceedings 2nd International Conference on Intelligent Systems for Molecular Biology."
(Altman R., Brutlag D., Karp P., Lathrop R., Searls D., Eds.), pp. 53-61, AAAI Press, Menlo Park (1994).

PubMed: 96039003

Luethy R., Xenarios I., Bucher P.
Improving the sensitivity of the sequence profile method.
Prot. Sci. 3:139-146 (1994)

PubMed: 7511453

Bucher P., Karplus K., Moeri N., Hofmann K.
A flexible motif search technique based on generalized profiles.
Comput. Chem. 20:3-24 (1996)

PubMed: 8867839