Help for Motif Search

Accepted data formats for multiple sequence alignments

To accept several data formats for multiple sequence alignments, the program readseq is run as a filter on the input before it is passed to pfmake, the program that actually generates the profile.
(The pfmake program itself accepts only the GCG MSF format, described later.)

Here is a list of the formats which readseq currently understands:

Caution!: Clustal W format is NOT supported by either readseq or pfmake.

MSF, multiple sequence format used by GCG software
NBRF format
Phylip, interleaved format for Phylip programs (v3.3, v3.4)
(The three formats above are supported by our Clustal W server.)
IG/Stanford, used by Intelligenetics and others
GenBank, GenBank flatfile format
EMBL flatfile format
Pearson/Fasta, a common format used by Fasta programs and others
Phylip3.2, sequential format for Phylip programs
PAUP's multiple sequence (NEXUS) format
PIR/CODATA format used by PIR
ASN.1 format used by NCBI
DNAStrider, format for a common Mac program
Fitch format, limited use
GCG, single sequence format of GCG software
Zuker format, limited use. Input only.
Olsen, format printed by Olsen VMS sequence editor. Input only.
Plain/Raw, sequence data only (no name, document, numbering)
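
Since Clustal W files are not accepted, it may be useful to check an alignment file before submitting it. The following is a minimal sketch in Python, not part of readseq or pfmake. It assumes that a Clustal W file starts with a header line beginning with the word CLUSTAL. The function name looks_like_clustal is hypothetical.

```python
def looks_like_clustal(text):
    """Return True if the first non-blank line starts with 'CLUSTAL'.

    Assumption: Clustal W output begins with a header line such as
    'CLUSTAL W (1.83) multiple sequence alignment'.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Only the first non-blank line is inspected.
            return stripped.upper().startswith("CLUSTAL")
    # Empty input: nothing to identify.
    return False


if __name__ == "__main__":
    sample = "CLUSTAL W (1.83) multiple sequence alignment\n\nseq1  ACGT\n"
    if looks_like_clustal(sample):
        print("Clustal W format detected; convert it to a supported format first.")
```

A file flagged this way could be re-exported in one of the listed formats, for example MSF, before it is submitted.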

Currently our hmmbuild accepts Stockholm or aligned FASTA alignments.

Stockholm format
(Sample of a simple Stockholm file.)
aligned Fasta
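
To illustrate how the two formats accepted by hmmbuild relate, here is a minimal sketch in Python that rewrites an aligned FASTA alignment as a simple Stockholm file. It is not the server's own code, and the function names parse_aligned_fasta and fasta_to_stockholm are hypothetical. It assumes that every sequence has the same length (gaps included), that the first word of each header line is the sequence name, and that a minimal Stockholm file consists of the line "# STOCKHOLM 1.0", one "name sequence" line per sequence, and a closing "//" line.

```python
def parse_aligned_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without a sequence name")
            # Assumption: the first word of the header is the name.
            name, chunks = fields[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def fasta_to_stockholm(text):
    """Convert aligned FASTA text to a minimal Stockholm string."""
    records = parse_aligned_fasta(text)
    if not records:
        raise ValueError("no sequences found")
    # In an alignment, all rows must have the same number of columns.
    if len({len(seq) for _, seq in records}) != 1:
        raise ValueError("sequences differ in length; input is not aligned")
    width = max(len(name) for name, _ in records)
    lines = ["# STOCKHOLM 1.0"]
    for name, seq in records:
        lines.append(f"{name.ljust(width)} {seq}")
    lines.append("//")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(fasta_to_stockholm(">a\nAC-GT\n>bb\nACGGT\n"), end="")
```

The length check is the main point: FASTA input whose sequences differ in length is unaligned, and this sketch rejects it rather than guessing where the gaps belong.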
