Accepted data formats for multiple sequence alignments
To accept several multiple sequence alignment formats, the program readseq is run as a filter on the input before it is passed to pfmake, the program that actually generates the profile.
(pfmake itself accepts only the GCG MSF format, described later.)
Here is a list of the formats which readseq currently understands:
- Caution! Clustal W format is NOT supported by either readseq or pfmake.
- MSF, multiple sequence format used by GCG software
- NBRF format
- Phylip, interleaved format for Phylip programs (v3.3, v3.4)
(The three formats above are supported by our Clustal W server.)
- IG/Stanford, used by Intelligenetics and others
- GenBank, genbank flatfile format
- EMBL flatfile format
- Pearson/Fasta, a common format used by Fasta programs and others
- Phylip3.2, sequential format for Phylip programs
- PAUP's multiple sequence (NEXUS) format
- PIR/CODATA format used by PIR
- ASN.1 format used by NCBI
- DNAStrider, format of a common Mac program
- Fitch format, limited use
- GCG, single sequence format of GCG software
- Zuker format, limited use. Input only.
- Olsen, format printed by Olsen VMS sequence editor. Input only.
- Plain/Raw, sequence data only (no name, document, numbering)
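As a rough illustration of the caution about Clustal W, the following Python sketch guesses a file's format from its leading lines, so that an unsupported Clustal W file can be caught before submission. The function name `guess_alignment_format` and the small set of signatures it checks are assumptions made for this example. readseq does its own format detection, which this sketch does not reproduce.

```python
def guess_alignment_format(text):
    """Guess an alignment format from its leading lines (heuristic only)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "empty"
    first = lines[0]
    if first.upper().startswith("CLUSTAL"):
        return "clustal"    # NOT accepted by readseq or pfmake
    if first.startswith("# STOCKHOLM"):
        return "stockholm"
    if first.upper().startswith("#NEXUS"):
        return "nexus"
    if first.startswith(">"):
        # NBRF/PIR headers look like ">P1;name"; plain FASTA has no ';' there
        if len(first) > 3 and first[3] == ";":
            return "nbrf"
        return "fasta"
    # GCG MSF files carry a header line containing "MSF:" and ending in ".."
    if any("MSF:" in ln and ln.endswith("..") for ln in lines):
        return "msf"
    return "unknown"


if __name__ == "__main__":
    sample = "CLUSTAL W (1.83) multiple sequence alignment\n\nseq1 ACGT\n"
    if guess_alignment_format(sample) == "clustal":
        print("Clustal W input detected: convert it to a supported format first.")
```

A result of "unknown" only means that none of these few signatures matched. It does not mean that readseq would reject the file.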
Currently our hmmbuild accepts Stockholm or aligned FASTA alignments.
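In an aligned FASTA file, every sequence has been padded with gap characters to the same length. The short Python sketch below checks that property before a file is submitted. The helper names `read_fasta` and `is_aligned` are hypothetical, and equal length is only a necessary condition; hmmbuild performs its own validation.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""   # first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def is_aligned(records):
    """True when there is at least one sequence and all have the same length."""
    lengths = {len(seq) for _, seq in records}
    return len(records) > 0 and len(lengths) == 1


if __name__ == "__main__":
    example = ">a first\nAC-G\nT\n>b\nACGGT\n"
    print(is_aligned(read_fasta(example)))
```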