Alphabets used in Seq objects, etc., to declare the sequence type and its letters.

An alphabet is used by sequences built from a finite set of similar "words", such as single letters.
| Classes |
|---|
| Alphabet |
| SingleLetterAlphabet |
| ProteinAlphabet |
| NucleotideAlphabet |
| DNAAlphabet |
| RNAAlphabet |
| SecondaryStructure |
| ThreeLetterProtein |
| AlphabetEncoder |
| Gapped |
| HasStopCodon |
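The class names above suggest two families: plain alphabets arranged by subclassing (generic, then single-letter, then protein or nucleotide, then DNA or RNA), and encoders that wrap an existing alphabet to add extra symbols. The following is a minimal standalone sketch of that layout. The exact subclass relationships, the `letters` handling and the helper `base_alphabet` are assumptions for illustration, not Biopython's code. `SecondaryStructure` and `ThreeLetterProtein` are omitted.

```python
# Hypothetical stand-ins for the classes listed above; the hierarchy is assumed.


class Alphabet:
    """Generic base alphabet: letters and symbol size left unspecified."""
    size = None
    letters = None

    def __repr__(self):
        return type(self).__name__ + "()"


class SingleLetterAlphabet(Alphabet):
    size = 1


class ProteinAlphabet(SingleLetterAlphabet):
    pass


class NucleotideAlphabet(SingleLetterAlphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):
    pass


class RNAAlphabet(NucleotideAlphabet):
    pass


class AlphabetEncoder:
    """Wraps another alphabet and appends extra letters to it."""

    def __init__(self, alphabet, new_letters):
        self.alphabet = alphabet
        self.new_letters = new_letters
        # Only extend the letter set if the wrapped alphabet defines one.
        base_letters = getattr(alphabet, "letters", None)
        self.letters = None if base_letters is None else base_letters + new_letters


class Gapped(AlphabetEncoder):
    def __init__(self, alphabet, gap_char="-"):
        super().__init__(alphabet, gap_char)
        self.gap_char = gap_char


class HasStopCodon(AlphabetEncoder):
    def __init__(self, alphabet, stop_symbol="*"):
        super().__init__(alphabet, stop_symbol)
        self.stop_symbol = stop_symbol


def base_alphabet(alphabet):
    """Hypothetical helper: peel off encoder layers to reach the plain alphabet."""
    while isinstance(alphabet, AlphabetEncoder):
        alphabet = alphabet.alphabet
    return alphabet
```

In this sketch, wrapping (composition) instead of subclassing lets a gap character or stop symbol be layered onto any alphabet. For example, `HasStopCodon(Gapped(ProteinAlphabet()))` still reduces to `ProteinAlphabet()` through `base_alphabet`.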
Returns a common but often generic base alphabet object (PRIVATE).

This throws away any AlphabetEncoder information, e.g. Gapped alphabets.

Note that DNA+RNA -> Nucleotide, and Nucleotide+Protein -> generic single letter. These DO NOT raise an exception!
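These merging rules can be shown with a small standalone sketch. It uses hypothetical minimal classes, not Biopython's. It first discards encoder layers and then keeps the most generic alphabet in a chain of related ones. It falls back to the nucleotide or single-letter alphabet for the DNA+RNA and Nucleotide+Protein cases, and to the plain generic alphabet otherwise. The function name `consensus_base_alphabet` and the single `Gapped` encoder are assumptions for illustration.

```python
# Hypothetical minimal alphabet classes (an assumption, not Biopython's code).
class Alphabet:
    def __repr__(self):
        return type(self).__name__ + "()"


class SingleLetterAlphabet(Alphabet):
    pass


class ProteinAlphabet(SingleLetterAlphabet):
    pass


class NucleotideAlphabet(SingleLetterAlphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):
    pass


class RNAAlphabet(NucleotideAlphabet):
    pass


class Gapped:
    """Hypothetical encoder that wraps an alphabet and adds a gap character."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        self.gap_char = gap_char


def consensus_base_alphabet(alphabets):
    """Return a common (often more generic) base alphabet; never raises."""
    common = None
    for alpha in alphabets:
        # Discard encoder information such as the gap character.
        while isinstance(alpha, Gapped):
            alpha = alpha.alphabet
        if common is None:
            common = alpha
        elif isinstance(alpha, type(common)):
            pass  # same as or more specific than common: keep common
        elif isinstance(common, type(alpha)):
            common = alpha  # more generic than common: adopt it
        elif isinstance(alpha, NucleotideAlphabet) and isinstance(common, NucleotideAlphabet):
            common = NucleotideAlphabet()  # e.g. DNA + RNA
        elif isinstance(alpha, SingleLetterAlphabet) and isinstance(common, SingleLetterAlphabet):
            common = SingleLetterAlphabet()  # e.g. nucleotide + protein
        else:
            return Alphabet()  # unrelated alphabets: most generic fallback
    return Alphabet() if common is None else common


print(consensus_base_alphabet([DNAAlphabet(), RNAAlphabet()]))      # NucleotideAlphabet()
print(consensus_base_alphabet([DNAAlphabet(), ProteinAlphabet()]))  # SingleLetterAlphabet()
```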
Returns a common but often generic alphabet object (PRIVATE).

    >>> from Bio.Alphabet import IUPAC
    >>> _consensus_alphabet([IUPAC.extended_protein, IUPAC.protein])
    ExtendedIUPACProtein()
    >>> _consensus_alphabet([generic_protein, IUPAC.protein])
    ProteinAlphabet()

Note that DNA+RNA -> Nucleotide, and Nucleotide+Protein -> generic single letter. These DO NOT raise an exception!

    >>> _consensus_alphabet([generic_dna, generic_nucleotide])
    NucleotideAlphabet()
    >>> _consensus_alphabet([generic_dna, generic_rna])
    NucleotideAlphabet()
    >>> _consensus_alphabet([generic_dna, generic_protein])
    SingleLetterAlphabet()
    >>> _consensus_alphabet([single_letter_alphabet, generic_protein])
    SingleLetterAlphabet()

This is aware of Gapped and HasStopCodon and of new letters added by other AlphabetEncoders. This WILL raise an exception if more than one gap character or stop symbol is present.

    >>> from Bio.Alphabet import IUPAC
    >>> _consensus_alphabet([Gapped(IUPAC.extended_protein), HasStopCodon(IUPAC.protein)])
    HasStopCodon(Gapped(ExtendedIUPACProtein(), '-'), '*')
    >>> _consensus_alphabet([Gapped(IUPAC.protein, "-"), Gapped(IUPAC.protein, "=")])
    Traceback (most recent call last):
        ...
    ValueError: More than one gap character present
    >>> _consensus_alphabet([HasStopCodon(IUPAC.protein, "*"), HasStopCodon(IUPAC.protein, "+")])
    Traceback (most recent call last):
        ...
    ValueError: More than one stop symbol present
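The gap and stop handling described above can be sketched on its own: peel off every encoder layer, collect the distinct gap characters and stop symbols, reject more than one of either, then rewrap the merged base with the gap layer first and the stop layer outside it. This matches the nesting in the doctest output. The sketch below is a hypothetical standalone version. Its base-merging rule `most_generic` is deliberately simplified (no DNA+RNA special case), and it ignores the other AlphabetEncoder letters the text mentions.

```python
# Hypothetical minimal classes; not Biopython's implementation.
class Alphabet:
    def __repr__(self):
        return type(self).__name__ + "()"


class ProteinAlphabet(Alphabet):
    pass


class Gapped:
    """Hypothetical encoder adding a gap character to a wrapped alphabet."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        self.gap_char = gap_char

    def __repr__(self):
        return f"Gapped({self.alphabet!r}, {self.gap_char!r})"


class HasStopCodon:
    """Hypothetical encoder adding a stop symbol to a wrapped alphabet."""

    def __init__(self, alphabet, stop_symbol="*"):
        self.alphabet = alphabet
        self.stop_symbol = stop_symbol

    def __repr__(self):
        return f"HasStopCodon({self.alphabet!r}, {self.stop_symbol!r})"


def most_generic(bases):
    """Simplified base rule: keep the most generic of related alphabets."""
    common = bases[0] if bases else Alphabet()
    for base in bases[1:]:
        if isinstance(base, type(common)):
            continue  # same or more specific: keep common
        if isinstance(common, type(base)):
            common = base  # more generic: adopt it
        else:
            return Alphabet()  # unrelated: fall back to the generic alphabet
    return common


def consensus_alphabet(alphabets):
    """Merge alphabets, keeping at most one gap character and one stop symbol."""
    bases, gaps, stops = [], set(), set()
    for alpha in alphabets:
        # Peel off encoder layers, remembering the symbols they add.
        while isinstance(alpha, (Gapped, HasStopCodon)):
            if isinstance(alpha, Gapped):
                gaps.add(alpha.gap_char)
            else:
                stops.add(alpha.stop_symbol)
            alpha = alpha.alphabet
        bases.append(alpha)
    if len(gaps) > 1:
        raise ValueError("More than one gap character present")
    if len(stops) > 1:
        raise ValueError("More than one stop symbol present")
    # Rebuild: base first, then the gap layer, then the stop layer outermost.
    result = most_generic(bases)
    if gaps:
        result = Gapped(result, gaps.pop())
    if stops:
        result = HasStopCodon(result, stops.pop())
    return result


print(consensus_alphabet([Gapped(ProteinAlphabet()), HasStopCodon(ProteinAlphabet())]))
# HasStopCodon(Gapped(ProteinAlphabet(), '-'), '*')
```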
Returns True except for DNA+RNA or Nucleotide+Protein (PRIVATE).

    >>> _check_type_compatible([generic_dna, generic_nucleotide])
    True
    >>> _check_type_compatible([generic_dna, generic_rna])
    False
    >>> _check_type_compatible([generic_dna, generic_protein])
    False
    >>> _check_type_compatible([single_letter_alphabet, generic_protein])
    True

This relies on the Alphabet subclassing hierarchy. It does not check things like gap characters or stop symbols.
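Because the check relies only on the subclassing hierarchy, it reduces to a few `isinstance` tests. The following is a hypothetical standalone sketch with minimal assumed classes. It reproduces the four doctest results above but, unlike a full version, does not unwrap Gapped or HasStopCodon encoders first.

```python
# Hypothetical minimal hierarchy (an assumption, not Biopython's code).
class Alphabet:
    pass


class SingleLetterAlphabet(Alphabet):
    pass


class ProteinAlphabet(SingleLetterAlphabet):
    pass


class NucleotideAlphabet(SingleLetterAlphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):
    pass


class RNAAlphabet(NucleotideAlphabet):
    pass


def check_type_compatible(alphabets):
    """Return False for a DNA+RNA or nucleotide+protein mix, else True."""
    dna = any(isinstance(a, DNAAlphabet) for a in alphabets)
    rna = any(isinstance(a, RNAAlphabet) for a in alphabets)
    nucleotide = any(isinstance(a, NucleotideAlphabet) for a in alphabets)
    protein = any(isinstance(a, ProteinAlphabet) for a in alphabets)
    if dna and rna:
        return False
    if nucleotide and protein:
        return False
    return True


print(check_type_compatible([DNAAlphabet(), NucleotideAlphabet()]))  # True
print(check_type_compatible([DNAAlphabet(), RNAAlphabet()]))         # False
```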
Checks that all letters in the sequence are in the alphabet (PRIVATE).

    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> my_seq = Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF",
    ...              IUPAC.protein)
    >>> _verify_alphabet(my_seq)
    True

This example has an X, which is not in the IUPAC protein alphabet (you should be using the IUPAC extended protein alphabet):

    >>> bad_seq = Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVFX",
    ...               IUPAC.protein)
    >>> _verify_alphabet(bad_seq)
    False

This replaces Bio.utils.verify_alphabet(), since we are deprecating that. Potentially this could be added to the Alphabet object, and I would like it to be an option when creating a Seq object, but that might slow things down.
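At its core, the letter check is a membership test of every character against the alphabet's letters. The sketch below is a hypothetical standalone version that works on plain strings, not Seq objects. It assumes the IUPAC protein letters are the 20 standard one-letter amino-acid codes. The names `verify_letters` and `PROTEIN_LETTERS` are illustrative only.

```python
# Assumed letter set: the 20 standard one-letter amino-acid codes.
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"


def verify_letters(sequence, letters):
    """Return True if every character in sequence appears in letters."""
    allowed = set(letters)  # set membership is O(1) per character
    return all(ch in allowed for ch in sequence)


good = "MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF"
print(verify_letters(good, PROTEIN_LETTERS))        # True
print(verify_letters(good + "X", PROTEIN_LETTERS))  # False: X is not listed
```

Building the set once keeps the cost linear in the sequence length. That is relevant to the concern above about slowing down Seq creation if such a check ran every time.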
Generated by Epydoc 3.0.1 on Sat Aug 20 10:37:27 2011 (http://epydoc.sourceforge.net)