Class UnknownSeq
object --+
         |
       Seq --+
             |
            UnknownSeq
A read-only sequence object of known length but unknown contents.
If you have an unknown sequence, you can represent this with a normal
Seq object, for example:
>>> my_seq = Seq("N"*5)
>>> my_seq
Seq('NNNNN', Alphabet())
>>> len(my_seq)
5
>>> print(my_seq)
NNNNN
However, this is rather wasteful of memory (especially for large
sequences), which is where this class is most useful:
>>> unk_five = UnknownSeq(5)
>>> unk_five
UnknownSeq(5, alphabet = Alphabet(), character = '?')
>>> len(unk_five)
5
>>> print(unk_five)
?????
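The memory saving comes from recording only a length and a single character instead of every letter. The sketch below illustrates that idea with a hypothetical `CompactUnknown` class. The class name and its attributes are assumptions made for illustration; it is not part of Biopython and is not necessarily how UnknownSeq is implemented.

```python
import sys


class CompactUnknown:
    """Hypothetical stand-in: stores a length and one character only."""

    __slots__ = ("_length", "_character")

    def __init__(self, length, character="?"):
        if length < 0:
            raise ValueError("length must be non-negative")
        if len(character) != 1:
            raise ValueError("character must be a single letter")
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        # The full string is only built when it is asked for.
        return self._character * self._length


n = 10 ** 6
as_string = "N" * n              # holds every letter in memory
compact = CompactUnknown(n, "N")  # holds two small attributes

print(len(as_string) == len(compact))
print(sys.getsizeof(as_string) > sys.getsizeof(compact))
```

Both objects report the same length, but only the plain string grows with the sequence length.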
You can add unknown sequences together, provided their alphabets and
characters are compatible, and get another memory-saving UnknownSeq:
>>> unk_four = UnknownSeq(4)
>>> unk_four
UnknownSeq(4, alphabet = Alphabet(), character = '?')
>>> unk_four + unk_five
UnknownSeq(9, alphabet = Alphabet(), character = '?')
If the alphabet or characters don't match up, the addition gives an
ordinary Seq object:
>>> unk_nnnn = UnknownSeq(4, character = "N")
>>> unk_nnnn
UnknownSeq(4, alphabet = Alphabet(), character = 'N')
>>> unk_nnnn + unk_four
Seq('NNNN????', Alphabet())
Combining with a real Seq gives a new Seq object:
>>> known_seq = Seq("ACGT")
>>> unk_four + known_seq
Seq('????ACGT', Alphabet())
>>> known_seq + unk_four
Seq('ACGT????', Alphabet())
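The addition rule shown in these examples can be sketched without Biopython. The helper below, `add_unknown`, is hypothetical: it represents each unknown sequence as a `(length, character)` pair and deliberately ignores alphabet compatibility, which the real class also checks.

```python
def add_unknown(left, right):
    """Add two (length, character) pairs describing unknown sequences.

    Returns a (length, character) pair when the characters match,
    otherwise a plain string with every letter spelled out.
    Alphabet compatibility is ignored in this sketch.
    """
    left_len, left_char = left
    right_len, right_char = right
    if left_char == right_char:
        # Same placeholder: the result is still fully described by a pair.
        return (left_len + right_len, left_char)
    # Different placeholders: the compact form no longer applies.
    return left_char * left_len + right_char * right_len


print(add_unknown((4, "?"), (5, "?")))
print(add_unknown((4, "N"), (4, "?")))
```

The first call keeps the compact form; the second falls back to a full string, mirroring the UnknownSeq versus Seq results above.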
__init__(self, length, alphabet=Alphabet(), character=None)
    Create a new UnknownSeq object.
__str__(self)
    Returns the unknown sequence as a full string of the given length.
count(self, sub, start=0, end=2147483647)
    Non-overlapping count method, like that of a Python string.
translate(self, **kwargs)
    Translate an unknown nucleotide sequence into an unknown protein.
ungap(self, gap=None)
    Return a copy of the sequence without the gap character(s).
Inherited from Seq: __cmp__, __contains__, __hash__, endswith, find, lstrip, rfind, rsplit, rstrip, split, startswith, strip, tomutable, tostring
Inherited from object: __delattr__, __format__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__
Inherited from Seq: data
Inherited from object: __class__
__init__(self, length, alphabet=Alphabet(), character=None)
(Constructor)
Create a new UnknownSeq object.
If character is omitted, it is determined from the alphabet: "N" for
nucleotides, "X" for proteins, and "?" otherwise.
- Overrides:
object.__init__
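The default-character rule described for the constructor can be written as a small lookup. This is a minimal sketch, assuming the alphabet is reduced to a plain string label ("nucleotide", "protein", or anything else); the function name `default_character` and these labels are hypothetical and do not correspond to Bio.Alphabet classes.

```python
def default_character(kind):
    """Pick a placeholder letter for a hypothetical alphabet kind label."""
    defaults = {"nucleotide": "N", "protein": "X"}
    # Anything that is neither nucleotide nor protein falls back to "?".
    return defaults.get(kind, "?")


for kind in ("nucleotide", "protein", "generic"):
    print(kind, default_character(kind))
```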
__len__(self)
(Length operator)
Returns the stated length of the unknown sequence.
- Overrides:
Seq.__len__
|
__str__(self)
(Informal representation operator)
Returns the unknown sequence as a full string of the given length.
- Overrides:
object.__str__
__repr__(self)
(Representation operator)
Returns a (truncated) representation of the sequence for
debugging.
- Overrides:
object.__repr__
- (inherited documentation)
__add__(self, other)
(Addition operator)
Add another sequence or string to this sequence.
Adding two UnknownSeq objects returns another UnknownSeq object
provided the character is the same and the alphabets are compatible.
>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(10, generic_protein) + UnknownSeq(5, generic_protein)
UnknownSeq(15, alphabet = ProteinAlphabet(), character = 'X')
If the characters differ, an UnknownSeq object cannot be used, so a
Seq object is returned:
>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(10, generic_protein) + UnknownSeq(5, generic_protein,
... character="x")
Seq('XXXXXXXXXXxxxxx', ProteinAlphabet())
If adding a string to an UnknownSeq, a new Seq is returned with the
same alphabet:
>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(5, generic_protein) + "LV"
Seq('XXXXXLV', ProteinAlphabet())
- Overrides:
Seq.__add__
|
__radd__(self, other)
(Right-side addition operator)
Adding a sequence on the left.
If adding a string to a Seq, the alphabet is preserved:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> "LV" + Seq("MELKI", generic_protein)
Seq('LVMELKI', ProteinAlphabet())
Adding two Seq (like) objects is handled via the __add__ method.
- Overrides:
Seq.__radd__
- (inherited documentation)
|
__getitem__(self, index)
(Indexing operator)
Get a subsequence from the UnknownSeq object.
>>> unk = UnknownSeq(8, character="N")
>>> print(unk[:])
NNNNNNNN
>>> print(unk[5:3])
<BLANKLINE>
>>> print(unk[1:-1])
NNNNNN
>>> print(unk[1:-1:2])
NNN
- Overrides:
Seq.__getitem__
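Because every position holds the same character, slicing only needs to work out how long the result is. One way to do that without building the string is Python's `slice.indices`, as sketched below. The helper `sliced_length` is hypothetical, and whether UnknownSeq computes slice lengths this way is an assumption.

```python
def sliced_length(length, index):
    """Length of seq[index] for a sequence with the given length."""
    if not isinstance(index, slice):
        raise TypeError("index must be a slice")
    # slice.indices clamps start/stop and resolves negative values.
    start, stop, step = index.indices(length)
    return len(range(start, stop, step))


# The four slices from the doctest above, applied to a length-8 sequence.
for index in (slice(None), slice(5, 3), slice(1, -1), slice(1, -1, 2)):
    print("N" * sliced_length(8, index))
```

The loop prints the same four lines as the doctest, including the blank line for the empty slice `[5:3]`.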
count(self, sub, start=0, end=2147483647)
Non-overlapping count method, like that of a Python string.
This behaves like the Python string (and Seq object) method of the
same name, which does a non-overlapping count.
Returns an integer, the number of occurrences of substring argument
sub in the (sub)sequence given by [start:end]. Optional arguments start
and end are interpreted as in slice notation.
Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
>>> "NNNN".count("N")
4
>>> Seq("NNNN").count("N")
4
>>> UnknownSeq(4, character="N").count("N")
4
>>> UnknownSeq(4, character="N").count("A")
0
>>> UnknownSeq(4, character="N").count("AA")
0
However, please note that because Python strings and Seq objects (and
MutableSeq objects) do a non-overlapping search, this may not give the
answer you expect:
>>> UnknownSeq(4, character="N").count("NN")
2
>>> UnknownSeq(4, character="N").count("NNN")
1
- Overrides:
Seq.count
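For a sequence made of one repeated character, the non-overlapping count reduces to integer division. The sketch below reproduces the answers of `str.count` without building the full string. The function `count_in_unknown` is hypothetical, takes `sub` as a non-empty plain string, and is not claimed to be the library's implementation.

```python
def count_in_unknown(length, character, sub, start=0, end=None):
    """Non-overlapping count of sub in (character * length)[start:end]."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    if set(sub) != {character}:
        # Any other letter in sub means it can never match.
        return 0
    lo, hi, _ = slice(start, end).indices(length)
    span = max(0, hi - lo)
    # Each match consumes len(sub) letters, so matches cannot overlap.
    return span // len(sub)


for sub in ("N", "NN", "NNN", "A"):
    print(sub, count_in_unknown(4, "N", sub))
```

A window of four letters holds two non-overlapping copies of "NN" but only one of "NNN", which is why the counts above may look smaller than expected.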
complement(self)
The complement of an unknown nucleotide equals itself.
>>> my_nuc = UnknownSeq(8)
>>> my_nuc
UnknownSeq(8, alphabet = Alphabet(), character = '?')
>>> print(my_nuc)
????????
>>> my_nuc.complement()
UnknownSeq(8, alphabet = Alphabet(), character = '?')
>>> print(my_nuc.complement())
????????
- Overrides:
Seq.complement
reverse_complement(self)
The reverse complement of an unknown nucleotide equals itself.
>>> my_nuc = UnknownSeq(10)
>>> my_nuc
UnknownSeq(10, alphabet = Alphabet(), character = '?')
>>> print(my_nuc)
??????????
>>> my_nuc.reverse_complement()
UnknownSeq(10, alphabet = Alphabet(), character = '?')
>>> print(my_nuc.reverse_complement())
??????????
- Overrides:
Seq.reverse_complement
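A plain-string illustration shows why an unknown sequence is its own reverse complement: the ambiguity code N complements to N, and reversing a run of identical letters changes nothing. The table below is a deliberately partial, hypothetical complement table for upper-case DNA; characters it does not list, such as "?", pass through unchanged.

```python
# Partial complement table; N (any base) is its own complement.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(text):
    """Reverse complement of an upper-case DNA string."""
    return text.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACG"))
print(reverse_complement("N" * 10) == "N" * 10)
print(reverse_complement("?" * 10) == "?" * 10)
```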
transcribe(self)
Returns unknown RNA sequence from an unknown DNA sequence.
>>> my_dna = UnknownSeq(10, character="N")
>>> my_dna
UnknownSeq(10, alphabet = Alphabet(), character = 'N')
>>> print(my_dna)
NNNNNNNNNN
>>> my_rna = my_dna.transcribe()
>>> my_rna
UnknownSeq(10, alphabet = RNAAlphabet(), character = 'N')
>>> print(my_rna)
NNNNNNNNNN
- Overrides:
Seq.transcribe
back_transcribe(self)
Returns unknown DNA sequence from an unknown RNA sequence.
>>> my_rna = UnknownSeq(20, character="N")
>>> my_rna
UnknownSeq(20, alphabet = Alphabet(), character = 'N')
>>> print(my_rna)
NNNNNNNNNNNNNNNNNNNN
>>> my_dna = my_rna.back_transcribe()
>>> my_dna
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> print(my_dna)
NNNNNNNNNNNNNNNNNNNN
- Overrides:
Seq.back_transcribe
upper(self)
Returns an upper case copy of the sequence.
>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import UnknownSeq
>>> my_seq = UnknownSeq(20, generic_dna, character="n")
>>> my_seq
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'n')
>>> print(my_seq)
nnnnnnnnnnnnnnnnnnnn
>>> my_seq.upper()
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> print(my_seq.upper())
NNNNNNNNNNNNNNNNNNNN
This will adjust the alphabet if required. See also the lower
method.
- Overrides:
Seq.upper
lower(self)
Returns a lower case copy of the sequence.
This will adjust the alphabet if required:
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import UnknownSeq
>>> my_seq = UnknownSeq(20, IUPAC.extended_protein)
>>> my_seq
UnknownSeq(20, alphabet = ExtendedIUPACProtein(), character = 'X')
>>> print(my_seq)
XXXXXXXXXXXXXXXXXXXX
>>> my_seq.lower()
UnknownSeq(20, alphabet = ProteinAlphabet(), character = 'x')
>>> print(my_seq.lower())
xxxxxxxxxxxxxxxxxxxx
See also the upper method.
- Overrides:
Seq.lower
translate(self, **kwargs)
Translate an unknown nucleotide sequence into an unknown protein.
For example:
>>> my_seq = UnknownSeq(11, character="N")
>>> print(my_seq)
NNNNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
UnknownSeq(3, alphabet = ProteinAlphabet(), character = 'X')
>>> print(my_protein)
XXX
In comparison, using a normal Seq object:
>>> my_seq = Seq("NNNNNNNNNNN")
>>> print(my_seq)
NNNNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
Seq('XXX', ExtendedIUPACProtein())
>>> print(my_protein)
XXX
- Overrides:
Seq.translate
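In the example above, 11 unknown nucleotides give 3 unknown amino acids: only complete codons are counted, so the two trailing nucleotides are dropped. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def translated_length(nucleotide_length):
    """Number of complete three-letter codons in a nucleotide sequence."""
    if nucleotide_length < 0:
        raise ValueError("length must be non-negative")
    return nucleotide_length // 3


# 11 nucleotides hold 3 complete codons, with 2 letters left over.
print("X" * translated_length(11))
```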
ungap(self, gap=None)
Return a copy of the sequence without the gap character(s).
The gap character can be specified in two ways - either as an explicit
argument, or via the sequence's alphabet. For example:
>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import Gapped, generic_dna
>>> my_dna = UnknownSeq(20, Gapped(generic_dna,"-"))
>>> my_dna
UnknownSeq(20, alphabet = Gapped(DNAAlphabet(), '-'), character = 'N')
>>> my_dna.ungap()
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> my_dna.ungap("-")
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
If the UnknownSeq is using the gap character, then an empty Seq is
returned:
>>> my_gap = UnknownSeq(20, Gapped(generic_dna,"-"), character="-")
>>> my_gap
UnknownSeq(20, alphabet = Gapped(DNAAlphabet(), '-'), character = '-')
>>> my_gap.ungap()
Seq('', DNAAlphabet())
>>> my_gap.ungap("-")
Seq('', DNAAlphabet())
Notice that the returned sequence's alphabet is adjusted to remove any
explicit gap character declaration.
- Overrides:
Seq.ungap
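The two cases above come down to a single comparison between the placeholder character and the gap character. The sketch below tracks only the resulting length. The function `ungapped_length` is hypothetical, takes the gap as an explicit argument, and leaves out the alphabet handling described above.

```python
def ungapped_length(length, character, gap="-"):
    """Length left after removing gap characters from character * length."""
    if len(gap) != 1:
        raise ValueError("gap must be a single character")
    # Either every letter is a gap, or none of them is.
    return 0 if character == gap else length


print(ungapped_length(20, "N"))
print(ungapped_length(20, "-"))
```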