Class _IndexedSeqFileDict

UserDict.DictMixin --+
                     |
                    _IndexedSeqFileDict
Known Subclasses:

Read-only dictionary interface to a sequential sequence file.

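Keeps the keys and their associated file offsets in memory, and reads the file on demand to return entries as SeqRecord objects, using Bio.SeqIO to parse them. Memory use grows with the number of records rather than with the size of the file, so this approach works even with millions of sequences, although the index itself must still fit in memory.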
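The offset-based approach described above can be illustrated with a minimal sketch. This is not the class's actual implementation: the helper names build_offset_index and read_record_text are hypothetical, the sketch assumes a simple FASTA layout, and it returns plain text rather than SeqRecord objects.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each FASTA record id to the byte offset of its '>' line."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # The id is the first whitespace-delimited word after '>'.
                key = line[1:].split(None, 1)[0].decode("ascii")
                offsets[key] = offset
    return offsets


def read_record_text(path, offset):
    """Seek to a stored offset and read one record (header plus sequence)."""
    lines = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        lines.append(handle.readline())
        for line in handle:
            if line.startswith(b">"):
                break  # start of the next record
            lines.append(line)
    return b"".join(lines).decode("ascii")


# Demo on a small temporary file (made-up data).
with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
    tmp.write(b">alpha first record\nACGT\nTTGA\n>beta\nGGCC\n")
    demo_path = tmp.name

index = build_offset_index(demo_path)
beta_text = read_record_text(demo_path, index["beta"])
os.remove(demo_path)
```

Only the small key-to-offset dictionary stays in memory; each record is read from disk when it is requested.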

Note - as with the Bio.SeqIO.to_dict() function, duplicate keys (record identifiers by default) are not allowed. If this happens, a ValueError exception is raised.
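As a rough illustration of the duplicate-key rule, the following sketch rejects a repeated key while building an index. The function name index_keys and the error message are assumptions made for this example, not the library's own code or wording.

```python
def index_keys(record_ids):
    """Build a key -> position mapping, rejecting duplicate keys."""
    positions = {}
    for position, key in enumerate(record_ids):
        if key in positions:
            # Hypothetical message; the real wording may differ.
            raise ValueError("Duplicate key '%s'" % key)
        positions[key] = position
    return positions


try:
    index_keys(["seq1", "seq2", "seq1"])
except ValueError as err:
    duplicate_message = str(err)
else:
    duplicate_message = None
```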

By default the SeqRecord's id string is used as the dictionary key. This can be changed by supplying an optional key_function, a callback function which will be given the record id and must return the desired key. For example, this allows you to parse NCBI-style FASTA identifiers and extract the GI number to use as the dictionary key.
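A callback of the kind described might look like the sketch below. The name get_gi_number is hypothetical, and the example assumes identifiers of the form gi|NUMBER|... as found in NCBI-style FASTA headers.

```python
def get_gi_number(record_id):
    """Return the GI number from an id like 'gi|6273291|gb|AF191665.1|AF191665'."""
    parts = record_id.split("|")
    if len(parts) < 2 or parts[0] != "gi":
        raise ValueError("Not a GI-style identifier: %r" % record_id)
    return parts[1]


gi_key = get_gi_number("gi|6273291|gb|AF191665.1|AF191665")
```

Such a function would be passed as the key_function argument. Because it receives only the record id, it cannot use other fields of the record.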

Note that this dictionary is essentially read only. You cannot add or change values, pop values, nor clear the dictionary.
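The read-only behaviour can be sketched with a small stand-alone mapping. ReadOnlyIndex is a hypothetical class written for illustration. It assumes that the unsupported operations raise NotImplementedError, which this page does not state explicitly.

```python
from collections.abc import Mapping


class ReadOnlyIndex(Mapping):
    """Hypothetical read-only mapping; not the library's class."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def _read_only(self, *args, **kwargs):
        # Assumed exception type for every mutating operation.
        raise NotImplementedError("This index is read only.")

    __setitem__ = __delitem__ = pop = popitem = clear = update = _read_only


index = ReadOnlyIndex({"seq1": "ACGT"})
try:
    index["seq2"] = "GGCC"
except NotImplementedError as err:
    mutation_error = str(err)
```

Lookups, membership tests and iteration still work as they would for an ordinary dictionary.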

Instance Methods

__init__(self, filename, format, alphabet, key_function)

__repr__(self)

__str__(self)

__contains__(self, key)

__len__(self)
How many records are there?

itervalues(self)
Iterate over the SeqRecord items.

iteritems(self)
Iterate over the (key, SeqRecord) items.

iterkeys(self)
Iterate over the keys.

items(self)
Would be a list of the (key, SeqRecord) tuples, but not implemented.

values(self)
Would be a list of the SeqRecord objects, but not implemented.

keys(self)
Return a list of all the keys (SeqRecord identifiers).

__iter__(self)
Iterate over the keys.

__getitem__(x, y)
x[y]

get(D, k, d=...)
Returns D[k] if k in D, else d. d defaults to None.

get_raw(self, key)
Similar to the get method, but returns the record as a raw string.

__setitem__(self, key, value)
Would allow setting or replacing records, but not implemented.

update(self, **kwargs)
Would allow adding more values, but not implemented.

pop(self, key, default=None)
Would remove specified record, but not implemented.

popitem(self)
Would remove and return a SeqRecord, but not implemented.

clear(self)
Would clear dictionary, but not implemented.

fromkeys(self, keys, value=None)
A dictionary method which we don't implement.

copy(self)
A dictionary method which we don't implement.

Inherited from UserDict.DictMixin: __cmp__, has_key, setdefault

Method Details

__repr__(self)
(Representation operator)

Overrides: UserDict.DictMixin.__repr__

__contains__(self, key)
(In operator)

Overrides: UserDict.DictMixin.__contains__

__len__(self)
(Length operator)

How many records are there?

Overrides: UserDict.DictMixin.__len__

itervalues(self)

Iterate over the SeqRecord items.

Overrides: UserDict.DictMixin.itervalues

iteritems(self)

Iterate over the (key, SeqRecord) items.

Overrides: UserDict.DictMixin.iteritems

iterkeys(self)

Iterate over the keys.

Overrides: UserDict.DictMixin.iterkeys

items(self)

Would be a list of the (key, SeqRecord) tuples, but not implemented.

An indexed file can be very large, with millions of sequences. Loading all of them into memory at once as SeqRecord objects would probably exhaust the available RAM, so this dictionary method is not supported. Use iteritems to visit the (key, SeqRecord) pairs one at a time.

Overrides: UserDict.DictMixin.items
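The memory argument above is the usual case for lazy iteration. The sketch below uses hypothetical names (iter_records, fake_load) and a stand-in loader to show how a generator produces one (key, record) pair at a time instead of building a full list.

```python
def iter_records(keys, load_record):
    """Yield (key, record) pairs one at a time instead of building a list."""
    for key in keys:
        yield key, load_record(key)


loaded = []


def fake_load(key):
    """Stand-in for parsing a record from disk (hypothetical)."""
    loaded.append(key)
    return key.upper()


pairs = iter_records(["a", "b", "c"], fake_load)
first = next(pairs)  # only the first record has been loaded so far
```

After the first call to next, only one record has been loaded. The rest are read only if the caller keeps iterating.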

values(self)

Would be a list of the SeqRecord objects, but not implemented.

An indexed file can be very large, with millions of sequences. Loading all of them into memory at once as SeqRecord objects would probably exhaust the available RAM, so this dictionary method is not supported. Use itervalues to visit the SeqRecord objects one at a time.

Overrides: UserDict.DictMixin.values

__iter__(self)

Iterate over the keys.

Overrides: UserDict.DictMixin.__iter__

get(D, k, d=...)

d defaults to None.

Returns: D[k] if k in D, else d
Overrides: UserDict.DictMixin.get

get_raw(self, key)

Similar to the get method, but returns the record as a raw string.

If the key is not found, a KeyError exception is raised.

Note that on Python 3 a bytes string is returned, not a typical unicode string.

NOTE - This functionality is not supported for every file format.
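One way to picture raw access is to read the stored byte range directly, without parsing. The sketch below is an assumption-laden illustration, not the library's method: the helper get_raw_bytes is hypothetical, and it assumes the index records both an offset and a length for each record.

```python
import os
import tempfile


def get_raw_bytes(path, offset, length):
    """Return the unparsed bytes of one record, given its offset and length."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


# Demo on a small temporary file (made-up data).
with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
    tmp.write(b">alpha\nACGT\n>beta\nGGCC\n")
    demo_path = tmp.name

# Hypothetical index entries: key -> (offset, length) in bytes.
raw_index = {"alpha": (0, 12), "beta": (12, 11)}
raw_beta = get_raw_bytes(demo_path, *raw_index["beta"])
os.remove(demo_path)
```

Opening the file in binary mode is what makes the result a bytes object on Python 3, in line with the note above.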

update(self, **kwargs)

Would allow adding more values, but not implemented.

Overrides: UserDict.DictMixin.update

pop(self, key, default=None)

Would remove specified record, but not implemented.

Overrides: UserDict.DictMixin.pop

popitem(self)

Would remove and return a SeqRecord, but not implemented.

Overrides: UserDict.DictMixin.popitem

clear(self)

Would clear dictionary, but not implemented.

Overrides: UserDict.DictMixin.clear