Trees | Indices | Help |
---|
|
Code to work with GenBank formatted files.

Rather than using Bio.GenBank, you are now encouraged to use Bio.SeqIO with the "genbank" or "embl" format names to parse GenBank or EMBL files into SeqRecord and SeqFeature objects (see the Biopython tutorial for details).

Also, rather than using Bio.GenBank to search or download files from the NCBI, you are now encouraged to use Bio.Entrez instead (again, see the Biopython tutorial for details).

Currently the ONLY reason to use Bio.GenBank directly is for the RecordParser, which turns a GenBank file into GenBank-specific Record objects. This is a much closer representation of the raw file contents than the SeqRecord alternative from the FeatureParser (used in Bio.SeqIO).

Classes:
Iterator              Iterate through a file of GenBank entries.
ErrorFeatureParser    Catch errors caused during parsing.
FeatureParser         Parse GenBank data into SeqRecord and SeqFeature objects.
RecordParser          Parse GenBank data into a Record object.

Exceptions:
ParserFailureError    Exception indicating a failure in the parser (i.e. scanner or consumer).
LocationParserError   Exception indicating a problem with the spark based location parser.

17-MAR-2009: added wgs, wgs_scafld for GenBank whole genome shotgun master records. These are GenBank files that summarize the content of a project, and provide lists of scaffold and contig files in the project. These will be in annotations['wgs'] and annotations['wgs_scafld']. These GenBank files do not have sequences. See http://groups.google.com/group/bionet.molbio.genbank/browse_thread/thread/51fb88bf39e7dc36 http://is.gd/nNgk for more details of this format, and an example. Added by Ying Huang & Iddo Friedberg
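As a rough illustration of the two routes described above, the sketch below reads a file through Bio.SeqIO and, separately, through the RecordParser. The function names `summarize_genbank` and `read_raw_record` and the `path` argument are hypothetical, chosen only for this example. Biopython is imported inside each function, so it only needs to be installed when the functions are actually called.

```python
def summarize_genbank(path):
    """Yield (id, sequence length, feature count) per entry via Bio.SeqIO.

    Hypothetical helper for illustration; not part of Bio.GenBank.
    """
    from Bio import SeqIO  # requires Biopython

    with open(path) as handle:
        # "genbank" is the format name recommended in the text above.
        for record in SeqIO.parse(handle, "genbank"):
            yield record.id, len(record.seq), len(record.features)


def read_raw_record(path):
    """Return a GenBank-specific Record object for the first entry in a file.

    Hypothetical helper for illustration; it wraps the RecordParser class
    listed on this page.
    """
    from Bio import GenBank  # requires Biopython

    with open(path) as handle:
        parser = GenBank.RecordParser()
        return parser.parse(handle)
```

The first helper returns SeqRecord-derived values, while the second keeps the representation closer to the raw file contents.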
|
|||
Iterator Iterator interface to move over a file of GenBank entries one at a time. |
|||
ParserFailureError Failure caused by some kind of problem in the parser. |
|||
LocationParserError Could not properly parse out a location from a GenBank file. |
|||
FeatureParser Parse GenBank files into Seq + Feature objects. |
|||
RecordParser Parse GenBank files into Record objects. |
|||
_BaseGenBankConsumer Abstract GenBank consumer providing useful general functions. |
|||
_FeatureConsumer Create a SeqRecord object with Features to return. |
|||
_RecordConsumer Create a GenBank Record object from scanner generated information. |
|
|||
GENBANK_INDENT = 12
|
|||
GENBANK_SPACER =
|
|||
FEATURE_KEY_INDENT = 5
|
|||
FEATURE_QUALIFIER_INDENT = 21
|
|||
FEATURE_KEY_SPACER =
|
|||
FEATURE_QUALIFIER_SPACER =
|
|||
_solo_location =
|
|||
_pair_location =
|
|||
_between_location =
|
|||
_within_position =
|
|||
_re_within_position = re.compile(r'\(\d
|
|||
_within_location =
|
|||
_oneof_position =
|
|||
_re_oneof_position = re.compile(r'one-of\(\d
|
|||
_oneof_location =
|
|||
_simple_location =
|
|||
_re_simple_location = re.compile(r'\d
|
|||
_re_simple_compound = re.compile(r'^
|
|||
_complex_location =
|
|||
_re_complex_location = re.compile(r'^
|
|||
_possibly_complemented_complex_location =
|
|||
_re_complex_compound = re.compile(r'^
|
|||
__package__ =
|
|
Build a Position object (PRIVATE).

For an end position, leave offset as zero (default):

    >>> _pos("5")
    ExactPosition(5)

For a start position, set offset to minus one (for Python counting):

    >>> _pos("5", -1)
    ExactPosition(4)

This also covers fuzzy positions:

    >>> _pos("<5")
    BeforePosition(5)
    >>> _pos(">5")
    AfterPosition(5)
    >>> _pos("one-of(5,8,11)")
    OneOfPosition([ExactPosition(5), ExactPosition(8), ExactPosition(11)])
    >>> _pos("(8.10)")
    WithinPosition(8,2)
|
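The offset handling shown in these doctests can be mimicked without Biopython. The following is only a sketch under stated assumptions: `parse_position` is a hypothetical stand-in that returns plain tuples instead of Position objects, and it is not the module's actual implementation. It assumes the offset is applied to every numeric position, and that a within position stores its lower bound plus an extension, as the `WithinPosition(8,2)` output suggests.

```python
import re


def parse_position(pos_str, offset=0):
    """Hypothetical tuple-based analogue of the documented _pos helper."""
    if pos_str.startswith("<"):
        return ("before", int(pos_str[1:]) + offset)
    if pos_str.startswith(">"):
        return ("after", int(pos_str[1:]) + offset)
    if pos_str.startswith("one-of(") and pos_str.endswith(")"):
        # Each alternative becomes an exact position with the same offset.
        choices = pos_str[len("one-of("):-1].split(",")
        return ("one-of", [("exact", int(c) + offset) for c in choices])
    within = re.fullmatch(r"\((\d+)\.(\d+)\)", pos_str)
    if within:
        low, high = int(within.group(1)), int(within.group(2))
        # Store the lower bound and the extension (high - low).
        return ("within", low + offset, high - low)
    return ("exact", int(pos_str) + offset)


print(parse_position("5", -1))    # start position, shifted for Python counting
print(parse_position("(8.10)"))   # within position as (low, extension)
```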
FeatureLocation from non-compound non-complement location (PRIVATE).

Simple examples,

    >>> _loc("123..456", 1000)
    FeatureLocation(ExactPosition(122),ExactPosition(456))
    >>> _loc("<123..>456", 1000)
    FeatureLocation(BeforePosition(122),AfterPosition(456))

A more complex location using within positions,

    >>> _loc("(9.10)..(20.25)", 1000)
    FeatureLocation(WithinPosition(8,1),WithinPosition(20,5))

Zero length between feature,

    >>> _loc("123^124", 1000)
    FeatureLocation(ExactPosition(123),ExactPosition(123))

The expected sequence length is needed for a special case, a between position at the start/end of a circular genome:

    >>> _loc("1000^1", 1000)
    FeatureLocation(ExactPosition(1000),ExactPosition(1000))

Apart from this special case, between positions P^Q must have P+1==Q,

    >>> _loc("123^456", 1000)
    Traceback (most recent call last):
    ...
    ValueError: Invalid between location '123^456'
|
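The between-position rule (P+1==Q, or the wrap-around case on a circular sequence) is easy to check in isolation. The sketch below is a hypothetical, simplified analogue named `parse_simple_location`; it handles only exact integer positions and returns a `(start, end)` tuple rather than a FeatureLocation. Treating a lone position such as `"300"` as a one-base span is an assumption of this sketch, since the doctests above do not cover that case.

```python
def parse_simple_location(loc_str, expected_seq_length):
    """Hypothetical analogue of _loc for exact positions only."""
    if "^" in loc_str:
        start_text, end_text = loc_str.split("^")
        start, end = int(start_text), int(end_text)
        # Normal case: the site lies between two adjacent bases.
        if start + 1 == end:
            return (start, start)
        # Special case: between the last and first base of a circular genome.
        if start == expected_seq_length and end == 1:
            return (start, start)
        raise ValueError("Invalid between location %r" % loc_str)
    if ".." in loc_str:
        start_text, end_text = loc_str.split("..")
        # Start shifts by one for Python counting; end is left unchanged.
        return (int(start_text) - 1, int(end_text))
    # Assumption: a single position covers exactly one base.
    position = int(loc_str)
    return (position - 1, position)


print(parse_simple_location("123..456", 1000))
print(parse_simple_location("1000^1", 1000))
```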
Split a tricky compound location string (PRIVATE).

    >>> list(_split_compound_loc("123..145"))
    ['123..145']
    >>> list(_split_compound_loc("123..145,200..209"))
    ['123..145', '200..209']
    >>> list(_split_compound_loc("one-of(200,203)..300"))
    ['one-of(200,203)..300']
    >>> list(_split_compound_loc("complement(123..145),200..209"))
    ['complement(123..145)', '200..209']
    >>> list(_split_compound_loc("123..145,one-of(200,203)..209"))
    ['123..145', 'one-of(200,203)..209']
    >>> list(_split_compound_loc("123..145,one-of(200,203)..one-of(209,211),300"))
    ['123..145', 'one-of(200,203)..one-of(209,211)', '300']
    >>> list(_split_compound_loc("123..145,complement(one-of(200,203)..one-of(209,211)),300"))
    ['123..145', 'complement(one-of(200,203)..one-of(209,211))', '300']
    >>> list(_split_compound_loc("123..145,200..one-of(209,211),300"))
    ['123..145', '200..one-of(209,211)', '300']
    >>> list(_split_compound_loc("123..145,200..one-of(209,211)"))
    ['123..145', '200..one-of(209,211)']
|
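What makes these strings tricky is that commas also appear inside `one-of(...)`, so a plain `split(",")` would cut in the wrong places. One way to get the same results as the doctests is to split only on commas at parenthesis depth zero. The sketch below uses that depth-counting technique in a hypothetical function named `split_top_level`; it is not necessarily how `_split_compound_loc` itself is implemented.

```python
def split_top_level(compound_str):
    """Yield the parts of a compound location, splitting on top-level commas.

    Hypothetical helper; commas nested inside parentheses are left alone.
    """
    depth = 0   # current parenthesis nesting level
    start = 0   # index where the current part begins
    for index, char in enumerate(compound_str):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            yield compound_str[start:index]
            start = index + 1
    # Whatever remains after the last top-level comma is the final part.
    yield compound_str[start:]


print(list(split_top_level("123..145,one-of(200,203)..209")))
```

Because it is a generator, wrap the call in `list(...)` when all parts are needed at once, as in the doctests.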
Generated by Epydoc 3.0.1 on Sat Aug 20 10:37:28 2011 | http://epydoc.sourceforge.net |