trimLRPatterns {Biostrings} | R Documentation |
The trimLRPatterns function trims left and/or right flanking patterns from sequences.
trimLRPatterns(Lpattern = "", Rpattern = "", subject, max.Lmismatch = 0, max.Rmismatch = 0, with.Lindels = FALSE, with.Rindels = FALSE, Lfixed = TRUE, Rfixed = TRUE, ranges = FALSE)
Lpattern |
The left part of the pattern. |
Rpattern |
The right part of the pattern. |
subject |
An XString or XStringSet object containing the target sequence(s). |
max.Lmismatch |
Either an integer vector of length nLp = nchar(Lpattern) whose element max.Lmismatch[i] is the maximum number of mismatching letters accepted when aligning the last i letters of Lpattern, substring(Lpattern, nLp - i + 1, nLp), with the first i letters of the subject, substring(subject, 1, i); or a single numeric value in (0, 1) giving a constant maximum mismatch rate for each of the nLp alignments, in which case the limit for an overlap of length i is as.integer(max.Lmismatch * i). Negative values in an integer vector prevent trimming at the i-th location. If an integer vector has length(max.Lmismatch) < nLp, it is padded at the beginning with enough -1's to bring its length up to nLp. For example, the default max.Lmismatch = 0 becomes nLp - 1 values of -1 followed by a single 0, so only a full-length exact match of Lpattern is trimmed.
If non-zero, an inexact matching algorithm is used (see the matchPattern function for more information).
|
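To make the threshold rules above concrete, here is a minimal base-R sketch of the left-flank check. It is not the Biostrings implementation: expandMaxMismatch() and leftTrimWidth() are hypothetical helpers, indels and ambiguity codes are ignored, and it assumes that the longest acceptable overlap is the one that gets trimmed. It returns the number of letters that would be removed from the start of a single sequence.

```r
# Hypothetical helper (not part of Biostrings): expand a max.Lmismatch value
# into an integer vector of length n, following the rules described above.
expandMaxMismatch <- function(max.mismatch, n) {
  if (length(max.mismatch) == 1L && max.mismatch > 0 && max.mismatch < 1) {
    # A rate in (0, 1): allow as.integer(rate * i) mismatches for overlap length i
    return(as.integer(max.mismatch * seq_len(n)))
  }
  max.mismatch <- as.integer(max.mismatch)
  # Pad at the beginning with -1's, which disables trimming for short overlaps
  c(rep.int(-1L, max(0L, n - length(max.mismatch))), max.mismatch)
}

# Hypothetical helper: number of letters trimmed from the left of `s`.
# Assumption: the largest i is chosen such that the last i letters of
# Lpattern match the first i letters of `s` with at most maxmm[i] mismatches.
leftTrimWidth <- function(Lpattern, s, max.Lmismatch = 0) {
  nLp <- nchar(Lpattern)
  maxmm <- expandMaxMismatch(max.Lmismatch, nLp)
  p <- strsplit(Lpattern, "")[[1]]
  x <- strsplit(s, "")[[1]]
  for (i in rev(seq_len(min(nLp, length(x))))) {
    if (maxmm[i] < 0L) next                    # trimming disabled at this length
    suffix <- p[(nLp - i + 1L):nLp]            # substring(Lpattern, nLp - i + 1, nLp)
    if (sum(suffix != x[seq_len(i)]) <= maxmm[i]) return(i)
  }
  0L                                           # no acceptable overlap: trim nothing
}
```

With the first subjectSet sequence from the examples, leftTrimWidth("TTCTGCTTG", "TGCTTGACGGCAGATCGG", rep(0, 9)) gives 6 (the partial overlap "TGCTTG"), whereas the default max.Lmismatch = 0 gives 0 because the full Lpattern does not match.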
max.Rmismatch |
Either an integer vector of length nRp = nchar(Rpattern) whose element max.Rmismatch[i] is the maximum number of mismatching letters accepted when aligning the first i letters of Rpattern, substring(Rpattern, 1, i), with the last i letters of the subject, substring(subject, nS - i + 1, nS), where nS is the length of the subject sequence; or a single numeric value in (0, 1) giving a constant maximum mismatch rate for each of the nRp alignments, in which case the limit for an overlap of length i is as.integer(max.Rmismatch * i). Negative values in an integer vector prevent trimming at the i-th location. If an integer vector has length(max.Rmismatch) < nRp, it is padded at the beginning with enough -1's to bring its length up to nRp.
If non-zero, an inexact matching algorithm is used (see the matchPattern function for more information).
|
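The right flank mirrors the left one: the prefix of Rpattern is compared with the end of the sequence. The base-R sketch below uses a hypothetical helper, rightTrimWidth(), accepts only an integer vector (no rate, no indels), and assumes, as for the left flank, that the longest acceptable overlap is trimmed.

```r
# Hypothetical helper (not part of Biostrings): number of letters trimmed from
# the right end of `s`. Assumption: the largest i is chosen such that the
# first i letters of Rpattern match the last i letters of `s` with at most
# max.Rmismatch[i] mismatches.
rightTrimWidth <- function(Rpattern, s, max.Rmismatch = 0L) {
  nRp <- nchar(Rpattern)
  # Pad at the beginning with -1's so the vector has length nRp
  maxmm <- c(rep.int(-1L, max(0L, nRp - length(max.Rmismatch))),
             as.integer(max.Rmismatch))
  p <- strsplit(Rpattern, "")[[1]]
  x <- strsplit(s, "")[[1]]
  nS <- length(x)
  for (i in rev(seq_len(min(nRp, nS)))) {
    if (maxmm[i] < 0L) next                 # trimming disabled at this length
    prefix <- p[seq_len(i)]                 # substring(Rpattern, 1, i)
    ending <- x[(nS - i + 1L):nS]           # substring(s, nS - i + 1, nS)
    if (sum(prefix != ending) <= maxmm[i]) return(i)
  }
  0L                                        # no acceptable overlap: trim nothing
}
```

For the single subject in the examples, rightTrimWidth("GATCGGAAG", "TTCTGCTTGACGTGATCGGA", rep(0, 9)) gives 7, since the sequence ends in "GATCGGA", while the default gives 0.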
with.Lindels |
If TRUE then indels are allowed in the left part of the pattern.
In that case max.Lmismatch is interpreted as the maximum "edit
distance" allowed in the left part of the pattern.
See the with.indels argument of the matchPattern
function for more information.
|
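The difference between counting mismatches and measuring "edit distance" can be illustrated with base R's utils::adist(), which computes the Levenshtein distance. The sketch below is only an approximation of the behaviour described above: leftEditDistance() is a hypothetical helper, and it assumes that the Lpattern suffix must align with the start of the subject while the end of the aligned subject region is left free, so the smallest distance over all subject prefixes is taken.

```r
# Hypothetical helper (not part of Biostrings): edit distance between the
# last i letters of Lpattern and the best-matching prefix of `s`.
# Assumption: the alignment is anchored at position 1 of `s`, and its end in
# `s` is free, so every prefix length (including 0) is tried.
leftEditDistance <- function(Lpattern, s, i) {
  nLp <- nchar(Lpattern)
  suffix <- substring(Lpattern, nLp - i + 1L, nLp)
  prefixes <- substring(s, 1L, 0:nchar(s))   # "", first letter, first two, ...
  min(utils::adist(suffix, prefixes))        # smallest Levenshtein distance
}
```

For a sequence that lost one T from the overlap, leftEditDistance("TTCTGCTTG", "TGCTGACG", 6L) is 1 (one deletion), while a letter-by-letter comparison of "TGCTTG" with the first six letters "TGCTGA" counts 2 mismatches.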
with.Rindels |
Same as with.Lindels but for the right part of the pattern.
|
Lfixed |
Only with a DNAString or RNAString subject can an Lfixed value other than the default (TRUE) be used.
With Lfixed = FALSE, ambiguities (i.e. letters from the IUPAC Extended Genetic Alphabet (see IUPAC_CODE_MAP) that are not from the base alphabet) in the left pattern _and_ in the subject are interpreted as wildcards, i.e. they match any letter that they stand for.
See the fixed argument of the matchPattern function for more information.
|
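A small base-R sketch can show what treating ambiguity letters as wildcards does to a mismatch count. It uses a hypothetical, deliberately incomplete letter table (the package's full table is IUPAC_CODE_MAP), hypothetical helpers lettersMatch() and countMismatches(), and assumes that, with fixed = FALSE, two letters match when the sets of bases they stand for overlap.

```r
# Hypothetical subset of the IUPAC Extended Genetic Alphabet: each letter maps
# to the bases it stands for. Letters outside this subset are not handled.
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              R = c("A", "G"), Y = c("C", "T"), N = c("A", "C", "G", "T"))

# Assumption: with fixed = FALSE, two letters match when their base sets overlap;
# with fixed = TRUE, only identical letters match.
lettersMatch <- function(a, b, fixed = TRUE) {
  if (fixed) return(a == b)
  length(intersect(iupac[[a]], iupac[[b]])) > 0L
}

# Count mismatches between two equal-length strings under either rule.
countMismatches <- function(pattern, subject, fixed = TRUE) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  sum(!mapply(lettersMatch, p, s, MoreArgs = list(fixed = fixed)))
}
```

An N in the subject then costs one mismatch under exact matching but none when fixed = FALSE: countMismatches("TTCTGCTTG", "TTCTNCTTG") is 1 and countMismatches("TTCTGCTTG", "TTCTNCTTG", fixed = FALSE) is 0.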
Rfixed |
Same as Lfixed but for the right part of the pattern.
|
ranges |
If TRUE, return the ranges to use to trim subject.
If FALSE, return the trimmed subject.
|
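The link between ranges = TRUE and the trimmed output can be sketched in base R: each range starts just after the left overlap and ends just before the right overlap, and extracting that span yields the trimmed sequence. The trim widths below are illustrative values worked out by hand for the subjectSet sequences in the examples under the rep(0, 9) overlap rules described for max.Lmismatch and max.Rmismatch; they are not output captured from trimLRPatterns, and substring() stands in for whatever subsetting function is applied to the returned ranges.

```r
# Illustrative inputs: the two subjectSet sequences from the examples.
seqs <- c("TGCTTGACGGCAGATCGG", "TTCTGCTTGGATCGGAAG")
leftWidth  <- c(6L, 9L)   # hand-computed letters to trim on the left (assumed)
rightWidth <- c(6L, 9L)   # hand-computed letters to trim on the right (assumed)

# Ranges to keep: start after the left overlap, end before the right overlap.
start <- leftWidth + 1L
end   <- nchar(seqs) - rightWidth

# Applying the ranges; an empty range (end = start - 1) gives "".
trimmed <- substring(seqs, start, end)
```

Here the second sequence is made up entirely of the two flanks, so its range has width 0 and the trimmed result is the empty string.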
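A new XString or XStringSet object with the flanking patterns, matched within the specified mismatch or edit-distance limits, removed. When ranges = TRUE, the ranges to use to trim subject are returned instead.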
P. Aboyoun
matchPattern, matchLRPatterns, match-utils, XString-class, XStringSet-class
library(Biostrings)

Lpattern <- "TTCTGCTTG"
Rpattern <- "GATCGGAAG"
subject <- DNAString("TTCTGCTTGACGTGATCGGA")
subjectSet <- DNAStringSet(c("TGCTTGACGGCAGATCGG", "TTCTGCTTGGATCGGAAG"))

## Only allow for perfect full-length matches on the flanks (the default)
trimLRPatterns(Lpattern = Lpattern, subject = subject)
trimLRPatterns(Rpattern = Rpattern, subject = subject)
trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet)

## Also allow for perfect matches on partial flanking overlaps
trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
               max.Lmismatch = rep(0, 9), max.Rmismatch = rep(0, 9))

## Allow for mismatches on the flanks
trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
               max.Lmismatch = 0.2, max.Rmismatch = 0.2)
maxMismatches <- as.integer(0.2 * 1:9)
maxMismatches
trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
               max.Lmismatch = maxMismatches, max.Rmismatch = maxMismatches)

## Produce ranges that can be an input into other functions
trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
               max.Lmismatch = rep(0, 9), max.Rmismatch = rep(0, 9),
               ranges = TRUE)
trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
               max.Lmismatch = 0.2, max.Rmismatch = 0.2, ranges = TRUE)