BSgenome-utils {BSgenome}R Documentation

BSgenome utilities

Description

Utilities for BSgenome objects.

Usage

  ## S4 method for signature 'BSgenome'
matchPWM(pwm, subject, min.score = "80%", exclude = "",
         maskList = logical(0), asRangedData = TRUE)
  ## S4 method for signature 'BSgenome'
countPWM(pwm, subject, min.score = "80%", exclude = "", 
         maskList = logical(0))
  ## S4 method for signature 'BSgenome'
vmatchPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0,
              with.indels = FALSE, fixed = TRUE, algorithm = "auto",
              exclude = "", maskList = logical(0),  userMask =
                 RangesList(), invertUserMask = FALSE, asRangedData = TRUE)
  ## S4 method for signature 'BSgenome'
vcountPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0,
              with.indels = FALSE, fixed = TRUE, algorithm = "auto",
              exclude = "", maskList = logical(0),  userMask =
                 RangesList(), invertUserMask = FALSE)
  ## S4 method for signature 'BSgenome'
vmatchPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0,
            fixed = TRUE, algorithm = "auto", verbose = FALSE,
            exclude = "", maskList = logical(0), asRangedData = TRUE)
  ## S4 method for signature 'BSgenome'
vcountPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0,
            fixed = TRUE, algorithm = "auto", collapse = FALSE,
            weight = 1L, verbose = FALSE, exclude = "", maskList = logical(0))

Arguments

pwm A numeric matrix with row names A, C, G and T representing a Position Weight Matrix.
pattern A DNAString object containing the pattern sequence.
pdict A DNAStringSet object containing the pattern sequences.
subject A BSgenome object containing the subject sequences.
min.score The minimum score for counting a match. Can be given as a character string containing a percentage (e.g. "85%") of the highest possible score or as a single number.
max.mismatch, min.mismatch The maximum and minimum number of mismatching letters allowed (see ?`lowlevel-matching` for the details). If non-zero, an inexact matching algorithm is used.
with.indels If TRUE then indels are allowed. In that case, min.mismatch must be 0 and max.mismatch is interpreted as the maximum "edit distance" allowed between the pattern and a match. Note that in order to avoid pollution by redundant matches, only the "best local matches" are returned. Roughly speaking, a "best local match" is a match that is locally both the closest (to the pattern P) and the shortest. More precisely, a substring S' of the subject S is a "best local match" iff:
       (a) nedit(P, S') <= max.mismatch
       (b) for every substring S1 of S':
               nedit(P, S1) > nedit(P, S')
       (c) for every substring S2 of S that contains S':
               nedit(P, S2) <= nedit(P, S')
    
One nice property of "best local matches" is that their first and last letters are guaranteed to be aligned with letters in P (i.e. they match letters in P).
fixed If FALSE then IUPAC extended letters are interpreted as ambiguities (see ?`lowlevel-matching` for the details).
algorithm For vmatchPattern and vcountPattern one of the following: "auto", "naive-exact", "naive-inexact", "boyer-moore", "shift-or", or "indels".

For vmatchPDict and vcountPDict one of the following: "auto", "naive-exact", "naive-inexact", "boyer-moore", or "shift-or".

collapse, weight ignored arguments.
verbose TRUE or FALSE.
exclude A character vector with strings that will be used to filter out chromosomes whose names match these strings.
maskList A named logical vector of maskStates preferred when used with a BSGenome object. When using the bsapply function, the masks will be set to the states in this vector.
userMask A RangesList, containing a mask to be applied to each chromosome. See bsapply.
invertUserMask Whether the userMask should be inverted.
asRangedData A logical value to assist in migrating output type from RangedData (deprecated) to GRanges. Should be FALSE. If TRUE, a warning message is issued and a RangedData object is returned.

Value

A GRanges object for matchPWM with two elementMetadata columns: "score" (numeric), and "string" (DNAStringSet).

A GRanges object for vmatchPattern.

A GRanges object for vmatchPDict with one elementMetadata column: "index", which represents a mapping to a position in the original pattern dictionary.

A data.frame object for countPWM and vcountPattern with three columns: "seqname" (factor), "strand" (factor), and "count" (integer).

A DataFrame object for vcountPDict with four columns: "seqname" ('factor' Rle), "strand" ('factor' Rle), "index" (integer) and "count" ('integer' Rle). As with vmatchPDict the index column represents a mapping to a position in the original pattern dictionary.

Author(s)

P. Aboyoun

See Also

matchPWM, matchPattern, matchPDict, bsapply

Examples

  library(BSgenome.Celegans.UCSC.ce2)
  data(HNF4alpha)

  pwm <- PWM(HNF4alpha)
  matchPWM(pwm, Celegans, asRangedData = FALSE)
  countPWM(pwm, Celegans)

  pattern <- consensusString(HNF4alpha)
  vmatchPattern(pattern, Celegans, fixed = "subject", asRangedData = FALSE)
  vcountPattern(pattern, Celegans, fixed = "subject")

  vmatchPDict(HNF4alpha[1:10], Celegans, asRangedData = FALSE)
  vcountPDict(HNF4alpha[1:10], Celegans)

[Package BSgenome version 1.18.3 Index]