class Orgmode::RegexpHelper

Summary

This class contains helper routines to deal with the Regexp “black magic” you need to properly parse org-mode files.

Key methods

Attributes

org_image_file_regexp[R]

EMPHASIS

I figure it's best to stick as closely to the elisp implementation as possible for emphasis. org.el defines the regular expression that is used to apply “emphasis” (in my terminology, inline formatting instead of block formatting). Here's the documentation from org.el.

Terminology: In an emphasis string like “ *strong word* ”, we call the initial space PREMATCH, the final space POSTMATCH, the stars MARKERS, “s” and “d” are BORDER characters and “trong wor” is the body. The different components in this variable specify what is allowed/forbidden in each part:

pre          Chars allowed as prematch. Line beginning allowed, too.
post         Chars allowed as postmatch. Line end will be allowed too.
border       The chars forbidden as border characters.
body-regexp  A regexp like "." to match a body character. Don't use non-shy groups here, and don't allow newline here.
newline      The maximum number of newlines allowed in an emphasis exp.
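
As a rough illustration of how these five components combine, the sketch below assembles an emphasis regexp in the same shape as build_org_emphasis_regexp and applies it to the terminology example. The local variable names are illustrative only, and the component strings are an assumption modelled on the values set in initialize.

```ruby
# Component strings (assumed values, modelled on the initialize listing).
pre          = ' \t\(\"\{'                 # chars allowed as PREMATCH
post         = '- \t\.,:!\?;\"\)\}\\\\'    # chars allowed as POSTMATCH
border       = '\s,"\''                    # chars FORBIDDEN as BORDER
body         = '.*?'                       # regexp for one run of body chars
max_newlines = 1                           # newlines allowed inside the body
markers      = '\*\/_=~\+'

# Allow up to max_newlines line breaks inside the body.
body = "#{body}(?:\\n#{body}){0,#{max_newlines}}" if max_newlines > 0

emphasis = Regexp.new(
  "([#{pre}]|^)" +                                        # group 1: prematch
  "([#{markers}])(?!\\2)" +                               # group 2: marker, not doubled
  "([^#{border}]|[^#{border}]#{body}[^#{border}])\\2" +   # group 3: borders + body
  "(?=[#{post}]|$)"                                       # postmatch (lookahead only)
)

m = emphasis.match(" *strong word* ")
prematch, marker, text = m[1], m[2], m[3]
# prematch is " ", marker is "*", text is "strong word"

two_lines   = emphasis.match("*two\nlines*")   # one newline: still matches
three_lines = emphasis.match("*a\nb\nc*")      # two newlines: no match (nil)
```

Note that the postmatch is a lookahead, so it is checked but not consumed, while the prematch is captured in group 1 and has to be put back by whoever rewrites the match.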

Public Class Methods

new()
# File lib/org-ruby/regexp_helper.rb, line 45
def initialize
  # Set up the emphasis regular expression.
  @pre_emphasis = ' \t\(\"\{'
  @post_emphasis = '- \t\.,:!\?;\"\)\}\\\\'
  @border_forbidden = '\s,"\''
  @body_regexp = '.*?'
  @max_newlines = 1
  @body_regexp = "#{@body_regexp}" +
                 "(?:\\n#{@body_regexp}){0,#{@max_newlines}}" if @max_newlines > 0
  @markers = '\*\/_=~\+'
  @code_snippet_stack = []
  @logger = Logger.new(STDERR)
  @logger.level = Logger::WARN
  build_org_emphasis_regexp
  build_org_link_regexp
  @org_subp_regexp = /([_^])\{(.*?)\}/
  @org_footnote_regexp = /\[fn:(.+?)(:(.*?))?\]/
end

Public Instance Methods

match_all(str) { |$2, $3| ... }

Finds all emphasis matches in a string. Supply a block that will get the marker and body as parameters.

# File lib/org-ruby/regexp_helper.rb, line 66
def match_all(str)
  str.scan(@org_emphasis_regexp) do |match|
    yield $2, $3
  end
end
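
The following standalone sketch shows the scan-with-block pattern that match_all relies on. It is not the library class: the method takes the regexp as a hypothetical second parameter, and the regexp itself is an assumed reconstruction of the emphasis pattern.

```ruby
# Assumed reconstruction of the emphasis regexp (see the components above).
pre    = ' \t\(\"\{'
post   = '- \t\.,:!\?;\"\)\}\\\\'
border = '\s,"\''
body   = '.*?(?:\n.*?){0,1}'
EMPHASIS = Regexp.new(
  "([#{pre}]|^)([\\*\\/_=~\\+])(?!\\2)" +
  "([^#{border}]|[^#{border}]#{body}[^#{border}])\\2" +
  "(?=[#{post}]|$)"
)

# Hypothetical free-standing variant of match_all.
def match_all(str, regexp)
  # String#scan sets the match globals for every hit, so inside the block
  # $2 is the marker and $3 is the text between the markers.
  str.scan(regexp) { yield $2, $3 }
end

pairs = []
match_all("*bold*, /italic/, =code=", EMPHASIS) do |marker, text|
  pairs << [marker, text]
end
# pairs is [["*", "bold"], ["/", "italic"], ["=", "code"]]
```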
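
restore_code_snippets(str)

Restores the code snippets stashed by rewrite_emphasis: formats str with the snippet stack via String#%, then empties the stack.
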
# File lib/org-ruby/regexp_helper.rb, line 163
def restore_code_snippets str
  str = str % @code_snippet_stack
  @code_snippet_stack = []
  str
end
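
The restore step is plain String#% formatting. A minimal sketch of the mechanism, using a local array in place of @code_snippet_stack and made-up snippet text:

```ruby
# Snippets stashed earlier, in the order their placeholders appear.
stack = ["<code>a_b</code>", "<code>100%</code>"]

# The text carries one %s per stashed snippet; literal percent signs
# in the surrounding text were doubled beforehand so they survive.
template = "Use %s for 50%% of cases and %s otherwise"

restored = template % stack
# restored is "Use <code>a_b</code> for 50% of cases and <code>100%</code> otherwise"
```

Percent signs inside the substituted snippets are not interpreted again, which is why rewrite_emphasis can turn "%%" back into "%" before pushing a snippet onto the stack.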

rewrite_emphasis(str) { |$2, $3| ... }

Compute replacements for all matching emphasized phrases. Supply a block that will get the marker and body as parameters; return the replacement string from your block.

Example

re = RegexpHelper.new
result = re.rewrite_emphasis("*bold*, /italic/, =code=") do |marker, body|
    "<#{map[marker]}>#{body}</#{map[marker]}>"
end

In this example, the block body will get called three times:

  1. Marker: “*”, body: “bold”

  2. Marker: “/”, body: “italic”

  3. Marker: “=”, body: “code”

The string returned from the block replaces “*bold*”, “/italic/”, and “=code=”, respectively. (This sample produces HTML-like output, assuming map is defined appropriately.)

# File lib/org-ruby/regexp_helper.rb, line 93
def rewrite_emphasis str
  # escape the percent signs for safe restoring code snippets
  str.gsub!(/%/, "%%")
  format_str = "%s"
  str.gsub! @org_emphasis_regexp do |match|
    pre = $1
    # preserve the code snippet from further formatting
    if $2 == "=" or $2 == "~"
      inner = yield $2, $3
      # code is not formatted, so turn to single percent signs
      inner.gsub!(/%%/, "%")
      @code_snippet_stack.push inner
      "#{pre}#{format_str}"
    else
      inner = yield $2, $3
      "#{pre}#{inner}"
    end
  end
end
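
To see the placeholder handling end to end, here is a self-contained sketch. EmphasisSketch and TAGS are hypothetical stand-ins, not part of org-ruby, and the regexp is an assumed reconstruction. Unlike the listing above, the sketch uses non-mutating gsub so that it always returns a string.

```ruby
# Hypothetical stand-in for the helper, reduced to the emphasis path.
class EmphasisSketch
  def initialize
    pre    = ' \t\(\"\{'
    post   = '- \t\.,:!\?;\"\)\}\\\\'
    border = '\s,"\''
    body   = '.*?(?:\n.*?){0,1}'
    @re = Regexp.new(
      "([#{pre}]|^)([\\*\\/_=~\\+])(?!\\2)" +
      "([^#{border}]|[^#{border}]#{body}[^#{border}])\\2" +
      "(?=[#{post}]|$)"
    )
    @stack = []
  end

  def rewrite_emphasis(str)
    # Double every percent sign so the later String#% call is safe.
    escaped = str.gsub(/%/, "%%")
    escaped.gsub(@re) do
      pre, marker, text = $1, $2, $3
      inner = yield marker, text
      if marker == "=" || marker == "~"
        # Code is stashed verbatim and replaced by a %s placeholder.
        @stack.push inner.gsub(/%%/, "%")
        "#{pre}%s"
      else
        "#{pre}#{inner}"
      end
    end
  end

  def restore_code_snippets(str)
    out = str % @stack
    @stack = []
    out
  end
end

TAGS = { "*" => "b", "/" => "i", "=" => "code" }

helper    = EmphasisSketch.new
templated = helper.rewrite_emphasis("*bold*, /italic/, =100%=") do |marker, text|
  "<#{TAGS[marker]}>#{text}</#{TAGS[marker]}>"
end
# templated is "<b>bold</b>, <i>italic</i>, %s"

html = helper.restore_code_snippets(templated)
# html is "<b>bold</b>, <i>italic</i>, <code>100%</code>"
```

The code span stays a bare %s until restore_code_snippets runs, so any formatting pass applied in between cannot touch its contents.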
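
rewrite_footnote(str) { |name, definition or nil| ... }

Rewrites footnotes. The block receives the footnote name and its inline definition (nil when the footnote has none); the block's return value replaces the footnote reference.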

# File lib/org-ruby/regexp_helper.rb, line 121
def rewrite_footnote str # :yields: name, definition or nil
  str.gsub! @org_footnote_regexp do |match|
    yield $1, $3
  end
end
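
A small standalone sketch of the footnote pattern in action. The free-standing method and the HTML it emits are illustrative assumptions; only the regexp is taken from initialize.

```ruby
FOOTNOTE = /\[fn:(.+?)(:(.*?))?\]/

# Hypothetical non-mutating variant: returns a new string.
def rewrite_footnote(str)
  # $1 is the footnote name, $3 the inline definition (nil if absent).
  str.gsub(FOOTNOTE) { yield $1, $3 }
end

out = rewrite_footnote("See[fn:1] and[fn:why:inline text].") do |name, definition|
  if definition
    "<sup title=\"#{definition}\">#{name}</sup>"
  else
    "<sup>#{name}</sup>"
  end
end
# out is "See<sup>1</sup> and<sup title=\"inline text\">why</sup>."
```

The library method uses gsub!, which modifies its argument in place and returns nil when nothing matched, so callers should read the result from the string they passed in rather than from the return value.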

rewrite_subp(str) { |type ("_" for subscript and "^" for superscript), text| ... }

Rewrites subscripts and superscripts (_{foo} and ^{bar}). The block's return value replaces the whole construct.

# File lib/org-ruby/regexp_helper.rb, line 114
def rewrite_subp str # :yields: type ("_" for subscript and "^" for superscript), text
  str.gsub! @org_subp_regexp do |match|
    yield $1, $2
  end
end
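
The same pattern applies to subscripts and superscripts. In this sketch the free-standing method and the SUBP_TAGS table are hypothetical; the regexp is the one assigned to @org_subp_regexp.

```ruby
SUBP = /([_^])\{(.*?)\}/
SUBP_TAGS = { "_" => "sub", "^" => "sup" }

# Hypothetical non-mutating variant: returns a new string.
def rewrite_subp(str)
  # $1 is "_" or "^", $2 is the text between the braces.
  str.gsub(SUBP) { yield $1, $2 }
end

out = rewrite_subp("H_{2}O and x^{2}") do |type, text|
  "<#{SUBP_TAGS[type]}>#{text}</#{SUBP_TAGS[type]}>"
end
# out is "H<sub>2</sub>O and x<sup>2</sup>"
```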

Private Instance Methods

build_org_emphasis_regexp()
# File lib/org-ruby/regexp_helper.rb, line 171
def build_org_emphasis_regexp
  @org_emphasis_regexp = Regexp.new("([#{@pre_emphasis}]|^)" +
                                    "([#{@markers}])(?!\\2)" +
                                    "([^#{@border_forbidden}]|" +
                                    "[^#{@border_forbidden}]#{@body_regexp}" +
                                    "[^#{@border_forbidden}])\\2" +
                                    "(?=[#{@post_emphasis}]|$)")
  @logger.debug "Just created regexp: #{@org_emphasis_regexp}"
end