module Xml_lexer: sig .. end
Simple XML lexer
This module provides an ocamllex lexer for XML files. It supports only the most basic features of the XML specification.
The lexer ignores the following "events" altogether: comments, processing instructions, the XML prolog, and the doctype declaration.
The predefined entities (&amp;, &lt;, etc.) are supported. The replacement text for other entities whose entity value consists of character data can be provided to the lexer (see Xml_lexer.entities). Internal entity declarations are not taken into account (the lexer just skips the doctype declaration).
CDATA sections and character references are supported.
See Xml_lexer.strip_ws about whitespace handling.
type error =
  | Illegal_character of
  | Bad_entity of
  | Unterminated of
  | Tag_expected
  | Attribute_expected
  | Other of
val error_string : error -> string
exception Error of error * int
This exception is raised in case of an error during parsing. The int argument indicates the character position in the buffer. Note that some non-conforming XML documents might not trigger an error.
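As a rough illustration of how a caller might report this exception, the sketch below wraps an arbitrary computation and prints the position together with the message from error_string. The helper name report_xml_error is hypothetical (it is not part of this module), and the code assumes that the Xml_lexer module described here is linked into the program.

```ocaml
(* Hypothetical helper, not part of Xml_lexer: run [f] and turn a lexer
   error into a message on stderr instead of an uncaught exception. *)
let report_xml_error (f : unit -> 'a) : 'a option =
  try Some (f ())
  with Xml_lexer.Error (err, pos) ->
    (* [pos] is the character position in the lexing buffer. *)
    Printf.eprintf "XML error at character %d: %s\n" pos
      (Xml_lexer.error_string err);
    None
```

Since some non-conforming documents do not trigger an error, a Some result from such a wrapper should not be read as proof that the input was well-formed.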
type token =
  | Tag of
  | Chars of     (* Some text between the tags *)
  | Endtag of    (* A closing tag *)
  | EOF          (* End of input *)
The type of the XML document elements
val strip_ws : bool Pervasives.ref
Whitespace handling: if strip_ws is true (the default), whitespace next to a tag is ignored. Character data consisting only of whitespace is thus suppressed (i.e. Chars "" tokens are skipped).
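Because strip_ws is a global reference, changing it affects every later call to the lexer. One possible way to limit the change to a single computation is sketched below; with_strip_ws is a hypothetical helper, not something this module provides, and the code assumes Xml_lexer is linked in.

```ocaml
(* Hypothetical helper: run [f] with [Xml_lexer.strip_ws] set to [flag],
   then restore the previous setting, even if [f] raises. *)
let with_strip_ws (flag : bool) (f : unit -> 'a) : 'a =
  let saved = !Xml_lexer.strip_ws in
  Xml_lexer.strip_ws := flag;
  match f () with
  | result -> Xml_lexer.strip_ws := saved; result
  | exception e -> Xml_lexer.strip_ws := saved; raise e
```

For instance, with_strip_ws false (fun () -> ...) would let the enclosed lexing code see whitespace adjacent to tags, while the rest of the program keeps the default behaviour.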
val entities : (string * string) list Pervasives.ref
An association list of entity definitions. Initially, it contains the predefined entities (["amp", "&"; "lt", "<" ...]).
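A minimal sketch of registering an additional entity follows. The helper define_entity and the entity name "copy" with replacement text "(c)" are made up for illustration. Following the initial contents shown above, the name is given without the surrounding & and ; characters. The code assumes Xml_lexer is linked in.

```ocaml
(* Hypothetical helper: add a (name, replacement text) pair to the
   association list consulted by the lexer. *)
let define_entity (name : string) (text : string) : unit =
  Xml_lexer.entities := (name, text) :: !Xml_lexer.entities

(* Illustrative entity only; register it before lexing starts. *)
let () = define_entity "copy" "(c)"
```

Prepending keeps the predefined entries in place, so the list still contains "amp", "lt" and the others after the call.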
val token : Lexing.lexbuf -> token
The entry point of the lexer.
Raises Error in case of an invalid XML document.
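The usual way to drive an ocamllex entry point like this one is to call it repeatedly on the same lexbuf until it returns EOF. The sketch below does that for an in-memory string; tokens_of_string and text_of_tokens are hypothetical helper names, and the code assumes Xml_lexer is linked in. It relies only on the EOF constructor and on Chars carrying a string, as in the Chars "" example above.

```ocaml
(* Hypothetical helper: collect every token of [s], excluding the final EOF.
   May raise Xml_lexer.Error on invalid input. *)
let tokens_of_string (s : string) : Xml_lexer.token list =
  let lexbuf = Lexing.from_string s in
  let rec loop acc =
    match Xml_lexer.token lexbuf with
    | Xml_lexer.EOF -> List.rev acc
    | tok -> loop (tok :: acc)
  in
  loop []

(* Hypothetical helper: keep only the character data, in document order. *)
let text_of_tokens (toks : Xml_lexer.token list) : string list =
  List.fold_right
    (fun tok acc ->
       match tok with
       | Xml_lexer.Chars s -> s :: acc
       | _ -> acc)
    toks []
```

To read from a file instead, the lexbuf could be built with Lexing.from_channel on an input channel; the loop itself stays the same.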