net.sf.saxon.expr

Class Tokenizer

final class Tokenizer extends Object

Tokenizer for expressions and inputs. This code was originally derived from James Clark's xt, though it has been greatly modified since. See copyright notice at end of file.
Field Summary
static int BARE_NAME_STATE
State in which a name is NOT to be merged with what comes next, for example "("
int currentToken
The number identifying the most recently read token
int currentTokenStartOffset
The position in the input expression where the current token starts
String currentTokenValue
The string value of the most recently read token
static int DEFAULT_STATE
Initial default state of the Tokenizer
String input
The string being parsed
int inputOffset
The current position within the input string
static int OPERATOR_STATE
State in which the next thing to be read is an operator
int startLineNumber
The starting line number (for XPath in XSLT, the line number in the stylesheet)
static int SEQUENCE_TYPE_STATE
State in which the next thing to be read is a SequenceType
Method Summary
int getColumnNumber()
Get the column number of the current token
int getColumnNumber(int offset)
Return the column number corresponding to a given offset in the expression
long getLineAndColumn(int offset)
Get the line and column number corresponding to a given offset in the input expression, as a long value with the line number in the top half and the column number in the lower half
int getLineNumber()
Get the line number of the current token
int getLineNumber(int offset)
Return the line number corresponding to a given offset in the expression
int getState()
Get the current tokenizer state
void incrementLineNumber(int offset)
Increment the line number, making a record of where in the input string the newline character occurred.
void lookAhead()
Look ahead by one token.
void next()
Get the next token from the input expression.
char nextChar()
Read next character directly.
String recentText()
Get the most recently read text (for use in an error message)
void setState(int state)
Set the tokenizer into a special state
void tokenize(String input, int start, int end, int lineNumber)
Prepare a string for tokenization.
void treatCurrentAsOperator()
Force the current token to be treated as an operator if possible
void unreadChar()
Step back one character.

Field Detail

BARE_NAME_STATE

public static final int BARE_NAME_STATE
State in which a name is NOT to be merged with what comes next, for example "("

currentToken

public int currentToken
The number identifying the most recently read token

currentTokenStartOffset

public int currentTokenStartOffset
The position in the input expression where the current token starts

currentTokenValue

public String currentTokenValue
The string value of the most recently read token

DEFAULT_STATE

public static final int DEFAULT_STATE
Initial default state of the Tokenizer

input

public String input
The string being parsed

inputOffset

public int inputOffset
The current position within the input string

OPERATOR_STATE

public static final int OPERATOR_STATE
State in which the next thing to be read is an operator

startLineNumber

public int startLineNumber
The starting line number (for XPath in XSLT, the line number in the stylesheet)

SEQUENCE_TYPE_STATE

public static final int SEQUENCE_TYPE_STATE
State in which the next thing to be read is a SequenceType

Method Detail

getColumnNumber

public int getColumnNumber()
Get the column number of the current token

Returns: the column number

getColumnNumber

public int getColumnNumber(int offset)
Return the column number corresponding to a given offset in the expression

Parameters: offset - the character offset in the expression

Returns: the column number

getLineAndColumn

public long getLineAndColumn(int offset)
Get the line and column number corresponding to a given offset in the input expression, as a long value with the line number in the top half and the column number in the lower half

Parameters: offset - the character offset in the expression

Returns: the line and column number, packed together
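
The packed return value can be illustrated with a minimal sketch. It assumes that "top half" means the upper 32 bits of the long and "lower half" the lower 32 bits; the class PackedPosition and its methods are hypothetical helpers written for this illustration and are not part of Saxon.

```java
// Hypothetical helper, not part of Saxon: shows one way to pack and unpack
// a (line, column) pair in a single long.
final class PackedPosition {
    private PackedPosition() {}

    // Assumed layout: line in the upper 32 bits, column in the lower 32 bits.
    static long pack(int line, int column) {
        return ((long) line << 32) | (column & 0xFFFFFFFFL);
    }

    // Recover the line number from the upper 32 bits.
    static int lineOf(long packed) {
        return (int) (packed >>> 32);
    }

    // Recover the column number from the lower 32 bits.
    static int columnOf(long packed) {
        return (int) packed;
    }

    public static void main(String[] args) {
        long packed = pack(12, 5);
        System.out.println(lineOf(packed) + ":" + columnOf(packed)); // prints 12:5
    }
}
```

Under that assumption, a caller holding the result of getLineAndColumn(offset) would apply the same shift and cast to split it back into its two parts.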

getLineNumber

public int getLineNumber()
Get the line number of the current token

Returns: the line number

getLineNumber

public int getLineNumber(int offset)
Return the line number corresponding to a given offset in the expression

Parameters: offset - the character offset in the expression

Returns: the line number

getState

public int getState()
Get the current tokenizer state

Returns: the current state

incrementLineNumber

public void incrementLineNumber(int offset)
Increment the line number, making a record of where in the input string the newline character occurred.

Parameters: offset - the place in the input string where the newline occurred
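
Recording newline offsets is what makes it possible to map an arbitrary offset back to a line and column later. The following is a minimal sketch of that idea, not Saxon's implementation: the class LineTracker and its method names are hypothetical, newline offsets are assumed to be recorded in increasing order, and columns are assumed to be 1-based.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Saxon code: remembers where each newline
// occurred so that an offset can be converted to a line and column.
final class LineTracker {
    private final int startLineNumber;
    private final List<Integer> newlineOffsets = new ArrayList<>();

    LineTracker(int startLineNumber) {
        this.startLineNumber = startLineNumber;
    }

    // Counterpart of incrementLineNumber(offset): note a newline position.
    void recordNewline(int offset) {
        newlineOffsets.add(offset);
    }

    // Line number = starting line + number of newlines before the offset.
    int lineNumberAt(int offset) {
        int line = startLineNumber;
        for (int newline : newlineOffsets) {
            if (newline >= offset) {
                break;
            }
            line++;
        }
        return line;
    }

    // Column = distance from the start of the line, 1-based (assumption).
    int columnNumberAt(int offset) {
        int lineStart = 0;
        for (int newline : newlineOffsets) {
            if (newline >= offset) {
                break;
            }
            lineStart = newline + 1;
        }
        return offset - lineStart + 1;
    }
}
```

For the input "a\nbc\nd" with newlines recorded at offsets 1 and 4 and a starting line of 10, offset 3 (the character c) maps to line 11, column 2 in this sketch.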

lookAhead

public void lookAhead()
Look ahead by one token. This method does the real tokenization work. The method is normally called internally, but the XQuery parser also calls it to resume normal tokenization after dealing with pseudo-XML syntax.

Throws: XPathException if a lexical error occurs

next

public void next()
Get the next token from the input expression. The token type is stored in the currentToken field, and the string value of the token in currentTokenValue.

Throws: XPathException if a lexical error is detected
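
The relationship between next() and lookAhead() can be pictured with a small sketch of one-token lookahead. This is an illustration under simplifying assumptions, not Saxon's algorithm: tokens are taken to be whitespace-separated, the class LookaheadSketch and all of its members are hypothetical, and the only merging rule shown is joining a name with a following "(" unless a bare-name flag is set.

```java
// Hypothetical illustration, not Saxon code: a tokenizer that always keeps
// one token buffered so that next() can inspect what follows.
final class LookaheadSketch {
    static final String EOF = "<eof>";

    private final String[] words;
    private int index = 0;
    private String nextToken;

    String currentToken;
    boolean bareNameState = false; // when true, names are never merged

    LookaheadSketch(String input) {
        String trimmed = input.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        lookAhead(); // prime the one-token buffer
    }

    // Read one token ahead into the buffer.
    private void lookAhead() {
        nextToken = index < words.length ? words[index++] : EOF;
    }

    private static boolean isName(String token) {
        return !token.isEmpty() && Character.isLetter(token.charAt(0));
    }

    // Promote the buffered token to current, then refill the buffer.
    void next() {
        currentToken = nextToken;
        lookAhead();
        if (!bareNameState && isName(currentToken) && nextToken.equals("(")) {
            currentToken = currentToken + "("; // merge name with what follows
            lookAhead();
        }
    }
}
```

With the input "count ( $x )", successive calls to next() in this sketch yield "count(", "$x", ")" and then the end-of-input marker; with bareNameState set to true the first two tokens are "count" and "(".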

nextChar

public char nextChar()
Read the next character directly. Used by the XQuery parser when parsing pseudo-XML syntax.

Returns: the next character from the input

Throws: StringIndexOutOfBoundsException if an attempt is made to read beyond the end of the string. This will only occur in the event of a syntax error in the input.

recentText

public String recentText()
Get the most recently read text (for use in an error message)

Returns: a chunk of text leading up to the error

setState

public void setState(int state)
Set the tokenizer into a special state

Parameters: state - the new state

tokenize

public void tokenize(String input, int start, int end, int lineNumber)
Prepare a string for tokenization. The actual tokens are obtained by calls on next()

Parameters:
input - the string to be tokenized
start - start point within the string
end - end point within the string (last character not read): -1 means end of string
lineNumber - the line number in the source where the expression appears

Throws: XPathException if a lexical error occurs, e.g. unmatched string quotes

treatCurrentAsOperator

public void treatCurrentAsOperator()
Force the current token to be treated as an operator if possible

unreadChar

public void unreadChar()
Step back one character. If this steps back to a previous line, adjust the line number.
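
The interplay between nextChar() and unreadChar() can be sketched as a character cursor that keeps its line count consistent in both directions. This is a minimal illustration, not Saxon's code: the class CharCursor and its members are hypothetical, and it assumes that reading a newline increments the line number and that stepping back over one decrements it.

```java
// Hypothetical illustration, not Saxon code: a cursor over the input string
// that adjusts the line number when moving forwards or backwards.
final class CharCursor {
    private final String input;
    private int offset = 0;
    private int line;

    CharCursor(String input, int startLineNumber) {
        this.input = input;
        this.line = startLineNumber;
    }

    // Read the next character; String.charAt throws
    // StringIndexOutOfBoundsException when reading past the end.
    char nextChar() {
        char c = input.charAt(offset++);
        if (c == '\n') {
            line++;
        }
        return c;
    }

    // Step back one character; undo the line increment if that character
    // was a newline.
    void unreadChar() {
        if (offset == 0) {
            throw new IllegalStateException("already at start of input");
        }
        offset--;
        if (input.charAt(offset) == '\n') {
            line--;
        }
    }

    int lineNumber() {
        return line;
    }
}
```

Keeping the decrement in unreadChar() symmetric with the increment in nextChar() means a read followed by an unread always leaves the reported line number unchanged.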