org.apache.commons.lang.text
public class StrTokenizer extends Object implements ListIterator, Cloneable
This class can split a String into many smaller strings. It aims to do a similar job to java.util.StringTokenizer, however it offers much more control and flexibility, including implementing the ListIterator interface. By default, it is set up like StringTokenizer.
The input String is split into a number of tokens. Each token is separated from the next token by a delimiter. One or more delimiter characters must be specified.
Each token may be surrounded by quotes. The quote matcher specifies the quote character(s). A quote may be escaped within a quoted section by duplicating itself.
Characters that need trimming may appear between each token and the delimiter. The trimmer matcher specifies these characters. One usage might be to trim whitespace characters.
Invalid characters may appear at any point outside the quotes. The ignored matcher specifies the characters to be removed. One usage might be to remove new line characters.
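The delimiter, quote and trimmer rules above can be modelled in a few lines of plain JDK code. This is a simplified stand-alone sketch, not the StrTokenizer source: the class `MiniCsvSplitter` and its `split` method are hypothetical names, the delimiter is fixed to a comma, the quote to a double quote, empty tokens are kept, and ignored characters are not handled. It also assumes that whitespace outside quotes is always trimmed, so for the input `"a, ", b ,", c"` the middle token comes out as `b`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the rules above; not the StrTokenizer implementation.
class MiniCsvSplitter {

    // Splits on ',', treats '"' as the quote (a doubled quote inside quotes is a
    // literal quote) and trims whitespace that lies outside quotes.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        int keep = 0;              // token length up to the last quoted or non-blank character
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    token.append(c);                  // quoted text is copied untouched
                } else if (i + 1 < input.length() && input.charAt(i + 1) == '"') {
                    token.append('"');                // doubled quote becomes one literal quote
                    i++;
                } else {
                    inQuotes = false;                 // closing quote
                }
                keep = token.length();
            } else if (c == '"') {
                inQuotes = true;                      // opening quote
            } else if (c == ',') {
                tokens.add(token.substring(0, keep)); // drops trailing unquoted whitespace
                token.setLength(0);
                keep = 0;
            } else if (Character.isWhitespace(c)) {
                if (token.length() > 0) {
                    token.append(c);                  // kept only if more token text follows
                }
            } else {
                token.append(c);
                keep = token.length();
            }
        }
        tokens.add(token.substring(0, keep));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a,b,c"));
        System.out.println(split(" a, b , c "));
        System.out.println(split("\"a, \", b ,\", c\""));
    }
}
```

With Commons Lang on the classpath, the documented entry point for this kind of input is `StrTokenizer.getCSVInstance(String input)` followed by `getTokenList()` or `getTokenArray()`.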
Empty tokens may be removed or returned as null.
"a,b,c" - Three tokens "a","b","c" (comma delimiter)
" a, b , c " - Three tokens "a","b","c" (default CSV processing trims whitespace)
"a, ", b ,", c" - Three tokens "a, " , " b ", ", c" (quoted text untouched)
This tokenizer has the following properties and options:
Property | Type | Default |
---|---|---|
delim | CharSetMatcher | { \t\n\r\f} |
quote | NoneMatcher | {} |
ignore | NoneMatcher | {} |
emptyTokenAsNull | boolean | false |
ignoreEmptyTokens | boolean | true |
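Because the defaults are documented as matching StringTokenizer, the JDK class itself gives a quick baseline for what a default-configured tokenizer is expected to return. The sketch below uses only `java.util.StringTokenizer`; `DefaultSplitBaseline` is a hypothetical helper name. Runs of delimiters produce no empty tokens, which corresponds to `ignoreEmptyTokens = true` in the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper showing the StringTokenizer baseline that the defaults mirror.
class DefaultSplitBaseline {

    // Collects the tokens java.util.StringTokenizer yields with its default
    // delimiters (space, tab, newline, carriage return, form feed).
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a b\tc\n\nd"));
    }
}
```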
Since: 2.2
Version: $Id: StrTokenizer.java 491653 2007-01-01 22:03:58Z ggregory $
Constructor Summary | |
---|---|
StrTokenizer()
Constructs a tokenizer splitting on space, tab, newline and formfeed
as per StringTokenizer, but with no text to tokenize.
| |
StrTokenizer(String input)
Constructs a tokenizer splitting on space, tab, newline and formfeed
as per StringTokenizer.
| |
StrTokenizer(String input, char delim)
Constructs a tokenizer splitting on the specified delimiter character.
| |
StrTokenizer(String input, String delim)
Constructs a tokenizer splitting on the specified delimiter string.
| |
StrTokenizer(String input, StrMatcher delim)
Constructs a tokenizer splitting using the specified delimiter matcher.
| |
StrTokenizer(String input, char delim, char quote)
Constructs a tokenizer splitting on the specified delimiter character
and handling quotes using the specified quote character.
| |
StrTokenizer(String input, StrMatcher delim, StrMatcher quote)
Constructs a tokenizer splitting using the specified delimiter matcher
and handling quotes using the specified quote matcher.
| |
StrTokenizer(char[] input)
Constructs a tokenizer splitting on space, tab, newline and formfeed
as per StringTokenizer.
| |
StrTokenizer(char[] input, char delim)
Constructs a tokenizer splitting on the specified character.
| |
StrTokenizer(char[] input, String delim)
Constructs a tokenizer splitting on the specified string.
| |
StrTokenizer(char[] input, StrMatcher delim)
Constructs a tokenizer splitting using the specified delimiter matcher.
| |
StrTokenizer(char[] input, char delim, char quote)
Constructs a tokenizer splitting on the specified delimiter character
and handling quotes using the specified quote character.
| |
StrTokenizer(char[] input, StrMatcher delim, StrMatcher quote)
Constructs a tokenizer splitting using the specified delimiter matcher
and handling quotes using the specified quote matcher.
|
Method Summary | |
---|---|
void | add(Object obj)
Unsupported ListIterator operation. |
Object | clone()
Creates a new instance of this Tokenizer. |
String | getContent()
Gets the String content that the tokenizer is parsing.
|
static StrTokenizer | getCSVInstance()
Gets a new tokenizer instance which parses Comma Separated Value strings.
The input is set later by calling reset. |
static StrTokenizer | getCSVInstance(String input)
Gets a new tokenizer instance which parses Comma Separated Value strings,
initializing it with the given input. |
static StrTokenizer | getCSVInstance(char[] input)
Gets a new tokenizer instance which parses Comma Separated Value strings,
initializing it with the given input. |
StrMatcher | getDelimiterMatcher()
Gets the field delimiter matcher.
|
StrMatcher | getIgnoredMatcher()
Gets the ignored character matcher.
|
StrMatcher | getQuoteMatcher()
Gets the quote matcher currently in use.
|
String[] | getTokenArray()
Gets a copy of the full token list as an independent modifiable array.
|
List | getTokenList()
Gets a copy of the full token list as an independent modifiable list.
|
StrMatcher | getTrimmerMatcher()
Gets the trimmer character matcher.
|
static StrTokenizer | getTSVInstance()
Gets a new tokenizer instance which parses Tab Separated Value strings.
|
static StrTokenizer | getTSVInstance(String input)
Gets a new tokenizer instance which parses Tab Separated Value strings.
|
static StrTokenizer | getTSVInstance(char[] input)
Gets a new tokenizer instance which parses Tab Separated Value strings.
|
boolean | hasNext()
Checks whether there are any more tokens.
|
boolean | hasPrevious()
Checks whether there are any previous tokens that can be iterated to.
|
boolean | isEmptyTokenAsNull()
Gets whether the tokenizer currently returns empty tokens as null.
|
boolean | isIgnoreEmptyTokens()
Gets whether the tokenizer currently ignores empty tokens.
|
Object | next()
Gets the next token. |
int | nextIndex()
Gets the index of the next token to return.
|
String | nextToken()
Gets the next token from the String.
|
Object | previous()
Gets the token previous to the last returned token.
|
int | previousIndex()
Gets the index of the previous token.
|
String | previousToken()
Gets the previous token from the String.
|
void | remove()
Unsupported ListIterator operation.
|
StrTokenizer | reset()
Resets this tokenizer, forgetting all parsing and iteration already completed.
|
StrTokenizer | reset(String input)
Reset this tokenizer, giving it a new input string to parse.
|
StrTokenizer | reset(char[] input)
Reset this tokenizer, giving it a new input string to parse.
|
void | set(Object obj)
Unsupported ListIterator operation. |
StrTokenizer | setDelimiterChar(char delim)
Sets the field delimiter character.
|
StrTokenizer | setDelimiterMatcher(StrMatcher delim)
Sets the field delimiter matcher.
|
StrTokenizer | setDelimiterString(String delim)
Sets the field delimiter string.
|
StrTokenizer | setEmptyTokenAsNull(boolean emptyAsNull)
Sets whether the tokenizer should return empty tokens as null.
|
StrTokenizer | setIgnoredChar(char ignored)
Set the character to ignore.
|
StrTokenizer | setIgnoredMatcher(StrMatcher ignored)
Set the matcher for characters to ignore.
|
StrTokenizer | setIgnoreEmptyTokens(boolean ignoreEmptyTokens)
Sets whether the tokenizer should ignore and not return empty tokens.
|
StrTokenizer | setQuoteChar(char quote)
Sets the quote character to use.
|
StrTokenizer | setQuoteMatcher(StrMatcher quote)
Set the quote matcher to use.
|
StrTokenizer | setTrimmerMatcher(StrMatcher trimmer)
Sets the matcher for characters to trim.
|
int | size()
Gets the number of tokens found in the String.
|
protected List | tokenize(char[] chars, int offset, int count)
Internal method to perform the tokenization.
|
String | toString()
Gets the String content that the tokenizer is parsing.
|
This constructor is normally used with reset.
Parameters: input the string which is to be parsed
Parameters: input the string which is to be parsed; delim the field delimiter character
Parameters: input the string which is to be parsed; delim the field delimiter string
Parameters: input the string which is to be parsed; delim the field delimiter matcher
Parameters: input the string which is to be parsed; delim the field delimiter character; quote the field quoted string character
Parameters: input the string which is to be parsed; delim the field delimiter matcher; quote the field quoted string matcher
The input character array is not cloned, and must not be altered after passing in to this method.
Parameters: input the string which is to be parsed, not cloned
The input character array is not cloned, and must not be altered after passing in to this method.
Parameters: input the string which is to be parsed, not cloned; delim the field delimiter character
The input character array is not cloned, and must not be altered after passing in to this method.
Parameters: input the string which is to be parsed, not cloned; delim the field delimiter string
The input character array is not cloned, and must not be altered after passing in to this method.
Parameters: input the string which is to be parsed, not cloned; delim the field delimiter matcher
The input character array is not cloned, and must not be altered after passing in to this method.
Parameters: input the string which is to be parsed, not cloned; delim the field delimiter character; quote the field quoted string character
The input character array is not cloned, and must not be altered after passing in to this method.
Parameters: input the string which is to be parsed, not cloned; delim the field delimiter matcher; quote the field quoted string matcher
Parameters: obj this parameter ignored.
Throws: UnsupportedOperationException always
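The new instance is reset so that it will be at the start of the token list. If a CloneNotSupportedException is caught, null is returned.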
Returns: a new instance of this Tokenizer which has been reset.
Returns: the string content being parsed
You must call a "reset" method to set the string which you want to parse.
Returns: a new tokenizer instance which parses Comma Separated Value strings
Parameters: input the text to parse
Returns: a new tokenizer instance which parses Comma Separated Value strings
Parameters: input the text to parse
Returns: a new tokenizer instance which parses Comma Separated Value strings
Returns: the delimiter matcher in use
These characters are ignored when parsing the String, unless they are within a quoted region. The default value is not to ignore anything.
Returns: the ignored matcher in use
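The rule that ignored characters are dropped only outside quoted regions can be illustrated with a small stand-alone sketch. It is plain JDK code and not the library's implementation: `IgnoredCharsSketch` and `dropIgnored` are hypothetical names, and the model handles a single ignored character and a single quote character rather than full matchers.

```java
// Hypothetical stand-alone model of "ignored outside quotes"; not the StrTokenizer implementation.
class IgnoredCharsSketch {

    // Removes every occurrence of 'ignored' that is not inside a quoted region.
    // Quote characters themselves are copied through unchanged.
    static String dropIgnored(String input, char ignored, char quote) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        for (char c : input.toCharArray()) {
            if (c == quote) {
                inQuotes = !inQuotes;      // entering or leaving a quoted region
            }
            if (c == ignored && !inQuotes) {
                continue;                  // ignored character outside quotes is dropped
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The newline after 'a' is removed, the one inside the quotes is kept.
        System.out.println(dropIgnored("a\nb,\"c\nd\"", '\n', '"'));
    }
}
```

On a real tokenizer the corresponding setting is made with `setIgnoredChar(char ignored)` or `setIgnoredMatcher(StrMatcher ignored)`.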
The quote character is used to wrap data between the tokens. This enables delimiters to be entered as data. By default no quote matcher is set, as listed in the properties table; the CSV and TSV instances use '"' (double quote).
Returns: the quote matcher in use
Returns: the tokens as a String array
Returns: the tokens as a List
These characters are trimmed off on each side of the delimiter until the token or quote is found. The default value is not to trim anything.
Returns: the trimmer matcher in use
You must call a "reset" method to set the string which you want to parse.
Returns: a new tokenizer instance which parses Tab Separated Value strings.
Parameters: input the string to parse
Returns: a new tokenizer instance which parses Tab Separated Value strings.
Parameters: input the string to parse
Returns: a new tokenizer instance which parses Tab Separated Value strings.
Returns: true if there are more tokens
Returns: true if there are previous tokens
Returns: true if empty tokens are returned as null
Returns: true if empty tokens are not returned
Returns: the next String token
Returns: the next token index
Returns: the next sequential token, or null when no more tokens are found
Returns: the previous token
Returns: the previous token index
Returns: the previous sequential token, or null when no more tokens are found
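The return values documented above for the token navigation methods (a token while one is available, null at either end) can be modelled on top of a standard `ListIterator`. This is a sketch in plain JDK code under that assumption: `TokenCursor` is a hypothetical class, not part of Commons Lang, and it wraps an already tokenized list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.ListIterator;

// Hypothetical model of bidirectional token navigation; not the StrTokenizer implementation.
class TokenCursor {
    private final ListIterator<String> it;

    TokenCursor(List<String> tokens) {
        // A read-only view: the cursor never modifies the token list.
        this.it = Collections.unmodifiableList(tokens).listIterator();
    }

    // Returns the next token, or null when no more tokens are found.
    String nextToken() {
        return it.hasNext() ? it.next() : null;
    }

    // Returns the previous token, or null when the cursor is at the start.
    String previousToken() {
        return it.hasPrevious() ? it.previous() : null;
    }

    int nextIndex() {
        return it.nextIndex();
    }

    int previousIndex() {
        return it.previousIndex();
    }

    public static void main(String[] args) {
        TokenCursor cursor = new TokenCursor(Arrays.asList("a", "b"));
        System.out.println(cursor.nextToken());
        System.out.println(cursor.nextToken());
        System.out.println(cursor.nextToken());
        System.out.println(cursor.previousToken());
    }
}
```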
Throws: UnsupportedOperationException always
This method allows the same tokenizer to be reused for the same String.
Returns: this, to enable chaining
Parameters: input the new string to tokenize, null sets no text to parse
Returns: this, to enable chaining
The input character array is not cloned, and must not be altered after passing in to this method.
Parameters: input the new character array to tokenize, not cloned, null sets no text to parse
Returns: this, to enable chaining
Parameters: obj this parameter ignored.
Throws: UnsupportedOperationException always
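Callers that treat a tokenizer as a generic `ListIterator` need to expect the three mutating operations to fail. The sketch below shows the same contract on a JDK read-only iterator, obtained from `Collections.unmodifiableList`; it is an analogy in plain JDK code, and `ReadOnlyIteratorDemo` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.ListIterator;

// Hypothetical demo of a ListIterator whose add, set and remove are unsupported.
class ReadOnlyIteratorDemo {

    // Returns true if the named mutation is rejected with UnsupportedOperationException.
    static boolean rejects(String op) {
        ListIterator<String> it =
                Collections.unmodifiableList(Arrays.asList("a", "b")).listIterator();
        it.next(); // position the iterator so set and remove would otherwise be legal
        try {
            switch (op) {
                case "add":
                    it.add("x");
                    break;
                case "set":
                    it.set("x");
                    break;
                case "remove":
                    it.remove();
                    break;
                default:
                    throw new IllegalArgumentException(op);
            }
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects("add") && rejects("set") && rejects("remove"));
    }
}
```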
Parameters: delim the delimiter character to use
Returns: this, to enable chaining
The delimiter is used to separate one token from another.
Parameters: delim the delimiter matcher to use
Returns: this, to enable chaining
Parameters: delim the delimiter string to use
Returns: this, to enable chaining
Parameters: emptyAsNull whether empty tokens are returned as null
Returns: this, to enable chaining
This character is ignored when parsing the String, unless it is within a quoted region.
Parameters: ignored the ignored character to use
Returns: this, to enable chaining
These characters are ignored when parsing the String, unless they are within a quoted region.
Parameters: ignored the ignored matcher to use, null ignored
Returns: this, to enable chaining
Parameters: ignoreEmptyTokens whether empty tokens are not returned
Returns: this, to enable chaining
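The two empty-token flags, emptyTokenAsNull and ignoreEmptyTokens, combine into three visible outcomes for an empty field: dropped, returned as null, or returned as an empty string. The sketch below models that in plain JDK code. `EmptyTokenPolicy` is a hypothetical class, and it assumes that ignoring empty tokens takes precedence over returning them as null when both flags are set, which this page does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two empty-token flags; not the StrTokenizer implementation.
class EmptyTokenPolicy {

    // Splits on a single delimiter character and applies the empty-token flags.
    // Assumption: ignoreEmptyTokens wins over emptyTokenAsNull when both are true.
    static List<String> split(String input, char delim,
                              boolean ignoreEmptyTokens, boolean emptyTokenAsNull) {
        List<String> out = new ArrayList<>();
        int start = 0;
        while (true) {
            int end = input.indexOf(delim, start);
            String token = end < 0 ? input.substring(start) : input.substring(start, end);
            if (!token.isEmpty()) {
                out.add(token);
            } else if (!ignoreEmptyTokens) {
                out.add(emptyTokenAsNull ? null : "");
            }
            if (end < 0) {
                break;
            }
            start = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("a,,c", ',', true, false));
        System.out.println(split("a,,c", ',', false, false));
        System.out.println(split("a,,c", ',', false, true));
    }
}
```

On a real tokenizer the flags are set with `setIgnoreEmptyTokens(boolean)` and `setEmptyTokenAsNull(boolean)`, both of which return the tokenizer so the calls can be chained.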
The quote character is used to wrap data between the tokens. This enables delimiters to be entered as data.
Parameters: quote the quote character to use
Returns: this, to enable chaining
The quote character is used to wrap data between the tokens. This enables delimiters to be entered as data.
Parameters: quote the quote matcher to use, null ignored
Returns: this, to enable chaining
These characters are trimmed off on each side of the delimiter until the token or quote is found.
Parameters: trimmer the trimmer matcher to use, null ignored
Returns: this, to enable chaining
Returns: the number of matched tokens
Most users of this class do not need to call this method. This method will be called automatically by other (public) methods when required.
This method exists to allow subclasses to add code before or after the tokenization. For example, a subclass could alter the character array, offset or count to be parsed, or call the tokenizer multiple times on multiple strings. It is also possible to filter the results.
StrTokenizer will always pass a zero offset and a count equal to the length of the array to this method; however, a subclass may pass other values, or even an entirely different array.
Parameters: chars the character array being tokenized, may be null; offset the start position within the character array, must be valid; count the number of characters to tokenize, must be valid
Returns: the modifiable list of String tokens, unmodifiable if null array or zero count
Returns: the string content being parsed