Class WhitespaceTokenizer

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class WhitespaceTokenizer
    extends CharTokenizer
    A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int). Note: that definition explicitly excludes the non-breaking space. Adjacent sequences of non-whitespace characters form tokens.
    See Also:
    UnicodeWhitespaceTokenizer
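The splitting rule described above can be illustrated with a small, self-contained Java sketch. It is a hypothetical re-implementation for illustration only (the class name WhitespaceSplitSketch is made up and this is not Lucene's code). It treats every code point for which Character.isWhitespace(int) returns false as part of a token, so punctuation stays attached to words and a no-break space (U+00A0) does not split a token.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the rule described above, NOT Lucene's implementation:
 * a token is a maximal run of code points for which Character.isWhitespace(int) is false.
 */
final class WhitespaceSplitSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Read a full code point so surrogate pairs are handled as one character.
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                // A whitespace code point ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the trailing token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Tab, double space and newline all separate tokens; the U+00A0 between
        // "New" and "York" does not, because Character.isWhitespace rejects it.
        String text = "Hello,\tworld!  New\u00A0York\n";
        for (String token : tokenize(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

With the real WhitespaceTokenizer, the same text would be consumed through the usual TokenStream cycle (setReader, reset, incrementToken, end, close); that wiring is left out here so the sketch runs without Lucene on the classpath.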
    • Constructor Detail

      • WhitespaceTokenizer

        public WhitespaceTokenizer()
        Construct a new WhitespaceTokenizer.
      • WhitespaceTokenizer

        public WhitespaceTokenizer(AttributeFactory factory)
        Construct a new WhitespaceTokenizer using a given AttributeFactory.
        Parameters:
        factory - the attribute factory to use for this Tokenizer
      • WhitespaceTokenizer

        public WhitespaceTokenizer(AttributeFactory factory,
                                   int maxTokenLen)
        Construct a new WhitespaceTokenizer using a given AttributeFactory and maximum token length.
        Parameters:
        factory - the attribute factory to use for this Tokenizer
        maxTokenLen - maximum token length the tokenizer will emit. Must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024)
        Throws:
        java.lang.IllegalArgumentException - if maxTokenLen is invalid.
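The documented constraint on maxTokenLen can be expressed as a simple range check. The following is a hypothetical helper that reads the documented rule literally ("greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT"); the class and method names are illustrative and not part of Lucene, and only the limit value 1024*1024 is taken from the text above.

```java
/**
 * Hypothetical stand-in for the argument check described above; NOT Lucene source.
 * Only the limit value (1024 * 1024) comes from the documentation.
 */
final class MaxTokenLenCheckSketch {

    // Upper bound quoted in the documentation: 1024 * 1024 = 1,048,576.
    static final int MAX_TOKEN_LENGTH_LIMIT = 1024 * 1024;

    /** Returns maxTokenLen unchanged, or throws if it is outside the documented range. */
    static int requireValidMaxTokenLen(int maxTokenLen) {
        // Documented rule, read literally: greater than 0 and less than the limit.
        if (maxTokenLen <= 0 || maxTokenLen >= MAX_TOKEN_LENGTH_LIMIT) {
            throw new IllegalArgumentException(
                "maxTokenLen must be greater than 0 and less than "
                    + MAX_TOKEN_LENGTH_LIMIT + ", got: " + maxTokenLen);
        }
        return maxTokenLen;
    }

    public static void main(String[] args) {
        System.out.println(requireValidMaxTokenLen(255)); // accepted, prints 255
        try {
            requireValidMaxTokenLen(0); // rejected: not greater than 0
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```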
    • Method Detail

      • isTokenChar

        protected boolean isTokenChar(int c)
        Collects only characters (code points) that do not satisfy Character.isWhitespace(int).
        Specified by:
        isTokenChar in class CharTokenizer
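Because isTokenChar receives an int code point rather than a char, supplementary characters such as emoji are tested as a whole. The following standalone sketch applies the same test described above to a few sample code points; the class name IsTokenCharSketch is hypothetical and the method body is an assumption based on the one-line description, not copied from Lucene.

```java
/**
 * Hypothetical standalone version of the token-character test described above.
 * The signature mirrors isTokenChar(int c); this is NOT Lucene source.
 */
final class IsTokenCharSketch {

    // A code point belongs to a token unless Character.isWhitespace(int) accepts it.
    static boolean isTokenChar(int c) {
        return !Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        // 'a', space, tab, newline, NO-BREAK SPACE, EM SPACE, and an emoji outside the BMP.
        int[] samples = {'a', ' ', '\t', '\n', 0x00A0, 0x2003, 0x1F600};
        for (int c : samples) {
            System.out.printf("U+%04X -> %s%n", c, isTokenChar(c));
        }
    }
}
```

Note that U+00A0 counts as a token character while U+2003 (EM SPACE) does not, which matches the note in the class description about the non-breaking space.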