org.apache.lucene.analysis

Class LowerCaseTokenizer


public final class LowerCaseTokenizer
extends LetterTokenizer

LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. Although it is functionally equivalent to a LetterTokenizer followed by a LowerCaseFilter, doing both tasks in a single pass gives it a performance advantage, which is why this otherwise redundant class exists.
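The equivalence described above can be illustrated without Lucene. The sketch below is a hypothetical, standalone approximation, not the Lucene source: `LetterLowerSketch`, `splitAtNonLetters`, `onePass`, and `twoPass` are illustrative names. `onePass` lower-cases each character while the token is being collected, as LowerCaseTokenizer does. `twoPass` first collects raw letter tokens and then lower-cases them in a separate step, similar to a LetterTokenizer followed by a LowerCaseFilter. To keep the two paths directly comparable, the second step lower-cases character by character. Buffering from a `Reader` and any token-length limits are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free sketch; class and method names are illustrative only.
final class LetterLowerSketch {

    // Splits text at non-letters; if lowerCase is true, lower-cases each char as it is collected.
    static List<String> splitAtNonLetters(String text, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(lowerCase ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // One pass: split and lower-case together (the LowerCaseTokenizer approach).
    static List<String> onePass(String text) {
        return splitAtNonLetters(text, true);
    }

    // Two passes: split first, then walk every token again to lower-case it.
    static List<String> twoPass(String text) {
        List<String> result = new ArrayList<>();
        for (String token : splitAtNonLetters(text, false)) {
            StringBuilder lowered = new StringBuilder(token.length());
            for (int i = 0; i < token.length(); i++) {
                lowered.append(Character.toLowerCase(token.charAt(i)));
            }
            result.add(lowered.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello, World! It's 2005.";
        System.out.println(onePass(text));                       // [hello, world, it, s]
        System.out.println(onePass(text).equals(twoPass(text))); // true
    }
}
```

Both paths produce the same tokens. The two-pass version visits every token character a second time, and that extra traversal is the work a combined tokenizer avoids.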

Note: this works reasonably well for most European languages but poorly for some Asian languages, such as Chinese and Japanese, in which words are not separated by spaces; an unbroken run of letters in such text becomes a single token.
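The limitation is easy to reproduce with the same letter-run rule. `Character.isLetter` returns true for CJK ideographs and for kana, so a Japanese sentence written without spaces has no non-letter boundaries inside it. The helper below (`CjkTokenDemo.letterRunsLowerCased`, a hypothetical name, not Lucene code) is a minimal sketch of the rule, assuming simple `char`-by-`char` processing. The Japanese string is written with `\u` escapes so the source file compiles regardless of its encoding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper reproducing the letter-run rule described above; not Lucene code.
final class CjkTokenDemo {

    // Emits each maximal run of letters as one lower-cased token.
    static List<String> letterRunsLowerCased(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // English: spaces separate the words, so each word becomes its own token.
        System.out.println(letterRunsLowerCased("I go to Tokyo."));  // [i, go, to, tokyo]
        // Japanese for roughly the same sentence, with no spaces between words:
        // all seven characters are letters, so only the final full stop splits anything.
        String japanese = "\u6771\u4eac\u306b\u884c\u304d\u307e\u3059\u3002";
        System.out.println(letterRunsLowerCased(japanese).size());  // 1
    }
}
```

Searching for a single word inside such a sentence would therefore not match, because the index holds only the whole sentence as one term.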

Field Summary

Fields inherited from class org.apache.lucene.analysis.Tokenizer

input

Constructor Summary

LowerCaseTokenizer(Reader in)
Construct a new LowerCaseTokenizer.

Method Summary

protected char
normalize(char c)
Converts c to lower case using Character.toLowerCase(char).

Methods inherited from class org.apache.lucene.analysis.LetterTokenizer

isTokenChar

Methods inherited from class org.apache.lucene.analysis.CharTokenizer

next

Methods inherited from class org.apache.lucene.analysis.Tokenizer

close


Constructor Details

LowerCaseTokenizer

public LowerCaseTokenizer(Reader in)
Construct a new LowerCaseTokenizer.

Method Details

normalize

protected char normalize(char c)
Converts c to lower case using Character.toLowerCase(char).
Overrides:
normalize in class CharTokenizer
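The class hierarchy follows a template-method pattern. A shared tokenizing loop asks `isTokenChar` whether a character belongs in a token and passes each kept character through `normalize`. LetterTokenizer supplies `isTokenChar`, and LowerCaseTokenizer overrides only `normalize`. The sketch below mirrors that split using hypothetical `Mini*` classes, which are not the Lucene API. It assumes a default `normalize` that returns its argument unchanged, and it operates on a `String` rather than buffering from a `Reader`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini versions of the Lucene classes; the Mini* names mark them as illustrations.
abstract class MiniCharTokenizer {
    // Decides whether c belongs inside a token.
    protected abstract boolean isTokenChar(char c);

    // Hook applied to every token character; this sketch's default leaves it unchanged.
    protected char normalize(char c) {
        return c;
    }

    // Shared driver loop, standing in for the role CharTokenizer's next() plays in Lucene.
    public final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Keeps runs of letters, like LetterTokenizer.
class MiniLetterTokenizer extends MiniCharTokenizer {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }
}

// Inherits the letter rule and only changes normalize, like LowerCaseTokenizer.
final class MiniLowerCaseTokenizer extends MiniLetterTokenizer {
    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        System.out.println(new MiniLetterTokenizer().tokenize("Apache LUCENE-1.4"));    // [Apache, LUCENE]
        System.out.println(new MiniLowerCaseTokenizer().tokenize("Apache LUCENE-1.4")); // [apache, lucene]
    }
}
```

Because the per-character hook runs inside the single tokenizing loop, lower-casing adds no extra pass over the tokens. This is the design choice the class description credits for the performance advantage.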


Copyright © 2000-2005 Apache Software Foundation. All Rights Reserved.