#include <stsearch.h>
Public Member Functions | |
StringSearch (const UnicodeString &pattern, const UnicodeString &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status) | |
Creating a StringSearch instance using the argument locale language rule set. | |
StringSearch (const UnicodeString &pattern, const UnicodeString &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status) | |
Creating a StringSearch instance using the argument collator language rule set. | |
StringSearch (const UnicodeString &pattern, CharacterIterator &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status) | |
Creating a StringSearch instance using the argument locale language rule set. | |
StringSearch (const UnicodeString &pattern, CharacterIterator &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status) | |
Creating a StringSearch instance using the argument collator language rule set. | |
StringSearch (const StringSearch &that) | |
Copy constructor that creates a StringSearch instance with the same behavior, and iterating over the same text. | |
virtual | ~StringSearch (void) |
Destructor. | |
StringSearch * | clone () const |
Clone this object. | |
StringSearch & | operator= (const StringSearch &that) |
Assignment operator. | |
virtual UBool | operator== (const SearchIterator &that) const |
Equality operator. | |
virtual void | setOffset (int32_t position, UErrorCode &status) |
Sets the index to point to the given position, and clears any state that's affected. | |
virtual int32_t | getOffset (void) const |
Return the current index in the text being searched. | |
virtual void | setText (const UnicodeString &text, UErrorCode &status) |
Set the target text to be searched. | |
virtual void | setText (CharacterIterator &text, UErrorCode &status) |
Set the target text to be searched. | |
RuleBasedCollator * | getCollator () const |
Gets the collator used for the language rules. | |
void | setCollator (RuleBasedCollator *coll, UErrorCode &status) |
Sets the collator used for the language rules. | |
void | setPattern (const UnicodeString &pattern, UErrorCode &status) |
Sets the pattern used for matching. | |
const UnicodeString & | getPattern () const |
Gets the search pattern. | |
virtual void | reset () |
Reset the iteration. | |
virtual SearchIterator * | safeClone (void) const |
Returns a copy of StringSearch with the same behavior, and iterating over the same text, as this one. | |
virtual UClassID | getDynamicClassID () const |
ICU "poor man's RTTI", returns a UClassID for the actual class. | |
Static Public Member Functions | |
static UClassID | getStaticClassID () |
ICU "poor man's RTTI", returns a UClassID for this class. | |
Protected Member Functions | |
virtual int32_t | handleNext (int32_t position, UErrorCode &status) |
Search forward for matching text, starting at a given location. | |
virtual int32_t | handlePrev (int32_t position, UErrorCode &status) |
Search backward for matching text, starting at a given location. |
StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object. StringSearch ensures that language-specific eccentricities are handled; for example, with a German collator the characters ß and SS match if case is chosen to be ignored. See the "ICU Collation Design Document" for more information. The algorithm implemented is a modified form of the Boyer-Moore search; for details, see "Efficient Text Searching in Java", published in Java Report in February 1999.
There are 2 match options for selection:
Let S' be the sub-string of a text string S between the offsets start and end <start, end>.
A pattern string P matches a text string S at the offsets <start, end> if
option 1. Some canonical equivalent of P matches some canonical equivalent of S'.
option 2. P matches S' and, if P starts or ends with a combining mark, there exists no non-ignorable combining mark before or after S' in S respectively.
Option 2 is the default.
This search has APIs similar to those of other text iteration mechanisms, such as the break iterators in BreakIterator. Using these APIs, it is easy to scan through text looking for all occurrences of a given pattern. This search iterator allows the direction to be changed by calling reset followed by next or previous. A direction change can occur without calling reset first, but this operation comes with some speed penalty. Matches found in the forward direction are the same as the matches found in the backward direction, in reverse order.
SearchIterator provides APIs to specify the starting position within the text string to be searched, e.g. setOffset, preceding and following. Since the starting position will be set as it is specified, please take note that there are some danger points at which the search may render incorrect results:
A breakiterator can be used if only matches at logical breaks are desired. Using a breakiterator will only give you results that exactly match the boundaries given by the breakiterator. For instance, the pattern "e" will not be found in the string "\u00e9" if a character break iterator is used.
Options are provided to handle overlapping matches. E.g. in English, overlapping matches produce the results 0 and 2 for the pattern "abab" in the text "ababab", whereas mutually exclusive matches only produce the result 0.
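The difference between overlapping and mutually exclusive matches can be shown without ICU at all. The sketch below uses only std::string and plain byte comparison, so it illustrates the two stepping rules and not the collation-based search itself; the function name matchOffsets is hypothetical. In ICU, the corresponding option is assumed to be the USEARCH_OVERLAP attribute passed to SearchIterator::setAttribute.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration: after each match, an overlapping search
// resumes one position later, while a mutually exclusive search resumes
// after the end of the match.
std::vector<std::size_t> matchOffsets(const std::string &text,
                                      const std::string &pattern,
                                      bool overlapping) {
    std::vector<std::size_t> offsets;
    if (pattern.empty()) {
        return offsets;  // an empty pattern has no meaningful matches
    }
    std::size_t pos = text.find(pattern);
    while (pos != std::string::npos) {
        offsets.push_back(pos);
        pos = text.find(pattern, pos + (overlapping ? 1 : pattern.size()));
    }
    return offsets;
}
```

Calling matchOffsets("ababab", "abab", true) yields the offsets 0 and 2, and calling it with false yields only 0, matching the example in the paragraph above.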
Though collator attributes will be taken into consideration while performing matches, there are no APIs here for setting and getting the attributes. These attributes can be set by getting the collator from getCollator and using the APIs in coll.h. Lastly, to update StringSearch to the new collator attributes, reset() has to be called.
Restriction:
Currently there are no composite characters that consist of a character with combining class > 0 before a character with combining class == 0. However, if such a character exists in the future, StringSearch does not guarantee the results for option 1.
Consult the SearchIterator documentation for information on and examples of how to use instances of this class to implement text searching.
UnicodeString target("The quick brown fox jumped over the lazy fox");
UnicodeString pattern("fox");
UErrorCode error = U_ZERO_ERROR;
// Locale::getUS() is an example locale; NULL means no break iterator is used.
SearchIterator *iter = new StringSearch(pattern, target, Locale::getUS(), NULL, error);
for (int32_t pos = iter->first(error); pos != USEARCH_DONE;
     pos = iter->next(error)) {
    printf("Found match at %d pos, length is %d\n", pos,
           iter->getMatchLength());
}
delete iter;
Note, StringSearch is not to be subclassed.
Definition at line 136 of file stsearch.h.
|
Creates a StringSearch instance using the argument locale language rule set. A collator will be created in the process, which will be owned by this instance and will be deleted during destruction.
|
|
Creates a StringSearch instance using the argument collator language rule set. Note that the user retains ownership of this collator; it does not get destroyed during this instance's destruction.
|
|
Creates a StringSearch instance using the argument locale language rule set. A collator will be created in the process, which will be owned by this instance and will be deleted during destruction.
Note: No parsing of the text within the
|
|
Creates a StringSearch instance using the argument collator language rule set. Note that the user retains ownership of this collator; it does not get destroyed during this instance's destruction.
Note: No parsing of the text within the
|
|
Copy constructor that creates a StringSearch instance with the same behavior, and iterating over the same text.
|
|
Destructor. Cleans up the search iterator data struct. If a collator is created in the constructor, it will be destroyed here.
|
|
Clone this object. Clones can be used concurrently in multiple threads. If an error occurs, then NULL is returned. The caller must delete the clone.
|
|
Gets the collator used for the language rules.
Caller may modify but must not delete the returned collator.
|
|
ICU "poor man's RTTI", returns a UClassID for the actual class.
Implements UObject. |
|
Return the current index in the text being searched. If the iteration has gone past the end of the text (or past the beginning for a backwards search), USEARCH_DONE is returned.
Implements SearchIterator. |
|
Gets the search pattern.
|
|
ICU "poor man's RTTI", returns a UClassID for this class.
|
|
Search forward for matching text, starting at a given location. Clients should not call this method directly; instead they should call SearchIterator#next.
If a match is found, this method returns the index at which the match starts and calls SearchIterator#setMatchLength with the number of characters in the target text that make up the match. If no match is found, the method returns USEARCH_DONE.
The
Implements SearchIterator. |
|
Search backward for matching text, starting at a given location.
Clients should not call this method directly; instead they should call SearchIterator#previous.
If a match is found, this method returns the index at which the match starts and calls SearchIterator#setMatchLength with the number of characters in the target text that make up the match. If no match is found, the method returns USEARCH_DONE.
The
Implements SearchIterator. |
|
Assignment operator. Sets this iterator to have the same behavior, and iterate over the same text, as the one passed in.
|
|
Equality operator.
Reimplemented from SearchIterator. |
|
Reset the iteration. Search will begin at the start of the text string if a forward iteration is initiated before a backwards iteration. Otherwise if a backwards iteration is initiated before a forwards iteration, the search will begin at the end of the text string.
Reimplemented from SearchIterator. |
|
Returns a copy of StringSearch with the same behavior, and iterating over the same text, as this one. Note that all data will be replicated, except for the user-specified collator and the breakiterator.
Implements SearchIterator. |
|
Sets the collator used for the language rules. User retains the ownership of this collator, thus the responsibility of deletion lies with the user. This method causes internal data such as Boyer-Moore shift tables to be recalculated, but the iterator's position is unchanged.
|
|
Sets the index to point to the given position, and clears any state that's affected. This method takes the argument index and sets the position in the text string accordingly without checking if the index is pointing to a valid starting point to begin searching.
Implements SearchIterator. |
|
Sets the pattern used for matching. Internal data like the Boyer Moore table will be recalculated, but the iterator's position is unchanged.
|
|
Set the target text to be searched.
Text iteration will hence begin at the start of the text string. This method is useful if you want to re-use an iterator to search for the same pattern within a different body of text. Note: No parsing of the text within the
Reimplemented from SearchIterator. |
|
Set the target text to be searched. Text iteration will hence begin at the start of the text string. This method is useful if you want to re-use an iterator to search for the same pattern within a different body of text.
Reimplemented from SearchIterator. |