Package morfologik.stemming
Interface ISequenceEncoder
- All Known Implementing Classes:
NoEncoder, TrimInfixAndSuffixEncoder, TrimPrefixAndSuffixEncoder, TrimSuffixEncoder
public interface ISequenceEncoder
The logic of encoding one sequence of bytes relative to another sequence of
bytes. The "base" form and the "derived" form are typically the stem of
a word and the inflected form of a word.
Derived form encoding helps in making the data for the automaton smaller and more repetitive (which results in higher compression rates).
See example implementation for details.
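To make the idea of relative encoding concrete, here is a minimal sketch of a toy suffix-trimming scheme. The class name `RelativeEncodingSketch`, its helper methods, and the one-byte code layout are assumptions made for illustration only; they are not taken from this interface or from any of the implementing classes, and the sketch works on plain byte arrays rather than `ByteBuffer`s to keep it short.

```java
import java.nio.charset.StandardCharsets;

// Toy suffix-trimming scheme for illustration only; not the library's actual format.
class RelativeEncodingSketch {
    /**
     * Encodes target relative to source. Assumed layout: one byte holding
     * 'A' + (number of bytes to drop from the end of source), followed by
     * the bytes to append after the drop.
     */
    static byte[] encode(byte[] source, byte[] target) {
        // Length of the common prefix of source and target.
        int common = 0;
        int max = Math.min(source.length, target.length);
        while (common < max && source[common] == target[common]) {
            common++;
        }
        int trim = source.length - common;
        if (trim > 255 - 'A') {
            throw new IllegalArgumentException("Cannot store trim count in one byte: " + trim);
        }
        byte[] code = new byte[1 + target.length - common];
        code[0] = (byte) ('A' + trim);
        System.arraycopy(target, common, code, 1, target.length - common);
        return code;
    }

    /** Reverses encode(): drops the counted bytes from source, then appends the stored suffix. */
    static byte[] decode(byte[] source, byte[] encoded) {
        int trim = (encoded[0] & 0xFF) - 'A';
        int keep = source.length - trim;
        byte[] target = new byte[keep + encoded.length - 1];
        System.arraycopy(source, 0, target, 0, keep);
        System.arraycopy(encoded, 1, target, keep, encoded.length - 1);
        return target;
    }

    // String helpers intended for ASCII demo input.
    static String encodeText(String source, String target) {
        byte[] code = encode(source.getBytes(StandardCharsets.UTF_8),
                             target.getBytes(StandardCharsets.UTF_8));
        return new String(code, StandardCharsets.UTF_8);
    }

    static String decodeText(String source, String encoded) {
        byte[] target = decode(source.getBytes(StandardCharsets.UTF_8),
                               encoded.getBytes(StandardCharsets.UTF_8));
        return new String(target, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeText("working", "work")); // D
        System.out.println(encodeText("walking", "walk")); // D
        System.out.println(encodeText("ran", "run"));      // Cun
        System.out.println(decodeText("ran", "Cun"));      // run
    }
}
```

In this sketch "working" and "walking" both produce the same one-byte code, which illustrates why storing the relative code instead of the full form makes the data more repetitive.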
-
Method Summary
Modifier and Type | Method | Description
ByteBuffer | decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded) |
ByteBuffer | encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target) |
int | prefixBytes() | Deprecated.
-
Method Details
-
encode
- Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
target - The target byte sequence to encode relative to source.
- Returns:
- The ByteBuffer with the encoded target.
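The `reuse` contract described above can be sketched from the implementer's side as follows. This is a minimal sketch under stated assumptions: `ReuseContractSketch`, `prepare`, and `utf8` are hypothetical names, the "encoding" is a plain identity copy, the capacity check after clearing is one possible reading of "not enough remaining space", and returning the buffer flipped for reading is likewise an assumption rather than documented behavior.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in showing the buffer-reuse pattern; not the library's code.
class ReuseContractSketch {
    /** Returns a cleared buffer that can hold `needed` bytes, reusing `reuse` when it is large enough. */
    static ByteBuffer prepare(ByteBuffer reuse, int needed) {
        if (reuse == null || reuse.capacity() < needed) {
            return ByteBuffer.allocate(needed); // not enough room: allocate a new buffer
        }
        reuse.clear(); // assumption: previous contents of the reused buffer are discarded
        return reuse;
    }

    /** Identity "encoding": copies target unchanged and ignores source. */
    static ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target) {
        ByteBuffer out = prepare(reuse, target.remaining());
        out.put(target.duplicate()); // duplicate() leaves the caller's position untouched
        out.flip();                  // assumption: the result is handed back ready for reading
        return out;
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ByteBuffer small = ByteBuffer.allocate(2);
        ByteBuffer big = ByteBuffer.allocate(64);

        ByteBuffer a = encode(small, utf8("walked"), utf8("walk"));
        ByteBuffer b = encode(big, utf8("walked"), utf8("walk"));

        System.out.println(a == small); // false: a new buffer was allocated
        System.out.println(b == big);   // true: the provided buffer was reused
        System.out.println(StandardCharsets.UTF_8.decode(b.duplicate())); // walk
    }
}
```

Because the returned buffer may or may not be the one passed in, callers of such a method should always continue with the returned reference.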
-
decode
- Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
encoded - The previously encoded byte sequence.
- Returns:
- The ByteBuffer with the decoded target.
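From the caller's side, decoding many entries with a single scratch buffer might look like the sketch below. Everything here is hypothetical: `DecodeLoopSketch`, `decodeAll`, and the toy code layout (first byte minus 'A' is the number of bytes to drop from the end of `source`, the rest is appended) are illustrative assumptions, not the behavior of any implementing class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical caller-side sketch of decoding with a reused buffer.
class DecodeLoopSketch {
    /** Toy decoder with the documented parameter order (reuse, source, encoded). */
    static ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded) {
        int trim = (encoded.get(encoded.position()) & 0xFF) - 'A';
        int keep = source.remaining() - trim;
        int suffix = encoded.remaining() - 1;

        ByteBuffer out = (reuse != null && reuse.capacity() >= keep + suffix)
                ? reuse
                : ByteBuffer.allocate(keep + suffix);
        out.clear();

        // Absolute gets leave the positions of source and encoded unchanged.
        for (int i = 0; i < keep; i++) {
            out.put(source.get(source.position() + i));
        }
        for (int i = 1; i <= suffix; i++) {
            out.put(encoded.get(encoded.position() + i));
        }
        out.flip(); // assumption: the result is returned ready for reading
        return out;
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes each {source, encoded} pair, carrying one scratch buffer across iterations. */
    static String decodeAll(String[][] pairs) {
        ByteBuffer reuse = ByteBuffer.allocate(4); // deliberately small to force one reallocation
        StringBuilder sb = new StringBuilder();
        for (String[] pair : pairs) {
            // Keep the returned reference: it may be a newly allocated buffer.
            reuse = decode(reuse, utf8(pair[0]), utf8(pair[1]));
            sb.append(StandardCharsets.UTF_8.decode(reuse.duplicate())).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String[][] pairs = { {"working", "D"}, {"ran", "Cun"}, {"unbelievable", "A"} };
        System.out.println(decodeAll(pairs)); // work run unbelievable
    }
}
```

The third pair needs more room than the initial scratch buffer provides, so the loop silently switches to the larger buffer returned by `decode`.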
-
prefixBytes
Deprecated.
The number of prefix bytes of the encoded form that should be ignored (needed for separator lookup). An ugly workaround for GH-85; it should be fixed by knowing in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
-