Class AnalyzingInfixSuggester

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, Accountable
    Direct Known Subclasses:
    BlendedInfixSuggester

    public class AnalyzingInfixSuggester
    extends Lookup
    implements java.io.Closeable
    Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. This also highlights the tokens that match.

    This suggester supports payloads. Matches are sorted only by the suggest weight; a blended score + weight sort is not yet supported. As a result, this suggester works best when there is a strong a priori ranking of all the suggestions.

    This suggester supports contexts, including arbitrary binary terms.
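The matching rule described above can be illustrated without Lucene. The following is a minimal sketch, not the suggester's implementation: it lowercases and splits on whitespace instead of running an Analyzer, and the names InfixMatchSketch, Entry, and suggest are hypothetical, introduced only for this illustration. It shows that a query matches when it is a prefix of any token in the text, and that hits are ordered by weight alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class InfixMatchSketch {
    // Hypothetical stand-in for one indexed suggestion: surface text plus its weight.
    static final class Entry {
        final String text;
        final long weight;

        Entry(String text, long weight) {
            this.text = text;
            this.weight = weight;
        }
    }

    // Returns up to num texts in which some token starts with the query, highest weight first.
    static List<String> suggest(List<Entry> entries, String query, int num) {
        String prefix = query.toLowerCase(Locale.ROOT);
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            // Assumption: whitespace tokenization stands in for real analysis.
            for (String token : e.text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.startsWith(prefix)) {
                    hits.add(e);
                    break; // one matching token is enough
                }
            }
        }
        // Sort by weight only, descending; no relevance score is involved.
        hits.sort((a, b) -> Long.compare(b.weight, a.weight));
        List<String> out = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(num, hits.size()))) {
            out.add(e.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("lend me your ear", 8),
            new Entry("a penny saved is a penny earned", 10),
            new Entry("early bird", 3));
        // "ear" is a prefix of "ear", "earned" and "early"; the two heaviest entries win.
        System.out.println(suggest(entries, "ear", 2));
        // prints [a penny saved is a penny earned, lend me your ear]
    }
}
```

Because ordering ignores how well the query matches, the quality of the results in this sketch depends entirely on the weights supplied with each entry.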

    • Constructor Summary

      Constructors 
      Constructor Description
      AnalyzingInfixSuggester​(Directory dir, Analyzer analyzer)
      Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
      AnalyzingInfixSuggester​(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild)
      Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
      AnalyzingInfixSuggester​(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight)
      Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
      AnalyzingInfixSuggester​(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight, boolean closeIndexWriterOnBuild)
      Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void add​(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload)
      Adds a new suggestion.
      void addContextToQuery​(BooleanQuery.Builder query, BytesRef context, BooleanClause.Occur clause)
      This method is handy because callers do not need access to internal fields such as CONTEXTS_FIELD_NAME in order to build queries. However, this class may not be the best location for it.
      protected void addNonMatch​(java.lang.StringBuilder sb, java.lang.String text)
      Called while highlighting a single result, to append a non-matching chunk of text from the suggestion to the provided fragments list.
      protected void addPrefixMatch​(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed, java.lang.String prefixToken)
      Called while highlighting a single result, to append a matched prefix token to the provided fragments list.
      protected void addWholeMatch​(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed)
      Called while highlighting a single result, to append the whole matched token to the provided fragments list.
      void build​(InputIterator iter)
      Builds up a new internal Lookup representation based on the given InputIterator.
      private Document buildDocument​(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload)  
      void close()  
      void commit()
      Commits all pending changes made to this suggester to disk.
      protected java.util.List<Lookup.LookupResult> createResults​(IndexSearcher searcher, TopFieldDocs hits, int num, java.lang.CharSequence charSequence, boolean doHighlight, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken)
      Create the results based on the search hits.
      private void ensureOpen()  
      protected Query finishQuery​(BooleanQuery.Builder in, boolean allTermsRequired)
      Subclass can override this to tweak the Query before searching.
      java.util.Collection<Accountable> getChildResources()
      Returns nested resources of this class.
      long getCount()
      Get the number of entries the lookup was built with.
      protected Directory getDirectory​(java.nio.file.Path path)
      Subclass can override to choose a specific Directory implementation.
      private Analyzer getGramAnalyzer()  
      protected IndexWriterConfig getIndexWriterConfig​(Analyzer indexAnalyzer, IndexWriterConfig.OpenMode openMode)
      Override this to customize index settings, e.g.
      protected Query getLastTokenQuery​(java.lang.String token)
      This is called if the last token isn't ended (e.g. user did not type a space after it).
      protected FieldType getTextFieldType()
      Subclass can override this method to change the field type of the text field, e.g. to change the index options.
      protected java.lang.Object highlight​(java.lang.String text, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken)
      Override this method to customize the Object representing a single highlighted suggestion; the result is set on each Lookup.LookupResult.highlightKey member.
      boolean load​(DataInput out)
      Discard current lookup data and load it from a previously saved copy.
      java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key, int num, boolean allTermsRequired, boolean doHighlight)
      Lookup, without any context.
      java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key, java.util.Map<BytesRef,​BooleanClause.Occur> contextInfo, int num, boolean allTermsRequired, boolean doHighlight)
      Retrieve suggestions, specifying whether all terms must match (allTermsRequired) and whether the hits should be highlighted (doHighlight).
      java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, boolean onlyMorePopular, int num)
      Look up a key and return possible completion for this key.
      java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight)
      Lookup, with context but without booleans.
      java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key, BooleanQuery contextQuery, int num, boolean allTermsRequired, boolean doHighlight)
      This is an advanced method that lets the caller pass an arbitrary Lucene query to the suggester, to be used to filter the suggester's results.
      long ramBytesUsed()
      Return the memory usage of this object in bytes.
      void refresh()
      Reopens the underlying searcher; it's best to "batch up" many additions/updates, and then call refresh once at the end.
      boolean store​(DataOutput in)
      Persist the constructed lookup data to a directory.
      private BooleanQuery toQuery​(java.util.Map<BytesRef,​BooleanClause.Occur> contextInfo)  
      private BooleanQuery toQuery​(java.util.Set<BytesRef> contextInfo)  
      void update​(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload)
      Updates a previous suggestion, matching the exact same text as before.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • TEXTGRAMS_FIELD_NAME

        protected static final java.lang.String TEXTGRAMS_FIELD_NAME
        Field name used for the edge n-grams of the indexed text, which allow short prefixes to be searched without a PrefixQuery; the cutoff is controlled by minPrefixChars.
        See Also:
        Constant Field Values
      • TEXT_FIELD_NAME

        protected static final java.lang.String TEXT_FIELD_NAME
        Field name used for the indexed text.
        See Also:
        Constant Field Values
      • EXACT_TEXT_FIELD_NAME

        protected static final java.lang.String EXACT_TEXT_FIELD_NAME
        Field name used for the indexed text, as a StringField, for exact lookup.
        See Also:
        Constant Field Values
      • CONTEXTS_FIELD_NAME

        protected static final java.lang.String CONTEXTS_FIELD_NAME
        Field name used for the indexed context, as a StringField and a SortedSetDVField, for filtering.
        See Also:
        Constant Field Values
      • queryAnalyzer

        protected final Analyzer queryAnalyzer
        Analyzer used at search time
      • indexAnalyzer

        protected final Analyzer indexAnalyzer
        Analyzer used at index time
      • minPrefixChars

        final int minPrefixChars
      • allTermsRequired

        private final boolean allTermsRequired
      • highlight

        private final boolean highlight
      • commitOnBuild

        private final boolean commitOnBuild
      • closeIndexWriterOnBuild

        private final boolean closeIndexWriterOnBuild
      • writer

        protected IndexWriter writer
        Used for ongoing NRT additions/updates.
      • searcherMgrLock

        protected final java.lang.Object searcherMgrLock
        Used to manage concurrent access to searcherMgr
      • DEFAULT_MIN_PREFIX_CHARS

        public static final int DEFAULT_MIN_PREFIX_CHARS
        Default minimum number of leading characters before PrefixQuery is used (4).
        See Also:
        Constant Field Values
      • DEFAULT_ALL_TERMS_REQUIRED

        public static final boolean DEFAULT_ALL_TERMS_REQUIRED
        Default boolean clause option for multiple terms matching (all terms required).
        See Also:
        Constant Field Values
      • DEFAULT_HIGHLIGHT

        public static final boolean DEFAULT_HIGHLIGHT
        Default highlighting option.
        See Also:
        Constant Field Values
      • DEFAULT_CLOSE_INDEXWRITER_ON_BUILD

        protected static final boolean DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
        Default option to close the IndexWriter once the index has been built.
        See Also:
        Constant Field Values
      • SORT

        private static final Sort SORT
        How we sort the postings and search results.
    • Constructor Detail

      • AnalyzingInfixSuggester

        public AnalyzingInfixSuggester​(Directory dir,
                                       Analyzer analyzer)
                                throws java.io.IOException
        Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
        Throws:
        java.io.IOException
      • AnalyzingInfixSuggester

        public AnalyzingInfixSuggester​(Directory dir,
                                       Analyzer indexAnalyzer,
                                       Analyzer queryAnalyzer,
                                       int minPrefixChars,
                                       boolean commitOnBuild)
                                throws java.io.IOException
        Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
        Parameters:
        minPrefixChars - Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
        commitOnBuild - Call commit after the index has finished building. This would persist the suggester index to disk and future instances of this suggester can use this pre-built dictionary.
        Throws:
        java.io.IOException
      • AnalyzingInfixSuggester

        public AnalyzingInfixSuggester​(Directory dir,
                                       Analyzer indexAnalyzer,
                                       Analyzer queryAnalyzer,
                                       int minPrefixChars,
                                       boolean commitOnBuild,
                                       boolean allTermsRequired,
                                       boolean highlight)
                                throws java.io.IOException
        Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
        Parameters:
        minPrefixChars - Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
        commitOnBuild - Call commit after the index has finished building. This would persist the suggester index to disk and future instances of this suggester can use this pre-built dictionary.
        allTermsRequired - All terms in the suggest query must be matched.
        highlight - Highlight suggest query in suggestions.
        Throws:
        java.io.IOException
      • AnalyzingInfixSuggester

        public AnalyzingInfixSuggester​(Directory dir,
                                       Analyzer indexAnalyzer,
                                       Analyzer queryAnalyzer,
                                       int minPrefixChars,
                                       boolean commitOnBuild,
                                       boolean allTermsRequired,
                                       boolean highlight,
                                       boolean closeIndexWriterOnBuild)
                                throws java.io.IOException
        Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
        Parameters:
        minPrefixChars - Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
        commitOnBuild - Call commit after the index has finished building. This would persist the suggester index to disk and future instances of this suggester can use this pre-built dictionary.
        allTermsRequired - All terms in the suggest query must be matched.
        highlight - Highlight suggest query in suggestions.
        closeIndexWriterOnBuild - If true, the IndexWriter will be closed after the index has finished building.
        Throws:
        java.io.IOException
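The minPrefixChars trade-off described in the parameter notes can be sketched in plain Java. This is an illustration under stated assumptions, not the suggester's code: the class EdgeGramSketch and its methods are hypothetical, and the exact gram lengths Lucene indexes are not specified on this page. The sketch only shows the idea that every leading substring shorter than minPrefixChars is precomputed at index time, so a short prefix becomes an exact term lookup instead of a prefix scan.

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeGramSketch {
    // Leading substrings of the token that are shorter than minPrefixChars.
    // Assumption: grams start at length 1; this is what makes the index larger.
    static List<String> edgeGrams(String token, int minPrefixChars) {
        List<String> grams = new ArrayList<>();
        int max = Math.min(token.length(), minPrefixChars - 1);
        for (int len = 1; len <= max; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Short prefixes are answered from the precomputed grams; longer ones need a prefix scan.
    static String strategyFor(String prefix, int minPrefixChars) {
        return prefix.length() < minPrefixChars ? "exact gram lookup" : "prefix scan";
    }

    public static void main(String[] args) {
        System.out.println(edgeGrams("penny", 4));   // prints [p, pe, pen]
        System.out.println(strategyFor("pe", 4));    // prints exact gram lookup
        System.out.println(strategyFor("penn", 4));  // prints prefix scan
    }
}
```

Raising minPrefixChars in this model adds one more gram per token, which mirrors the documented cost: a larger index in exchange for faster lookups on short prefixes.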
    • Method Detail

      • getDirectory

        protected Directory getDirectory​(java.nio.file.Path path)
                                  throws java.io.IOException
        Subclass can override to choose a specific Directory implementation.
        Throws:
        java.io.IOException
      • build

        public void build​(InputIterator iter)
                   throws java.io.IOException
        Description copied from class: Lookup
        Builds up a new internal Lookup representation based on the given InputIterator. The implementation might re-sort the data internally.
        Specified by:
        build in class Lookup
        Throws:
        java.io.IOException
      • commit

        public void commit()
                    throws java.io.IOException
        Commits all pending changes made to this suggester to disk.
        Throws:
        java.io.IOException
        See Also:
        IndexWriter.commit()
      • getGramAnalyzer

        private Analyzer getGramAnalyzer()
      • ensureOpen

        private void ensureOpen()
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • buildDocument

        private Document buildDocument​(BytesRef text,
                                       java.util.Set<BytesRef> contexts,
                                       long weight,
                                       BytesRef payload)
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • refresh

        public void refresh()
                     throws java.io.IOException
        Reopens the underlying searcher; it's best to "batch up" many additions/updates, and then call refresh once at the end.
        Throws:
        java.io.IOException
      • getTextFieldType

        protected FieldType getTextFieldType()
        Subclass can override this method to change the field type of the text field, e.g. to change the index options.
      • lookup

        public java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key,
                                                          java.util.Set<BytesRef> contexts,
                                                          boolean onlyMorePopular,
                                                          int num)
                                                   throws java.io.IOException
        Description copied from class: Lookup
        Look up a key and return possible completion for this key.
        Specified by:
        lookup in class Lookup
        Parameters:
        key - lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
        contexts - contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match
        onlyMorePopular - return only more popular results
        num - maximum number of results to return
        Returns:
        a list of possible completions, with their relative weight (e.g. popularity)
        Throws:
        java.io.IOException
      • lookup

        public java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key,
                                                          int num,
                                                          boolean allTermsRequired,
                                                          boolean doHighlight)
                                                   throws java.io.IOException
        Lookup, without any context.
        Throws:
        java.io.IOException
      • lookup

        public java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key,
                                                          java.util.Set<BytesRef> contexts,
                                                          int num,
                                                          boolean allTermsRequired,
                                                          boolean doHighlight)
                                                   throws java.io.IOException
        Lookup, with context but without booleans. Context booleans default to SHOULD, so each suggestion must have at least one of the contexts.
        Throws:
        java.io.IOException
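The SHOULD semantics for contexts can be modeled as a set-overlap test. The sketch below is a plain-Java illustration with hypothetical names (ContextFilterSketch, accepts) and String contexts in place of BytesRef. Treating an empty context set the same as null is an assumption of this sketch, not something this page states.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ContextFilterSketch {
    // A suggestion passes when no filter is requested, or when it shares at least
    // one context with the requested set (every requested context is a SHOULD clause).
    static boolean accepts(Set<String> suggestionContexts, Set<String> requested) {
        if (requested == null || requested.isEmpty()) {
            return true; // assumption: an empty set means "no filtering", like null
        }
        return !Collections.disjoint(suggestionContexts, requested);
    }

    public static void main(String[] args) {
        Set<String> tagged = new HashSet<>(Arrays.asList("books", "fiction"));
        Set<String> wanted = new HashSet<>(Arrays.asList("fiction", "music"));
        Set<String> other = new HashSet<>(Arrays.asList("music"));
        System.out.println(accepts(tagged, wanted)); // prints true ("fiction" overlaps)
        System.out.println(accepts(tagged, other));  // prints false (no shared context)
        System.out.println(accepts(tagged, null));   // prints true (all contexts allowed)
    }
}
```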
      • getLastTokenQuery

        protected Query getLastTokenQuery​(java.lang.String token)
                                   throws java.io.IOException
        This is called if the last token isn't ended (e.g. user did not type a space after it). Return an appropriate Query clause to add to the BooleanQuery.
        Throws:
        java.io.IOException
      • lookup

        public java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key,
                                                          java.util.Map<BytesRef,​BooleanClause.Occur> contextInfo,
                                                          int num,
                                                          boolean allTermsRequired,
                                                          boolean doHighlight)
                                                   throws java.io.IOException
        Retrieve suggestions, specifying whether all terms must match (allTermsRequired) and whether the hits should be highlighted (doHighlight).
        Throws:
        java.io.IOException
      • addContextToQuery

        public void addContextToQuery​(BooleanQuery.Builder query,
                                      BytesRef context,
                                      BooleanClause.Occur clause)
        This method is handy because callers do not need access to internal fields such as CONTEXTS_FIELD_NAME in order to build queries. However, this class may not be the best location for it.
        Parameters:
        query - the BooleanQuery.Builder to add the context clause to
        context - the context
        clause - one of the BooleanClause.Occur values
      • lookup

        public java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key,
                                                          BooleanQuery contextQuery,
                                                          int num,
                                                          boolean allTermsRequired,
                                                          boolean doHighlight)
                                                   throws java.io.IOException
        This is an advanced method that lets the caller pass an arbitrary Lucene query to the suggester, to be used to filter the suggester's results.
        Overrides:
        lookup in class Lookup
        Parameters:
        key - the keyword being looked for
        contextQuery - an arbitrary Lucene query to be used to filter the result of the suggester. addContextToQuery(org.apache.lucene.search.BooleanQuery.Builder, org.apache.lucene.util.BytesRef, org.apache.lucene.search.BooleanClause.Occur) could be used to build this contextQuery.
        num - number of items to return
        allTermsRequired - whether all searched terms must match
        doHighlight - if true, the matching term will be highlighted in the search result
        Returns:
        the result of the suggester
        Throws:
        java.io.IOException - if there is an IO exception while reading data from the index
      • createResults

        protected java.util.List<Lookup.LookupResult> createResults​(IndexSearcher searcher,
                                                                    TopFieldDocs hits,
                                                                    int num,
                                                                    java.lang.CharSequence charSequence,
                                                                    boolean doHighlight,
                                                                    java.util.Set<java.lang.String> matchedTokens,
                                                                    java.lang.String prefixToken)
                                                             throws java.io.IOException
        Create the results based on the search hits. Can be overridden by subclass to add particular behavior (e.g. weight transformation). Note that there is no prefix token (the prefixToken argument will be null) whenever the final token in the incoming request was in fact finished (had trailing characters, such as white-space).
        Throws:
        java.io.IOException - If there are problems reading fields from the underlying Lucene index.
      • finishQuery

        protected Query finishQuery​(BooleanQuery.Builder in,
                                    boolean allTermsRequired)
        Subclass can override this to tweak the Query before searching.
      • highlight

        protected java.lang.Object highlight​(java.lang.String text,
                                             java.util.Set<java.lang.String> matchedTokens,
                                             java.lang.String prefixToken)
                                      throws java.io.IOException
        Override this method to customize the Object representing a single highlighted suggestion; the result is set on each Lookup.LookupResult.highlightKey member.
        Throws:
        java.io.IOException
      • addNonMatch

        protected void addNonMatch​(java.lang.StringBuilder sb,
                                   java.lang.String text)
        Called while highlighting a single result, to append a non-matching chunk of text from the suggestion to the provided fragments list.
        Parameters:
        sb - The StringBuilder to append to
        text - The text chunk to add
      • addWholeMatch

        protected void addWholeMatch​(java.lang.StringBuilder sb,
                                     java.lang.String surface,
                                     java.lang.String analyzed)
        Called while highlighting a single result, to append the whole matched token to the provided fragments list.
        Parameters:
        sb - The StringBuilder to append to
        surface - The surface form (original) text
        analyzed - The analyzed token corresponding to the surface form text
      • addPrefixMatch

        protected void addPrefixMatch​(java.lang.StringBuilder sb,
                                      java.lang.String surface,
                                      java.lang.String analyzed,
                                      java.lang.String prefixToken)
        Called while highlighting a single result, to append a matched prefix token to the provided fragments list.
        Parameters:
        sb - The StringBuilder to append to
        surface - The fragment of the surface form (indexed during build(org.apache.lucene.search.suggest.InputIterator)) corresponding to this match
        analyzed - The analyzed token that matched
        prefixToken - The prefix of the token that matched
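The way the three highlighting callbacks cooperate can be sketched as follows. This is a standalone illustration, not the class's implementation: HighlightSketch and render are hypothetical names, the methods are static and drop the analyzed argument for brevity, and the <b> markup is an assumption chosen for the example. Each callback appends one fragment to the shared StringBuilder, so a subclass can change the markup by overriding only the callbacks it cares about.

```java
final class HighlightSketch {
    // Text that did not match is copied through unchanged.
    static void addNonMatch(StringBuilder sb, String text) {
        sb.append(text);
    }

    // A fully matched token is wrapped in the (assumed) highlight markup.
    static void addWholeMatch(StringBuilder sb, String surface) {
        sb.append("<b>").append(surface).append("</b>");
    }

    // Only the part of the surface form covered by the typed prefix is highlighted.
    static void addPrefixMatch(StringBuilder sb, String surface, String prefixToken) {
        if (prefixToken.length() >= surface.length()) {
            addWholeMatch(sb, surface); // the prefix covers the whole token
            return;
        }
        sb.append("<b>").append(surface, 0, prefixToken.length()).append("</b>");
        sb.append(surface.substring(prefixToken.length()));
    }

    // Renders "penny earned" for a query whose finished token is "penny" and whose prefix is "ear".
    static String render() {
        StringBuilder sb = new StringBuilder();
        addWholeMatch(sb, "penny");
        addNonMatch(sb, " ");
        addPrefixMatch(sb, "earned", "ear");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render()); // prints <b>penny</b> <b>ear</b>ned
    }
}
```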
      • store

        public boolean store​(DataOutput in)
                      throws java.io.IOException
        Description copied from class: Lookup
        Persist the constructed lookup data to a directory. Optional operation.
        Specified by:
        store in class Lookup
        Parameters:
        in - DataOutput to write the data to.
        Returns:
        true if successful, false if unsuccessful or not supported.
        Throws:
        java.io.IOException - when fatal IO error occurs.
      • load

        public boolean load​(DataInput out)
                     throws java.io.IOException
        Description copied from class: Lookup
        Discard current lookup data and load it from a previously saved copy. Optional operation.
        Specified by:
        load in class Lookup
        Parameters:
        out - the DataInput to load the lookup data.
        Returns:
        true if completed successfully, false if unsuccessful or not supported.
        Throws:
        java.io.IOException - when fatal IO error occurs.
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • ramBytesUsed

        public long ramBytesUsed()
        Description copied from interface: Accountable
        Return the memory usage of this object in bytes. Negative values are illegal.
        Specified by:
        ramBytesUsed in interface Accountable
      • getChildResources

        public java.util.Collection<Accountable> getChildResources()
        Description copied from interface: Accountable
        Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
        Specified by:
        getChildResources in interface Accountable
        See Also:
        Accountables
      • getCount

        public long getCount()
                      throws java.io.IOException
        Description copied from class: Lookup
        Get the number of entries the lookup was built with.
        Specified by:
        getCount in class Lookup
        Returns:
        total number of suggester entries
        Throws:
        java.io.IOException