Class TFIDFSimilarity.TFIDFScorer

  • Enclosing class:
    TFIDFSimilarity

    class TFIDFSimilarity.TFIDFScorer
    extends Similarity.SimScorer
    Collection statistics for the TF-IDF model. The only statistic of interest to this model is idf.
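
    Because idf is the only collection statistic this model uses, a small sketch of how such a value could be computed may help. The code below assumes the classic log-based inverse document frequency; the class name IdfSketch and its method signature are hypothetical, and the actual formula is defined by whichever TFIDFSimilarity subclass is in use.

```java
// Hypothetical, standalone sketch; not part of the documented API.
public class IdfSketch {

    // Assumed formula: log((docCount + 1) / (docFreq + 1)) + 1.
    // docFreq  = number of documents containing the term
    // docCount = number of documents in the collection
    public static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // A term found in every document gets the minimum weight.
        System.out.println(idf(1, 1)); // prints 1.0
        // A rarer term scores higher than a common one.
        System.out.println(idf(1, 100) > idf(50, 100)); // prints true
    }
}
```

    Under this assumed formula, rarer terms receive a larger idf, which is why the value can serve as the single collection-level weight for the scorer.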
    • Field Summary

      Fields 
      Modifier and Type          Field        Description
      private float              boost
      private Explanation        idf          The idf and its explanation
      (package private) float[]  normTable
      private float              queryWeight
    • Constructor Summary

      Constructors 
      Constructor                                                   Description
      TFIDFScorer(float boost, Explanation idf, float[] normTable)
    • Field Detail

      • idf

        private final Explanation idf
        The idf and its explanation
      • boost

        private final float boost
      • queryWeight

        private final float queryWeight
      • normTable

        final float[] normTable
    • Constructor Detail

      • TFIDFScorer

        public TFIDFScorer(float boost,
                           Explanation idf,
                           float[] normTable)
    • Method Detail

      • score

        public float score(float freq,
                           long norm)
        Description copied from class: Similarity.SimScorer
        Score a single document. freq is the document-term sloppy frequency and must be finite and positive. norm is the encoded normalization factor as computed by Similarity.computeNorm(FieldInvertState) at index time, or 1 if norms are disabled. norm is never 0.

        Score must not decrease when freq increases, i.e., if freq1 > freq2, then score(freq1, norm) >= score(freq2, norm) for any value of norm that may be produced by Similarity.computeNorm(FieldInvertState).

        Score must not increase when the unsigned norm increases, i.e., if Long.compareUnsigned(norm1, norm2) > 0, then score(freq, norm1) <= score(freq, norm2) for any legal freq.

        As a consequence, the maximum score that this scorer can produce is bounded by score(Float.MAX_VALUE, 1).

        Specified by:
        score in class Similarity.SimScorer
        Parameters:
        freq - sloppy term frequency, must be finite and positive
        norm - encoded normalization factor or 1 if norms are disabled
        Returns:
        document's score
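
        The contract above can be illustrated with a minimal, standalone sketch. It does not use Lucene classes. The class name TfIdfScoreSketch, the square-root tf function, the precomputed queryWeight (boost times idf), the lookup of norm through its low byte, and the sample norm table are all assumptions made for illustration, not the confirmed implementation of this class.

```java
// Hypothetical, standalone sketch of a scorer that satisfies the score contract.
public class TfIdfScoreSketch {
    private final float queryWeight; // assumed to be boost * idf
    private final float[] normTable; // assumed: 256 decoded normalization factors

    public TfIdfScoreSketch(float boost, float idf, float[] normTable) {
        this.queryWeight = boost * idf;
        this.normTable = normTable;
    }

    // Assumed tf function: square root, which never decreases as freq grows.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumes the encoded norm fits in one byte (1..255).
    public float score(float freq, long norm) {
        float normValue = normTable[(int) (norm & 0xFF)];
        return tf(freq) * queryWeight * normValue;
    }

    // Hypothetical table: entry i is 1/sqrt(i), so a larger norm
    // yields a smaller factor and therefore a score that does not increase.
    public static float[] sampleNormTable() {
        float[] table = new float[256];
        for (int i = 1; i < 256; i++) {
            table[i] = (float) (1.0 / Math.sqrt(i));
        }
        table[0] = 1f / table[255]; // placeholder; norm is never 0
        return table;
    }

    public static void main(String[] args) {
        TfIdfScoreSketch s = new TfIdfScoreSketch(2f, 1.5f, sampleNormTable());
        System.out.println(s.score(4f, 1));                        // prints 6.0
        System.out.println(s.score(9f, 4) >= s.score(4f, 4));      // prints true
        System.out.println(s.score(4f, 9) <= s.score(4f, 4));      // prints true
        System.out.println(s.score(1000f, 1) <= s.score(Float.MAX_VALUE, 1)); // prints true
    }
}
```

        In this sketch both monotonicity requirements follow from the shape of the two assumed functions: tf is non-decreasing in freq and the sample table is non-increasing in norm, so score(Float.MAX_VALUE, 1) is an upper bound on every score it can return.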
      • explainScore

        private Explanation explainScore(Explanation freq,
                                         long encodedNorm,
                                         float[] normTable)