Package org.apache.lucene.search
Class DiversifiedTopDocsCollector
java.lang.Object
  org.apache.lucene.search.TopDocsCollector<DiversifiedTopDocsCollector.ScoreDocKey>
    org.apache.lucene.search.DiversifiedTopDocsCollector
All Implemented Interfaces: Collector
public abstract class DiversifiedTopDocsCollector extends TopDocsCollector<DiversifiedTopDocsCollector.ScoreDocKey>
A TopDocsCollector that controls diversity in results by ensuring that no more than maxHitsPerKey results from a common source are collected in the final results. An example application might be a product search in a marketplace where no more than 3 results per retailer are permitted in search results.
To compare behaviour with other forms of collector, a useful analogy might be the problem of making a compilation album of 1967's top hit records:
- A vanilla query's results might look like a "Best of the Beatles" album - high quality but not much diversity
- A GroupingSearch would produce the equivalent of "The 10 top-selling artists of 1967 - some killer and quite a lot of filler"
- A "diversified" query would be the top 20 hit records of that year - with a max of 3 Beatles hits in order to maintain diversity
This collector improves on GroupingSearch-style queries by:
- Working in one pass over the data
- Not requiring the client to guess how many groups are required
- Removing low-scoring "filler" which sits at the end of each group's hits
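The selection rule described above (keep the best numHits results overall, but at most maxHitsPerKey from any one key) can be illustrated with a minimal plain-Java sketch. This is not the collector's implementation: it sorts every hit up front instead of working in one pass with priority queues, and the DiversifySketch class, its Hit type, and the sample titles are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiversifySketch {

  /** Hypothetical stand-in for a scored hit with a grouping key (not Lucene's ScoreDocKey). */
  public static final class Hit {
    public final String title;
    public final long key;
    public final float score;

    public Hit(String title, long key, float score) {
      this.title = title;
      this.key = key;
      this.score = score;
    }
  }

  /**
   * Returns up to numHits hits in descending score order, admitting no more
   * than maxHitsPerKey hits for any single key.
   */
  public static List<Hit> diversify(List<Hit> hits, int numHits, int maxHitsPerKey) {
    // Assumption: sorting the full list is acceptable here; the real collector
    // avoids this by collecting in a single pass.
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());

    Map<Long, Integer> perKeyCount = new HashMap<>();
    List<Hit> result = new ArrayList<>();
    for (Hit h : sorted) {
      if (result.size() == numHits) {
        break; // the overall quota is full
      }
      int seen = perKeyCount.getOrDefault(h.key, 0);
      if (seen < maxHitsPerKey) {
        // This key still has room, so admit the hit.
        perKeyCount.put(h.key, seen + 1);
        result.add(h);
      }
      // Otherwise the hit is skipped, even though it outscores later hits.
    }
    return result;
  }

  public static void main(String[] args) {
    // Illustrative data: key 1 stands for the Beatles, 2 and 3 for other artists.
    List<Hit> hits = List.of(
        new Hit("Beatles hit 1", 1L, 9.0f),
        new Hit("Beatles hit 2", 1L, 8.5f),
        new Hit("Beatles hit 3", 1L, 8.0f),
        new Hit("Doors hit 1", 2L, 7.0f),
        new Hit("Kinks hit 1", 3L, 6.0f));
    for (Hit h : diversify(hits, 3, 2)) {
      System.out.println(h.title);
    }
  }
}
```

With numHits = 3 and maxHitsPerKey = 2, the third Beatles entry is passed over in favour of the lower-scoring Doors entry, which is the trade-off the album analogy describes.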
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | DiversifiedTopDocsCollector.ScoreDocKey | An extension to ScoreDoc that includes a key used for grouping purposes
(package private) static class | DiversifiedTopDocsCollector.ScoreDocKeyQueue |
-
Field Summary
Fields
Modifier and Type | Field | Description
private DiversifiedTopDocsCollector.ScoreDocKeyQueue | globalQueue |
protected int | maxNumPerKey |
private int | numHits |
private java.util.Map<java.lang.Long,DiversifiedTopDocsCollector.ScoreDocKeyQueue> | perKeyQueues |
(package private) DiversifiedTopDocsCollector.ScoreDocKey | spare |
private java.util.Stack<DiversifiedTopDocsCollector.ScoreDocKeyQueue> | sparePerKeyQueues |
-
Fields inherited from class org.apache.lucene.search.TopDocsCollector
EMPTY_TOPDOCS, pq, totalHits, totalHitsRelation
-
-
Constructor Summary
Constructors
Constructor | Description
DiversifiedTopDocsCollector(int numHits, int maxHitsPerKey) |
-
Method Summary
All Methods | Instance Methods | Abstract Methods | Concrete Methods
Modifier and Type | Method | Description
protected abstract NumericDocValues | getKeys(LeafReaderContext context) | Get a source of values used for grouping keys
LeafCollector | getLeafCollector(LeafReaderContext context) | Create a new collector to collect the given context.
protected DiversifiedTopDocsCollector.ScoreDocKey | insert(DiversifiedTopDocsCollector.ScoreDocKey addition, int docBase, NumericDocValues keys) |
protected TopDocs | newTopDocs(ScoreDoc[] results, int start) | Returns a TopDocs instance containing the given results.
private void | perKeyGroupRemove(DiversifiedTopDocsCollector.ScoreDocKey globalOverflow) |
ScoreMode | scoreMode() | Indicates what features are required from the scorer.
-
Methods inherited from class org.apache.lucene.search.TopDocsCollector
getTotalHits, populateResults, topDocs, topDocs, topDocs, topDocsSize
-
-
-
-
Field Detail
-
spare
DiversifiedTopDocsCollector.ScoreDocKey spare
-
globalQueue
private DiversifiedTopDocsCollector.ScoreDocKeyQueue globalQueue
-
numHits
private int numHits
-
perKeyQueues
private java.util.Map<java.lang.Long,DiversifiedTopDocsCollector.ScoreDocKeyQueue> perKeyQueues
-
maxNumPerKey
protected int maxNumPerKey
-
sparePerKeyQueues
private java.util.Stack<DiversifiedTopDocsCollector.ScoreDocKeyQueue> sparePerKeyQueues
-
-
Method Detail
-
getKeys
protected abstract NumericDocValues getKeys(LeafReaderContext context)
Get a source of values used for grouping keys
-
scoreMode
public ScoreMode scoreMode()
Description copied from interface: Collector
Indicates what features are required from the scorer.
-
newTopDocs
protected TopDocs newTopDocs(ScoreDoc[] results, int start)
Description copied from class: TopDocsCollector
Returns a TopDocs instance containing the given results. If results is null, there are no results to return, either because there were 0 calls to collect() or because the arguments to topDocs were invalid.
- Overrides:
newTopDocs in class TopDocsCollector<DiversifiedTopDocsCollector.ScoreDocKey>
-
insert
protected DiversifiedTopDocsCollector.ScoreDocKey insert(DiversifiedTopDocsCollector.ScoreDocKey addition, int docBase, NumericDocValues keys) throws java.io.IOException
- Throws:
java.io.IOException
-
perKeyGroupRemove
private void perKeyGroupRemove(DiversifiedTopDocsCollector.ScoreDocKey globalOverflow)
-
getLeafCollector
public LeafCollector getLeafCollector(LeafReaderContext context) throws java.io.IOException
Description copied from interface: Collector
Create a new collector to collect the given context.
- Parameters:
context - next atomic reader context
- Throws:
java.io.IOException
-
-