Class BKDWriter

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class BKDWriter
    extends java.lang.Object
    implements java.io.Closeable
    Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= maxPointsInLeafNode. The tree is fully balanced, which means the leaf nodes will have between 50% and 100% of the requested maxPointsInLeafNode. Values that fall exactly on a cell boundary may be in either cell.

    The number of dimensions can be 1 to 8, but every byte[] value is fixed length.

    This consumes heap during writing: it allocates a Long[numLeaves], a byte[numLeaves*(1+bytesPerDim)] and then uses up to the specified maxMBSortInHeap heap space for writing.

    NOTE: This can write at most Integer.MAX_VALUE * maxPointsInLeafNode / (1+bytesPerDim) total points.

    • Field Detail

      • VERSION_LEAF_STORES_BOUNDS

        public static final int VERSION_LEAF_STORES_BOUNDS
        See Also:
        Constant Field Values
      • VERSION_SELECTIVE_INDEXING

        public static final int VERSION_SELECTIVE_INDEXING
        See Also:
        Constant Field Values
      • VERSION_LOW_CARDINALITY_LEAVES

        public static final int VERSION_LOW_CARDINALITY_LEAVES
        See Also:
        Constant Field Values
      • bytesPerDoc

        private final int bytesPerDoc
        How many bytes each docs takes in the fixed-width offline format
      • DEFAULT_MAX_POINTS_IN_LEAF_NODE

        public static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE
        Default maximum number of point in each leaf block
        See Also:
        Constant Field Values
      • DEFAULT_MAX_MB_SORT_IN_HEAP

        public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
        Default maximum heap to use, before spilling to (slower) disk
        See Also:
        Constant Field Values
      • MAX_DIMS

        public static final int MAX_DIMS
        Maximum number of dimensions
        See Also:
        Constant Field Values
      • SPLITS_BEFORE_EXACT_BOUNDS

        private static final int SPLITS_BEFORE_EXACT_BOUNDS
        Number of splits before we compute the exact bounding box of an inner node.
        See Also:
        Constant Field Values
      • numDataDims

        protected final int numDataDims
        How many dimensions we are storing at the leaf (data) nodes
      • numIndexDims

        protected final int numIndexDims
        How many dimensions we are indexing in the internal nodes
      • bytesPerDim

        protected final int bytesPerDim
        How many bytes each value in each dimension takes.
      • packedBytesLength

        protected final int packedBytesLength
        numDataDims * bytesPerDim
      • packedIndexBytesLength

        protected final int packedIndexBytesLength
        numIndexDims * bytesPerDim
      • tempFileNamePrefix

        final java.lang.String tempFileNamePrefix
      • maxMBSortInHeap

        final double maxMBSortInHeap
      • scratchDiff

        final byte[] scratchDiff
      • scratch1

        final byte[] scratch1
      • scratch2

        final byte[] scratch2
      • scratchBytesRef1

        final BytesRef scratchBytesRef1
      • scratchBytesRef2

        final BytesRef scratchBytesRef2
      • commonPrefixLengths

        final int[] commonPrefixLengths
      • finished

        private boolean finished
      • maxPointsInLeafNode

        protected final int maxPointsInLeafNode
      • maxPointsSortInHeap

        private final int maxPointsSortInHeap
      • minPackedValue

        protected final byte[] minPackedValue
        Minimum per-dim values, packed
      • maxPackedValue

        protected final byte[] maxPackedValue
        Maximum per-dim values, packed
      • pointCount

        protected long pointCount
      • totalPointCount

        private final long totalPointCount
        An upper bound on how many points the caller will add (includes deletions)
      • maxDoc

        private final int maxDoc
    • Constructor Detail

      • BKDWriter

        public BKDWriter​(int maxDoc,
                         Directory tempDir,
                         java.lang.String tempFileNamePrefix,
                         int numDataDims,
                         int numIndexDims,
                         int bytesPerDim,
                         int maxPointsInLeafNode,
                         double maxMBSortInHeap,
                         long totalPointCount)
                  throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • verifyParams

        public static void verifyParams​(int numDataDims,
                                        int numIndexDims,
                                        int maxPointsInLeafNode,
                                        double maxMBSortInHeap,
                                        long totalPointCount)
      • initPointWriter

        private void initPointWriter()
                              throws java.io.IOException
        Throws:
        java.io.IOException
      • add

        public void add​(byte[] packedValue,
                        int docID)
                 throws java.io.IOException
        Throws:
        java.io.IOException
      • getPointCount

        public long getPointCount()
        How many points have been added so far
      • writeField

        public long writeField​(IndexOutput out,
                               java.lang.String fieldName,
                               MutablePointValues reader)
                        throws java.io.IOException
        Write a field from a MutablePointValues. This way of writing points is faster than regular writes with add(byte[], int) since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.
        Throws:
        java.io.IOException
      • computePackedValueBounds

        private void computePackedValueBounds​(MutablePointValues values,
                                              int from,
                                              int to,
                                              byte[] minPackedValue,
                                              byte[] maxPackedValue,
                                              BytesRef scratch)
      • writeFieldNDims

        private long writeFieldNDims​(IndexOutput out,
                                     java.lang.String fieldName,
                                     MutablePointValues values)
                              throws java.io.IOException
        Throws:
        java.io.IOException
      • writeField1Dim

        private long writeField1Dim​(IndexOutput out,
                                    java.lang.String fieldName,
                                    MutablePointValues reader)
                             throws java.io.IOException
        Throws:
        java.io.IOException
      • merge

        public long merge​(IndexOutput out,
                          java.util.List<MergeState.DocMap> docMaps,
                          java.util.List<BKDReader> readers)
                   throws java.io.IOException
        More efficient bulk-add for incoming BKDReaders. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted.
        Throws:
        java.io.IOException
      • rotateToTree

        private void rotateToTree​(int nodeID,
                                  int offset,
                                  int count,
                                  byte[] index,
                                  java.util.List<byte[]> leafBlockStartValues)
      • checkMaxLeafNodeCount

        private void checkMaxLeafNodeCount​(int numLeaves)
      • finish

        public long finish​(IndexOutput out)
                    throws java.io.IOException
        Writes the BKD tree to the provided IndexOutput and returns the file offset where index was written.
        Throws:
        java.io.IOException
      • packIndex

        private byte[] packIndex​(long[] leafBlockFPs,
                                 byte[] splitPackedValues)
                          throws java.io.IOException
        Packs the two arrays, representing a balanced binary tree, into a compact byte[] structure.
        Throws:
        java.io.IOException
      • appendBlock

        private int appendBlock​(RAMOutputStream writeBuffer,
                                java.util.List<byte[]> blocks)
                         throws java.io.IOException
        Appends the current contents of writeBuffer as another block on the growing in-memory file
        Throws:
        java.io.IOException
      • recursePackIndex

        private int recursePackIndex​(RAMOutputStream writeBuffer,
                                     long[] leafBlockFPs,
                                     byte[] splitPackedValues,
                                     long minBlockFP,
                                     java.util.List<byte[]> blocks,
                                     int nodeID,
                                     byte[] lastSplitValues,
                                     boolean[] negativeDeltas,
                                     boolean isLeft)
                              throws java.io.IOException
        lastSplitValues is per-dimension split value previously seen; we use this to prefix-code the split byte[] on each inner node
        Throws:
        java.io.IOException
      • getLeftMostLeafBlockFP

        private long getLeftMostLeafBlockFP​(long[] leafBlockFPs,
                                            int nodeID)
      • writeIndex

        private void writeIndex​(IndexOutput out,
                                int countPerLeaf,
                                long[] leafBlockFPs,
                                byte[] splitPackedValues)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • writeIndex

        private void writeIndex​(IndexOutput out,
                                int countPerLeaf,
                                int numLeaves,
                                byte[] packedIndex)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • writeLeafBlockDocs

        private void writeLeafBlockDocs​(DataOutput out,
                                        int[] docIDs,
                                        int start,
                                        int count)
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • writeLeafBlockPackedValues

        private void writeLeafBlockPackedValues​(DataOutput out,
                                                int[] commonPrefixLengths,
                                                int count,
                                                int sortedDim,
                                                java.util.function.IntFunction<BytesRef> packedValues,
                                                int leafCardinality)
                                         throws java.io.IOException
        Throws:
        java.io.IOException
      • writeLowCardinalityLeafBlockPackedValues

        private void writeLowCardinalityLeafBlockPackedValues​(DataOutput out,
                                                              int[] commonPrefixLengths,
                                                              int count,
                                                              java.util.function.IntFunction<BytesRef> packedValues)
                                                       throws java.io.IOException
        Throws:
        java.io.IOException
      • writeHighCardinalityLeafBlockPackedValues

        private void writeHighCardinalityLeafBlockPackedValues​(DataOutput out,
                                                               int[] commonPrefixLengths,
                                                               int count,
                                                               int sortedDim,
                                                               java.util.function.IntFunction<BytesRef> packedValues,
                                                               int compressedByteOffset)
                                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • writeActualBounds

        private void writeActualBounds​(DataOutput out,
                                       int[] commonPrefixLengths,
                                       int count,
                                       java.util.function.IntFunction<BytesRef> packedValues)
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • computeMinMax

        private static BytesRef[] computeMinMax​(int count,
                                                java.util.function.IntFunction<BytesRef> packedValues,
                                                int offset,
                                                int length)
        Return an array that contains the min and max values for the [offset, offset+length] interval of the given BytesRefs.
      • writeLeafBlockPackedValuesRange

        private void writeLeafBlockPackedValuesRange​(DataOutput out,
                                                     int[] commonPrefixLengths,
                                                     int start,
                                                     int end,
                                                     java.util.function.IntFunction<BytesRef> packedValues)
                                              throws java.io.IOException
        Throws:
        java.io.IOException
      • runLen

        private static int runLen​(java.util.function.IntFunction<BytesRef> packedValues,
                                  int start,
                                  int end,
                                  int byteOffset)
      • writeCommonPrefixes

        private void writeCommonPrefixes​(DataOutput out,
                                         int[] commonPrefixes,
                                         byte[] packedValue)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • verifyChecksum

        private java.lang.Error verifyChecksum​(java.lang.Throwable priorException,
                                               PointWriter writer)
                                        throws java.io.IOException
        Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.
        Throws:
        java.io.IOException
      • valueInBounds

        private boolean valueInBounds​(BytesRef packedValue,
                                      byte[] minPackedValue,
                                      byte[] maxPackedValue)
        Called only in assert
      • split

        protected int split​(byte[] minPackedValue,
                            byte[] maxPackedValue,
                            int[] parentSplits)
        Pick the next dimension to split.
        Parameters:
        minPackedValue - the min values for all dimensions
        maxPackedValue - the max values for all dimensions
        parentSplits - how many times each dim has been split on the parent levels
        Returns:
        the dimension to split
      • switchToHeap

        private HeapPointWriter switchToHeap​(PointWriter source)
                                      throws java.io.IOException
        Pull a partition back into heap once the point count is low enough while recursing.
        Throws:
        java.io.IOException
      • build

        private void build​(int nodeID,
                           int leafNodeOffset,
                           MutablePointValues reader,
                           int from,
                           int to,
                           IndexOutput out,
                           byte[] minPackedValue,
                           byte[] maxPackedValue,
                           int[] parentSplits,
                           byte[] splitPackedValues,
                           long[] leafBlockFPs,
                           int[] spareDocIds)
                    throws java.io.IOException
        Throws:
        java.io.IOException
      • computePackedValueBounds

        private void computePackedValueBounds​(BKDRadixSelector.PathSlice slice,
                                              byte[] minPackedValue,
                                              byte[] maxPackedValue)
                                       throws java.io.IOException
        Throws:
        java.io.IOException
      • build

        private void build​(int nodeID,
                           int leafNodeOffset,
                           BKDRadixSelector.PathSlice points,
                           IndexOutput out,
                           BKDRadixSelector radixSelector,
                           byte[] minPackedValue,
                           byte[] maxPackedValue,
                           int[] parentSplits,
                           byte[] splitPackedValues,
                           long[] leafBlockFPs,
                           int[] spareDocIds)
                    throws java.io.IOException
        The point writer contains the data that is going to be splitted using radix selection. /* This method is used when we are merging previously written segments, in the numDims > 1 case.
        Throws:
        java.io.IOException
      • computeCommonPrefixLength

        private void computeCommonPrefixLength​(HeapPointWriter heapPointWriter,
                                               byte[] commonPrefix,
                                               int from,
                                               int to)
      • valuesInOrderAndBounds

        private boolean valuesInOrderAndBounds​(int count,
                                               int sortedDim,
                                               byte[] minPackedValue,
                                               byte[] maxPackedValue,
                                               java.util.function.IntFunction<BytesRef> values,
                                               int[] docs,
                                               int docsOffset)
                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • valueInOrder

        private boolean valueInOrder​(long ord,
                                     int sortedDim,
                                     byte[] lastPackedValue,
                                     byte[] packedValue,
                                     int packedValueOffset,
                                     int doc,
                                     int lastDoc)