Class CompressingStoredFieldsIndexWriter

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class CompressingStoredFieldsIndexWriter
    extends java.lang.Object
    implements java.io.Closeable
    Efficient index format for block-based Codecs.

    This writer generates a file which can be loaded into memory using memory-efficient data structures to quickly locate the block that contains any document.

    To keep the in-memory representation compact, for every block of 1024 chunks this index computes the average number of bytes per chunk and, for every chunk, stores only the difference between

    • ${chunk number} * ${average length of a chunk}
    • and the actual start offset of the chunk
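
    The delta-from-average idea above can be sketched as follows. This is an illustration only, not the writer's actual code: the class name DeltaFromAverage, its method names, and the choice to compute the average as the offset span divided by the number of gaps are assumptions made for this example. The sign convention (actual minus expected) follows the reconstruction formulas given in the Notes.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Lucene.
final class DeltaFromAverage {

    // Assumed definition of the average: total span of the start offsets
    // divided by the number of gaps between consecutive chunks.
    static long averageChunkLength(long[] startOffsets) {
        if (startOffsets.length < 2) {
            return 0L; // a single chunk has no gap to average over
        }
        long span = startOffsets[startOffsets.length - 1] - startOffsets[0];
        return span / (startOffsets.length - 1);
    }

    // For chunk n, store (actual offset) - (first offset + n * average).
    // The deltas stay small when chunk sizes are close to the average.
    static long[] deltas(long[] startOffsets) {
        long avg = averageChunkLength(startOffsets);
        long[] result = new long[startOffsets.length];
        for (int n = 0; n < startOffsets.length; n++) {
            long expected = startOffsets[0] + avg * n;
            result[n] = startOffsets[n] - expected;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 100L, 210L, 290L, 400L};
        System.out.println("average = " + averageChunkLength(offsets));
        System.out.println("deltas  = " + Arrays.toString(deltas(offsets)));
    }
}
```

    Because the deltas are typically much smaller than the absolute offsets, they fit in far fewer bits per entry than the offsets themselves.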

    Data is written as follows:

    • PackedIntsVersion, <Block>^BlockCount, BlocksEndMarker (Block repeated BlockCount times)
    • PackedIntsVersion --> PackedInts.VERSION_CURRENT as a VInt
    • BlocksEndMarker --> 0 as a VInt; this marks the end of the blocks, since a block is not allowed to start with 0
    • Block --> BlockChunks, <DocBases>, <StartPointers>
    • BlockChunks --> a VInt which is the number of chunks encoded in the block
    • DocBases --> DocBase, AvgChunkDocs, BitsPerDocBaseDelta, DocBaseDeltas
    • DocBase --> first document ID of the block of chunks, as a VInt
    • AvgChunkDocs --> average number of documents in a single chunk, as a VInt
    • BitsPerDocBaseDelta --> number of bits required to represent a delta from the average using ZigZag encoding
    • DocBaseDeltas --> packed array of BlockChunks elements of BitsPerDocBaseDelta bits each, representing the deltas from the average doc base using ZigZag encoding
    • StartPointers --> StartPointerBase, AvgChunkSize, BitsPerStartPointerDelta, StartPointerDeltas
    • StartPointerBase --> the first start pointer of the block, as a VLong
    • AvgChunkSize --> the average size of a chunk of compressed documents, as a VLong
    • BitsPerStartPointerDelta --> number of bits required to represent a delta from the average using ZigZag encoding
    • StartPointerDeltas --> packed array of BlockChunks elements of BitsPerStartPointerDelta bits each, representing the deltas from the average start pointer using ZigZag encoding
    • Footer --> CodecFooter
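
    The format above relies on ZigZag encoding so that small negative deltas map to small unsigned values, which in turn determines how many bits each packed entry needs. A minimal standalone sketch follows. The class ZigZagDeltas and its methods are hypothetical names for this example and do not call any Lucene API; the rule of using at least one bit per value is likewise an assumption.

```java
// Hypothetical helper for illustration; not part of Lucene.
final class ZigZagDeltas {

    // Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    static long zigZagEncode(long value) {
        return (value << 1) ^ (value >> 63);
    }

    // Inverse of zigZagEncode.
    static long zigZagDecode(long encoded) {
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    // Bits needed to store every ZigZag-encoded delta. OR-ing the encoded
    // values yields a number with the same highest set bit as the largest
    // one. Assumption: at least one bit is always used.
    static int bitsPerDelta(long[] deltas) {
        long combined = 0L;
        for (long delta : deltas) {
            combined |= zigZagEncode(delta);
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(combined));
    }

    public static void main(String[] args) {
        long[] deltas = {0L, 0L, 10L, -10L, 0L};
        System.out.println("bits per delta = " + bitsPerDelta(deltas));
    }
}
```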

    Notes

    • For any block, the doc base of the n-th chunk can be restored with DocBase + AvgChunkDocs * n + DocBaseDeltas[n].
    • For any block, the start pointer of the n-th chunk can be restored with StartPointerBase + AvgChunkSize * n + StartPointerDeltas[n].
    • Once data is loaded into memory, you can look up the start pointer of any document chunk by performing two binary searches: a first one based on the values of DocBase in order to find the right block, and then a second one inside the block based on DocBaseDeltas (by reconstructing the doc bases for every chunk).
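
    The reconstruction formulas and the two binary searches described in these notes can be sketched as below. This is a simplified in-memory model, not the reader that ships with the codec: the classes ChunkIndexLookup and Block, their fields, and the method names are assumptions for this example, and the deltas are assumed to be already ZigZag-decoded. The sketch also assumes a non-empty block array and a document ID no smaller than the first block's DocBase.

```java
// Hypothetical in-memory model for illustration; not part of Lucene.
final class ChunkIndexLookup {

    static final class Block {
        final int docBase;
        final int avgChunkDocs;
        final int[] docBaseDeltas;       // already ZigZag-decoded
        final long startPointerBase;
        final long avgChunkSize;
        final long[] startPointerDeltas; // already ZigZag-decoded

        Block(int docBase, int avgChunkDocs, int[] docBaseDeltas,
              long startPointerBase, long avgChunkSize, long[] startPointerDeltas) {
            this.docBase = docBase;
            this.avgChunkDocs = avgChunkDocs;
            this.docBaseDeltas = docBaseDeltas;
            this.startPointerBase = startPointerBase;
            this.avgChunkSize = avgChunkSize;
            this.startPointerDeltas = startPointerDeltas;
        }

        // DocBase + AvgChunkDocs * n + DocBaseDeltas[n]
        int chunkDocBase(int n) {
            return docBase + avgChunkDocs * n + docBaseDeltas[n];
        }

        // StartPointerBase + AvgChunkSize * n + StartPointerDeltas[n]
        long chunkStartPointer(int n) {
            return startPointerBase + avgChunkSize * n + startPointerDeltas[n];
        }
    }

    static long startPointerFor(Block[] blocks, int docId) {
        // First search: last block whose DocBase is <= docId.
        int lo = 0;
        int hi = blocks.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (blocks[mid].docBase <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        Block block = blocks[lo];

        // Second search: last chunk in that block whose doc base is <= docId.
        int chunkLo = 0;
        int chunkHi = block.docBaseDeltas.length - 1;
        while (chunkLo < chunkHi) {
            int mid = (chunkLo + chunkHi + 1) >>> 1;
            if (block.chunkDocBase(mid) <= docId) {
                chunkLo = mid;
            } else {
                chunkHi = mid - 1;
            }
        }
        return block.chunkStartPointer(chunkLo);
    }
}
```

    Both searches are logarithmic, so a lookup costs O(log BlockCount + log BlockChunks) reconstructions rather than a scan over all chunks.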
    • Field Detail

      • blockSize

        final int blockSize
      • totalDocs

        int totalDocs
      • blockDocs

        int blockDocs
      • blockChunks

        int blockChunks
      • firstStartPointer

        long firstStartPointer
      • maxStartPointer

        long maxStartPointer
      • docBaseDeltas

        final int[] docBaseDeltas
      • startPointerDeltas

        final long[] startPointerDeltas
    • Constructor Detail

      • CompressingStoredFieldsIndexWriter

        CompressingStoredFieldsIndexWriter(IndexOutput indexOutput,
                                           int blockSize)
                                    throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • reset

        private void reset()
      • writeBlock

        private void writeBlock()
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • writeIndex

        void writeIndex(int numDocs,
                        long startPointer)
                 throws java.io.IOException
        Throws:
        java.io.IOException
      • finish

        void finish(int numDocs,
                    long maxPointer)
             throws java.io.IOException
        Throws:
        java.io.IOException
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException