Class CompressingStoredFieldsFormat


  • public class CompressingStoredFieldsFormat
    extends StoredFieldsFormat
    A StoredFieldsFormat that compresses documents in chunks in order to improve the compression ratio.

    For a chunk size of chunkSize bytes, this StoredFieldsFormat does not support documents larger than (2^31 - chunkSize) bytes.
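
    As a rough illustration of this limit, the sketch below computes 2^31 minus chunkSize in long arithmetic. The class MaxDocumentSize, its method, and the 16 KiB chunk size are hypothetical names and values chosen for the example; none of them are part of this class's API.

```java
// Illustrative sketch only; not part of the Lucene API.
final class MaxDocumentSize {

    /**
     * Returns the largest document size, in bytes, that fits the documented
     * limit of (2^31 - chunkSize) bytes.
     */
    static long maxDocumentBytes(int chunkSize) {
        // Assumption for this sketch: a chunk size below 1 is rejected.
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        // 2^31 overflows an int, so the subtraction is done on long values.
        return (1L << 31) - chunkSize;
    }

    public static void main(String[] args) {
        int chunkSize = 1 << 14; // hypothetical 16 KiB chunk size
        System.out.println(maxDocumentBytes(chunkSize));
    }
}
```

    Larger chunk sizes therefore lower the maximum document size slightly, although the effect is negligible for chunk sizes in the kilobyte range.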

    For optimal performance, you should use a MergePolicy that returns the largest segments (by byte size) first.

    • Field Detail

      • formatName

        private final java.lang.String formatName
      • segmentSuffix

        private final java.lang.String segmentSuffix
      • chunkSize

        private final int chunkSize
      • maxDocsPerChunk

        private final int maxDocsPerChunk
      • blockSize

        private final int blockSize
    • Constructor Detail

      • CompressingStoredFieldsFormat

        public CompressingStoredFieldsFormat(java.lang.String formatName,
                                             java.lang.String segmentSuffix,
                                             CompressionMode compressionMode,
                                             int chunkSize,
                                             int maxDocsPerChunk,
                                             int blockSize)
        Create a new CompressingStoredFieldsFormat.

        formatName is the name of the format. This name will be used in the file formats to perform codec header checks.

        segmentSuffix is the segment suffix. This suffix is added to the resulting file name only if it is not the empty string.

        The compressionMode parameter allows you to choose between compression algorithms that have various compression and decompression speeds so that you can pick the one that best fits your indexing and searching throughput. You should never instantiate two CompressingStoredFieldsFormats that have the same name but different CompressionModes.

        chunkSize is the minimum byte size of a chunk of documents. A value of 1 can make sense if there is redundancy across fields. maxDocsPerChunk is an upper bound on how many documents may be stored in a single chunk. This bounds the CPU cost for highly compressible data.

        Higher values of chunkSize should improve the compression ratio but will require more memory at indexing time and might make document loading a little slower (depending on the size of your OS cache compared to the size of your index).
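
        The interaction between chunkSize and maxDocsPerChunk can be sketched with a simplified model: documents are buffered until the buffered bytes reach chunkSize or the buffered document count reaches maxDocsPerChunk, and the chunk is then closed. This is an assumption made for illustration, not a description of the actual writer. The class ChunkingSketch and the document sizes below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of chunk boundaries; an assumption, not Lucene's writer.
final class ChunkingSketch {

    /**
     * Groups documents (given by their byte sizes) into chunks and returns
     * the number of documents in each chunk, in order.
     */
    static List<Integer> docsPerChunk(int[] docSizes, int chunkSize, int maxDocsPerChunk) {
        List<Integer> chunks = new ArrayList<>();
        long bufferedBytes = 0;
        int bufferedDocs = 0;
        for (int size : docSizes) {
            bufferedBytes += size;
            bufferedDocs++;
            // Close the chunk once it holds at least chunkSize bytes,
            // or once it holds maxDocsPerChunk documents.
            if (bufferedBytes >= chunkSize || bufferedDocs >= maxDocsPerChunk) {
                chunks.add(bufferedDocs);
                bufferedBytes = 0;
                bufferedDocs = 0;
            }
        }
        // Any remaining documents form a final, possibly undersized chunk.
        if (bufferedDocs > 0) {
            chunks.add(bufferedDocs);
        }
        return chunks;
    }

    public static void main(String[] args) {
        int[] docSizes = {100, 100, 100, 100, 100}; // hypothetical sizes in bytes
        System.out.println(docsPerChunk(docSizes, 250, 128));
        System.out.println(docsPerChunk(docSizes, 1, 128));
    }
}
```

        In this model a chunkSize of 1 places every document in its own chunk, so compression can only exploit redundancy within a single document. For very small documents, maxDocsPerChunk rather than chunkSize determines where chunks end.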

        Parameters:
        formatName - the name of the StoredFieldsFormat
        segmentSuffix - the segment suffix, added to the file name unless it is the empty string
        compressionMode - the CompressionMode to use
        chunkSize - the minimum number of bytes of a single chunk of stored documents
        maxDocsPerChunk - the maximum number of documents in a single chunk
        blockSize - the number of chunks to store in an index block
        See Also:
        CompressionMode