This sub-package provides the capability to compress and decompress data using the block specification.
Because the LZ4 block format doesn’t define a container format, the Python bindings will by default insert the original data size as an integer at the start of the compressed payload. However, it is possible to disable this functionality, and you may wish to do so for compatibility with other language bindings, such as the Java bindings.
To use the lz4 block format bindings is straightforward:
>>> import lz4.block
>>> import os
>>> input_data = 20 * 128 * os.urandom(1024) # Read 20 * 128kb
>>> compressed_data = lz4.block.compress(input_data)
>>> output_data = lz4.block.decompress(compressed_data)
>>> input_data == output_data
True
In this simple example, the size of the uncompressed data is stored in the compressed data, and this size is then utilized when uncompressing the data in order to correctly size the buffer. Instead, you may want to not store the size of the uncompressed data to ensure compatibility with the Java bindings. The example below demonstrates how to use the block format without storing the size of the uncompressed data.
>>> import lz4.block
>>> data = b'0' * 255
>>> compressed = lz4.block.compress(data, store_size=False)
>>> decompressed = lz4.block.decompress(compressed, uncompressed_size=255)
>>> decompressed == data
True
The uncompressed_size
argument specifies an upper bound on the size
of the uncompressed data size rather than an absolute value, such that
the following example also works.
>>> import lz4.block
>>> data = b'0' * 255
>>> compressed = lz4.block.compress(data, store_size=False)
>>> decompressed = lz4.block.decompress(compressed, uncompressed_size=2048)
>>> decompressed == data
True
A common situation is not knowing the size of the uncompressed data at decompression time. The following example illustrates a strategy that can be used in this case.
>>> import lz4.block
>>> data = b'0' * 2048
>>> compressed = lz4.block.compress(data, store_size=False)
>>> usize = 255
>>> max_size = 4096
>>> while True:
... try:
... decompressed = lz4.block.decompress(compressed, uncompressed_size=usize)
... break
... except lz4.block.LZ4BlockError:
... usize *= 2
... if usize > max_size:
... print('Error: data too large or corrupt')
... break
>>> decompressed == data
True
In this example we are catching the lz4.block.LZ4BlockError
exception. This exception is raisedd if the LZ4 library call fails,
which can be caused by either the buffer used to store the
uncompressed data (as set by usize
) being too small, or the input
compressed data being invalid - it is not possible to distinguish the
two cases, and this is why we set an absolute upper bound (max_size
)
on the memory that can be allocated for the uncompressed data. If we
did not take this precaution, the code, if ppassed invalid compressed
data would continuously try to allocate a larger and larger buffer for
decompression until the system ran out of memory.
lz4.block.
compress
(source, mode='default', acceleration=1, compression=0, return_bytearray=False)¶Compress source, returning the compressed data as a string. Raises an exception if any error occurs.
Parameters: | source (str, bytes or buffer-compatible object) – Data to compress |
---|---|
Keyword Arguments: | |
|
|
Returns: | Compressed data. |
Return type: | bytes or bytearray |
lz4.block.
decompress
(source, uncompressed_size=-1, return_bytearray=False)¶Decompress source, returning the uncompressed data as a string. Raises an exception if any error occurs.
Parameters: | source (str, bytes or buffer-compatible object) – Data to decompress. |
---|---|
Keyword Arguments: | |
|
|
Returns: | Decompressed data. |
Return type: | bytes or bytearray |
Raises: |
|