Using libzstd in a memory constrained environment

Zstandard, in typical configurations, assumes that using several MB for compression and decompression is acceptable. This page discusses how to tune Zstandard in memory constrained environments. Typically, this is in embedded code, mobile code, or cases where there can be many Zstandard compression & decompression contexts, like when using streaming compression on a server to communicate with many clients concurrently.

Querying memory usage

To determine Zstandard's total memory usage for any Zstandard context object, including ZSTD_CCtx, ZSTD_CStream, ZSTD_DCtx, and ZSTD_DStream, you can use ZSTD_sizeof_Object() (e.g. ZSTD_sizeof_CCtx(ZSTD_CCtx* cctx)). You must call this function after you use the context object, because it reports the current memory usage.

#include <stdio.h>
#include <zstd.h>

/* Performs the actual compression with cctx; defined elsewhere. */
void doCompress(ZSTD_CCtx* cctx);

void compressAndMeasureMemory(ZSTD_CCtx* cctx)
{
  doCompress(cctx);
  fprintf(stderr, "Memory usage of ZSTD_CCtx is %zu\n", ZSTD_sizeof_CCtx(cctx));
}

Decompression

Single-shot

Zstandard single-shot decompression, via ZSTD_decompress() or ZSTD_decompressDCtx(), uses a fixed amount of memory independent of the frame being decompressed or of any parameters. The memory usage can be queried via ZSTD_sizeof_DCtx().
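
As an illustration, a minimal single-shot decompression with a reusable ZSTD_DCtx might look like the following sketch; the helper name is ours, and it assumes dstCapacity is large enough for the whole frame (e.g. obtained via ZSTD_getFrameContentSize()).

#include <stdio.h>
#include <zstd.h>

/* Decompress one frame with a reusable context, then report how much memory
   that context holds. dstCapacity must be large enough for the whole frame. */
size_t decompressAndMeasure(ZSTD_DCtx* dctx,
                            void* dst, size_t dstCapacity,
                            const void* src, size_t srcSize)
{
  size_t const dSize = ZSTD_decompressDCtx(dctx, dst, dstCapacity, src, srcSize);
  if (ZSTD_isError(dSize)) {
    fprintf(stderr, "decompression failed: %s\n", ZSTD_getErrorName(dSize));
    return dSize;
  }
  fprintf(stderr, "Memory usage of ZSTD_DCtx is %zu\n", ZSTD_sizeof_DCtx(dctx));
  return dSize;
}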

Streaming

Zstandard streaming decompression needs to allocate a buffer to store the history window. This buffer is Window_Size + 2 * Block_Maximum_Size bytes. It also allocates a buffer of Block_Maximum_Size bytes to hold one compressed block.
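
To see these allocations in practice, here is a rough sketch (our own helper, not a library function) that feeds one chunk of compressed input to ZSTD_decompressStream() and then queries the context; the window buffer is sized once the frame header has been read, so the measurement reflects it.

#include <stdio.h>
#include <zstd.h>

/* Decompress as much of one compressed chunk as fits into dst, then report
   the DCtx's current allocation, which includes the window buffer. */
size_t decompressChunkAndMeasure(ZSTD_DCtx* dctx,
                                 void* dst, size_t dstCapacity,
                                 const void* src, size_t srcSize)
{
  ZSTD_inBuffer input = { src, srcSize, 0 };
  ZSTD_outBuffer output = { dst, dstCapacity, 0 };
  size_t const ret = ZSTD_decompressStream(dctx, &output, &input);
  if (ZSTD_isError(ret)) return ret;
  fprintf(stderr, "Memory usage of ZSTD_DCtx is %zu\n", ZSTD_sizeof_DCtx(dctx));
  return output.pos;  /* number of decompressed bytes produced */
}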

The maximum allowed Window_Size and Block_Maximum_Size can be controlled by ZSTD_d_windowLogMax and ZSTD_d_maxBlockSize respectively. Setting these parameters limits the frames that will be accepted by the decompressor, so the compressor must be configured to respect these limitations.

ZSTD_d_windowLogMax

This parameter controls the maximum window size that the decompressor will accept. It defaults to 27, which means the decoder may allocate up to 128 MB for the window. The format specification recommends supporting a windowLog of at least 23 (8 MB) for maximum compatibility.

The compressor won't generate frames with window sizes > 128 MB unless explicitly told to by setting ZSTD_c_windowLog. If you set ZSTD_d_windowLogMax, you must ensure that the compressor sets ZSTD_c_windowLog to a value no greater than the selected maximum, otherwise the decompressor may reject the compressed frame.
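
As a sketch of that coordination, assuming both sides have agreed on a 1 MB window (windowLog 20; the constant name is ours, and error handling is omitted):

#include <zstd.h>

#define AGREED_WINDOW_LOG 20  /* hypothetical agreed-upon limit: 1 MB window */

void configureWindowLimits(ZSTD_CCtx* cctx, ZSTD_DCtx* dctx)
{
  /* Compressor: never emit frames requiring a window larger than 1 MB. */
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, AGREED_WINDOW_LOG);
  /* Decompressor: refuse frames that would require a larger window. */
  ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, AGREED_WINDOW_LOG);
}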

The Block_Maximum_Size = min(128 KB, Window_Size), so by setting the Window_Size to less than 128 KB, you can also shrink the Block_Maximum_Size.

ZSTD_d_maxBlockSize

This parameter allows the decoder to limit the Block_Maximum_Size independently of the Window_Size. The decompressor allocates Window_Size + 3 * Block_Maximum_Size, so as the Window_Size shrinks, the Block_Maximum_Size makes up a larger share of the allocated memory.

For example, in a streaming compression use case where one client is receiving compressed data from many servers, one might choose a Window_Size of 128 KB to balance compression ratio and memory usage. However, the decoder will still need about 512 KB of memory (128 KB + 3 * 128 KB). If ZSTD_d_maxBlockSize is set to 4 KB, perhaps because packets are expected to be smaller than 4 KB, then the memory usage shrinks dramatically to 140 KB (128 KB + 3 * 4 KB).

Setting this parameter means that you are explicitly rejecting valid Zstandard frames, so you must coordinate with the compressor. The compressor must also set ZSTD_c_maxBlockSize to a value no greater than the value of ZSTD_d_maxBlockSize. Otherwise the compressor will almost certainly generate blocks that are larger than the maximum block size, and the decompressor will reject the frame.
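
A sketch of that coordination, assuming both sides have agreed on 4 KB blocks. Note that, at the time of writing, ZSTD_c_maxBlockSize and ZSTD_d_maxBlockSize live in the experimental section of zstd.h, so ZSTD_STATIC_LINKING_ONLY may be required; error handling is omitted.

#define ZSTD_STATIC_LINKING_ONLY  /* ZSTD_c_maxBlockSize / ZSTD_d_maxBlockSize are experimental parameters */
#include <zstd.h>

#define AGREED_MAX_BLOCK_SIZE (4 * 1024)  /* hypothetical agreed-upon limit: 4 KB blocks */

void configureBlockLimits(ZSTD_CCtx* cctx, ZSTD_DCtx* dctx)
{
  /* Compressor: never emit blocks larger than 4 KB. */
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_maxBlockSize, AGREED_MAX_BLOCK_SIZE);
  /* Decompressor: reject frames containing blocks larger than 4 KB. */
  ZSTD_DCtx_setParameter(dctx, ZSTD_d_maxBlockSize, AGREED_MAX_BLOCK_SIZE);
}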

Compression

Single-shot

All of Zstd's memory usage in single-shot mode is adjusted to the size of the data being compressed. The hashLog and chainLog are capped so their tables don't exceed the next power of 2 larger than the source size. If the source is smaller than 128 KB then some internal work buffers are also shrunk proportionally.

ZSTD_c_compressionLevel

First and foremost, higher compression levels will typically use more memory, as they map to advanced parameters that need larger tables. Generally, when tuning for memory usage, you should pick a target compression level and then tune advanced parameters to fine-tune the tradeoff if necessary. Picking a target compression level also gives you a benchmark for speed and compression ratio, so as you tune the advanced parameters, you can make sure you aren't hurting speed or compression ratio too much.

Generally, when optimizing for memory usage, you should stick to compression levels 3 and below, as they are designed to work well with small memory budgets: they aim to keep their working memory mostly within L2 cache.

ZSTD_c_hashLog

Zstd allocates 4 * (1 << hashLog) bytes for hash tables. Shrinking this value will save memory, and likely increase compression speed, at the cost of compression ratio.

ZSTD_c_chainLog

Zstd allocates 4 * (1 << chainLog) bytes for auxiliary tables. These tables are allocated and used for every strategy except ZSTD_fast, but the usage and meaning of the table changes between strategies. Shrinking this value will save memory, and likely increase compression speed, at the cost of compression ratio.
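
For example, a rough sketch of shrinking both tables after picking a target level (the values 14/14 are purely illustrative, not a recommendation; error handling is omitted):

#include <zstd.h>

void configureSmallTables(ZSTD_CCtx* cctx)
{
  /* Pick a target level first, then override its table sizes. */
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_hashLog, 14);   /* 4 * (1 << 14) = 64 KB hash table  */
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_chainLog, 14);  /* 4 * (1 << 14) = 64 KB chain table */
}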

ZSTD_c_strategy

This parameter normally does not affect memory usage, but it has a large impact on speed, and it changes what the match finder does with its memory. That said, the strategies ZSTD_greedy, ZSTD_lazy, and ZSTD_lazy2 use an additional 1 << hashLog bytes, and the strategies ZSTD_btopt and ZSTD_btultra use additional memory as well.

Generally, when optimizing for memory usage, you should use ZSTD_fast, or ZSTD_dfast. The "flagship" levels for these strategies are 1 and 3 respectively.

ZSTD_c_maxBlockSize

See the streaming section for more details, but Zstd allocates storage proportional to the maximum block size, which is min(source size, 128 KB) for single-shot compression. It allocates approximately 4 * Block_Maximum_Size. So shrinking this value can yield significant savings when ZSTD_c_hashLog and ZSTD_c_chainLog are small. Note that this can also hurt compression ratio and speed, as the fixed costs per block will become more significant.

Streaming

All of the memory optimizations that apply to single-shot compression also apply to streaming. Additionally, ZSTD_c_windowLog and ZSTD_c_maxBlockSize can decrease the amount of memory the streaming compressor needs for its history window and compressed-block buffer.

ZSTD_c_windowLog

Shrinking the window size will directly shrink Zstandard's streaming memory usage, but it will likely also hurt compression ratio, because the compressor can't look as far into the past for matches. Enforcing smaller window sizes means the decompressor will allocate less memory during decompression of the frame. The decoder can enforce strict limits with ZSTD_d_windowLogMax.

ZSTD_c_maxBlockSize

In addition to the benefits for single-shot compression, shrinking ZSTD_c_maxBlockSize will reduce the amount of buffer space the streaming compressor needs. The streaming compressor allocates 2 * Block_Maximum_Size bytes of buffer space. If you know that all compressions set ZSTD_c_maxBlockSize, you can also reduce decompression memory usage with ZSTD_d_maxBlockSize. But be careful, because this means the decompressor will reject compressed frames containing larger blocks, such as frames produced without a compatible ZSTD_c_maxBlockSize.

ZSTD_c_stableInBuffer

If the input buffer is guaranteed never to change and to remain available during the entire compression, this parameter can be set to reduce the allocated size by Window_Size + Block_Maximum_Size, and to avoid memory copying. Make sure to read the documentation carefully.

ZSTD_c_stableOutBuffer

If the output buffer is guaranteed to never change, then this parameter can be set to reduce the allocated buffer size by Block_Maximum_Size, because Zstandard can write directly into the provided output buffer and avoid copying. Make sure to read the documentation carefully.

Setting both ZSTD_c_stableInBuffer and ZSTD_c_stableOutBuffer makes the streaming API exactly equivalent to the single-shot function ZSTD_compress2().
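
As a sketch of that fully-stable case, assuming dst is sized with ZSTD_compressBound() and both buffers stay valid and unmoved for the whole operation. These parameters are currently in the experimental section of zstd.h, so ZSTD_STATIC_LINKING_ONLY may be required; error handling is trimmed.

#define ZSTD_STATIC_LINKING_ONLY  /* ZSTD_c_stableInBuffer / ZSTD_c_stableOutBuffer are experimental parameters */
#include <zstd.h>

/* With both stable-buffer hints set, a single ZSTD_e_end call over fixed
   src/dst buffers behaves like ZSTD_compress2(), without internal buffering.
   dstCapacity must be large enough for the whole frame, e.g. ZSTD_compressBound(srcSize). */
size_t compressWithStableBuffers(ZSTD_CCtx* cctx,
                                 void* dst, size_t dstCapacity,
                                 const void* src, size_t srcSize)
{
  ZSTD_inBuffer input = { src, srcSize, 0 };
  ZSTD_outBuffer output = { dst, dstCapacity, 0 };
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_stableInBuffer, 1);
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_stableOutBuffer, 1);
  size_t const remaining = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
  if (ZSTD_isError(remaining)) return remaining;
  return output.pos;  /* remaining == 0 means the frame is complete */
}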

Example Configurations

Streaming over the network in a many-to-many configuration

The scenario is a case where many clients are talking to many servers, and using Zstandard streaming compression to compress the network traffic from the server to the client. For example, imagine tailing replication logs from many database shards.

If each server is expected to be connected to C clients, then it needs to keep C ZSTD_CCtx context objects in memory. If each client is expected to be connected to S servers, then it needs to keep S ZSTD_DCtx context objects in memory.

Generally, we expect blocks to be small, around 1 KB, because every time the server wants to send a packet to the client it needs to use ZSTD_e_flush to flush a compressed block.
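
To make that concrete, here is a rough sketch of compressing and flushing one packet per call; the helper is our own, and it assumes dstCapacity is at least ZSTD_compressBound(srcSize) so the flush always completes.

#include <zstd.h>

/* Compress one packet and flush it so the client can decode it immediately.
   Returns the number of compressed bytes written to dst, or an error code. */
size_t compressPacket(ZSTD_CCtx* cctx,
                      void* dst, size_t dstCapacity,
                      const void* src, size_t srcSize)
{
  ZSTD_inBuffer input = { src, srcSize, 0 };
  ZSTD_outBuffer output = { dst, dstCapacity, 0 };
  size_t const remaining = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_flush);
  if (ZSTD_isError(remaining)) return remaining;
  /* remaining == 0 means the flush completed within dstCapacity. */
  return output.pos;
}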

Most importantly, we would set the ZSTD_c_windowLog to an appropriate value. This is a tradeoff between compression ratio and memory usage, so you have to measure what makes the most sense for your use case. Here, we found that a window size of 256 KB was a happy medium.

Next, to shrink compression memory usage, we can set ZSTD_c_maxBlockSize to a smaller value. It defaults to 128 KB, which means the block-sized overheads will be similar to the window size overheads, because we're using a small window. Since we expect most of our blocks to be small anyway, we can set this to a very small value without hurting compression ratio. In this case we will choose 2 KB.

Now that we've set ZSTD_c_maxBlockSize, we can also set ZSTD_d_maxBlockSize. We have to be careful here, because a consumer that sets ZSTD_d_maxBlockSize can only consume data from a producer whose ZSTD_c_maxBlockSize is no larger. In this use case, we send the log2 of the maxBlockSize over the network as a header, and then set ZSTD_d_maxBlockSize accordingly. This solves the coordination issue, and allows us to reduce memory usage. We send this header in a Zstandard skippable frame, so that consumers that aren't aware of it can skip it.

Finally, we will select our compression level. The memory usage of every level is already bounded by shrinking the ZSTD_c_windowLog. We will select compression level 3, because in this case using more memory than level 1 was a worthwhile tradeoff. We left the ZSTD_c_hashLog and ZSTD_c_chainLog as-is, because the tradeoff of level 3 made sense.

In this case we don't need to set ZSTD_d_windowLogMax, because the producer is trusted to set ZSTD_c_windowLog appropriately.

Parameter                  Value
ZSTD_c_windowLog           18 (256 KB window)
ZSTD_c_maxBlockSize        2 KB (1 << 11)
ZSTD_c_compressionLevel    3
ZSTD_d_maxBlockSize        2 KB (1 << 11)
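
A sketch of what this configuration could look like on each side; the helper names are ours, the maxBlockSize parameters may require ZSTD_STATIC_LINKING_ONLY, and error handling is omitted.

#define ZSTD_STATIC_LINKING_ONLY  /* for ZSTD_c_maxBlockSize / ZSTD_d_maxBlockSize */
#include <zstd.h>

/* Server side: one ZSTD_CCtx per connected client. */
void configureServerCCtx(ZSTD_CCtx* cctx)
{
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 18);          /* 256 KB window */
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_maxBlockSize, 1 << 11);  /* 2 KB blocks   */
}

/* Client side: one ZSTD_DCtx per connected server. maxBlockSizeLog is the
   value read from the header the server sent in a skippable frame. */
void configureClientDCtx(ZSTD_DCtx* dctx, unsigned maxBlockSizeLog)
{
  ZSTD_DCtx_setParameter(dctx, ZSTD_d_maxBlockSize, (int)(1u << maxBlockSizeLog));
}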

To measure the memory usage, you would start your streaming compression, and during or after the operation, query ZSTD_sizeof_CCtx() and ZSTD_sizeof_DCtx(). This tells you exactly how much memory Zstandard currently has allocated.