RFC 8478, "Zstandard Compression and the application/zstd Media Type", October 2018


Errata ID: 5786
Status: Held for Document Update
Type: Technical

Reported By: Felix Handte
Date Reported: 2019-07-17
Held for Document Update by: Barry Leiba
Date Held: 2019-07-18

Section says:

   A Compressed_Block has the extra restriction that Block_Size is
   always strictly less than the decompressed size.  If this condition
   cannot be respected, the block must be sent uncompressed instead
   (i.e., treated as a Raw_Block).

It should say:

   If this condition cannot be respected when generating a
   Compressed_Block, the block must be sent uncompressed instead
   (i.e., treated as a Raw_Block).


The RFC as originally written places a limit on the size of compressed
blocks (that they must be strictly smaller than the decompressed
content they represent) above and beyond the restrictions placed on
the other block types.

This restriction does not belong in the spec, and it should be
removed. Here's why:

Under only cursory examination, a rule like this makes sense. A
compressed representation that is larger than the uncompressed content
it represents seems useless, since Zstandard supports raw blocks.
However, even if this were true (it is not; see below), that reasoning
only motivates implementing such a fallback in the compressor; it
doesn't explain why compressors should be required to implement such
behavior.

Moreover, this restriction is not actually useful for decoders, and its
removal will not negatively affect decompressors or their
interoperability. All conforming decompressor implementations must
already be prepared to accept blocks, including compressed blocks, up
to the Block_Maximum_Decompressed_Size, so loosening this restriction
will not require them to allocate any more memory than required at
present. And in fact, to the best of my knowledge, no decompressor
implementation currently enforces the restriction in question or has
ever done so in the past.
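
As an illustration of the point above, here is a minimal sketch of the
kind of check a decoder performs on a block header. It assumes the
3-byte little-endian Block_Header layout from the RFC (Last_Block in
bit 0, Block_Type in bits 1-2, Block_Size in bits 3-23) and takes
Block_Maximum_Decompressed_Size as the smaller of Window_Size and
128 KB. The function and constant names are hypothetical and are not
taken from any particular implementation.

```python
# Hypothetical names; a sketch, not the reference decoder.
BLOCK_TYPES = ("Raw_Block", "RLE_Block", "Compressed_Block", "Reserved")
MAX_BLOCK = 128 * 1024  # 128 KB upper bound on a block


def parse_block_header(header: bytes, window_size: int):
    """Parse a 3-byte Block_Header and bound Block_Size.

    Returns (last_block, block_type, block_size).
    """
    if len(header) != 3:
        raise ValueError("Block_Header must be exactly 3 bytes")
    value = int.from_bytes(header, "little")
    last_block = bool(value & 1)
    block_type = BLOCK_TYPES[(value >> 1) & 3]
    block_size = value >> 3
    if block_type == "Reserved":
        raise ValueError("reserved block type")

    # The only size check: the same upper bound for every block type.
    # Note what is absent: no comparison of a Compressed_Block's
    # Block_Size against the size it will decompress to.
    limit = min(window_size, MAX_BLOCK)
    if block_size > limit:
        raise ValueError("Block_Size exceeds the block maximum")
    return last_block, block_type, block_size
```

Because the buffer bound is the same for every block type, a
Compressed_Block whose Block_Size is not smaller than its decompressed
size passes through such a decoder without any additional allocation.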

Finally, this restriction does in fact over-constrain compressors.
Compressed blocks that are larger than the content they represent can
nonetheless have value, when they contain entropy tables (e.g., a
Huffman_Tree_Description), the cost of which is amortized over
subsequent blocks that reuse the same table description.
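
The amortization argument can be made concrete with a small worked
example. The sketch below uses a deliberately simplified, hypothetical
cost model: every block holds `raw` bytes of content, a table
description costs `table` bytes, the coded content costs `coded` bytes
per block once a table is available, and block headers are ignored
since they cost the same either way. It also assumes that a table can
only be reused after it has been sent in a Compressed_Block. None of
the numbers are measurements.

```python
# Hypothetical cost model; sizes are in bytes and purely illustrative.

def table_then_reuse(table: int, coded: int, n: int) -> int:
    """Send the table once in block 1, then reuse it in all n blocks."""
    return table + coded * n


def size_with_restriction(raw: int, table: int, coded: int, n: int) -> int:
    """Best total size when a Compressed_Block must be smaller than raw."""
    if table + coded < raw:
        # The first block is allowed to carry the table.
        return min(raw * n, table_then_reuse(table, coded, n))
    # The first block must fall back to Raw_Block, so no table is ever
    # sent, and every later block faces the same choice.
    return raw * n


def size_without_restriction(raw: int, table: int, coded: int, n: int) -> int:
    """Best total size when an oversized first block is permitted."""
    return min(raw * n, table_then_reuse(table, coded, n))


# Four 100-byte blocks, a 60-byte table, 55 coded bytes per block.
print(size_with_restriction(100, 60, 55, 4))     # 400
print(size_without_restriction(100, 60, 55, 4))  # 280
```

In this example the first block would occupy 115 bytes for 100 bytes of
content, which the original wording forbids, yet permitting it lowers
the total for the four blocks from 400 to 280 bytes.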

In short, this change is a safe, strict improvement over the existing
language: it better reflects the reality of implementations and removes
a restriction that should never have been in the spec in the first
place.

We've already made this change to the Zstandard format document
maintained in the reference implementation repo[0].

[0] https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#blocks

===== Verifier Notes =====
All this is fine, but the document says exactly what it was meant to say when it was written; this is not an erratum. This is now on record for discussion if the document is updated.

