RFC Errata



Found 2 records.

Status: Held for Document Update (2)

RFC 8478, "Zstandard Compression and the application/zstd Media Type", October 2018

Note: This RFC has been obsoleted by RFC 8878

Source of RFC: IETF - NON WORKING GROUP

Errata ID: 5786
Status: Held for Document Update
Type: Technical
Publication Format(s): TEXT

Reported By: Felix Handte
Date Reported: 2019-07-17
Held for Document Update by: Barry Leiba
Date Held: 2019-07-18

Section 3.1.1.2.3 says:

   A Compressed_Block has the extra restriction that Block_Size is
   always strictly less than the decompressed size.  If this condition
   cannot be respected, the block must be sent uncompressed instead
   (i.e., treated as a Raw_Block).

It should say:

   If this condition cannot be respected when generating a
   Compressed_Block, the block must be sent uncompressed instead
   (i.e., treated as a Raw_Block).
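The fallback behavior described above can be sketched as follows (a minimal illustration; `encode_block` and `compress` are hypothetical names, not from the RFC):

```python
def encode_block(data: bytes, compress) -> tuple[str, bytes]:
    # `compress` stands in for any Zstandard block compressor.
    compressed = compress(data)
    # A Compressed_Block's Block_Size must be strictly less than the
    # decompressed size; otherwise emit the block as a Raw_Block.
    if len(compressed) < len(data):
        return ("Compressed_Block", compressed)
    return ("Raw_Block", data)
```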

Notes:

The RFC as originally written places a limit on the size of compressed
blocks (that they must be strictly smaller than the uncompressed
content they represent) above and beyond the restrictions placed on
the other block types.

This restriction does not belong in the spec, and it should be
removed. Here's why:

On cursory examination, a rule like this makes sense: a compressed
representation that is larger than the uncompressed content it
represents seems useless, since Zstandard supports raw blocks. But
even if that were true (see below; it is not), the reasoning motivates
implementing such a fallback in a compressor; it doesn't explain why
compressors should be required to do so.

More to the point, this restriction is not actually useful for
decoders, and its removal will not negatively affect decompressors or
their interoperability. All conforming decompressor implementations
must already be prepared to accept blocks, including compressed
blocks, up to the Block_Maximum_Decompressed_Size, so loosening this
restriction will not require them to allocate any more memory than
they do at present. In fact, to the best of my knowledge, no
decompressor implementation enforces the restriction in question, nor
has any ever done so.

Finally, this restriction does in fact over-constrain compressors.
Compressed blocks that are larger than the content they represent can
nonetheless have value, when they contain entropy tables (e.g., a
Huffman_Tree_Description), the cost of which is amortized over
subsequent blocks that reuse the same table description.
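A toy calculation (all sizes invented purely for illustration) shows how a table description's cost can amortize across blocks:

```python
# Hypothetical sizes, in bytes, chosen only to illustrate the point.
raw_size = 1000          # uncompressed content per block
table_cost = 120         # entropy table, e.g. a Huffman_Tree_Description
payload_cost = 950       # entropy-coded payload per block

# The first block carries the table and exceeds its raw size...
first_block = table_cost + payload_cost            # 1070 > 1000

# ...but subsequent blocks reuse the table and pay only the payload,
# so three such blocks still beat three Raw_Blocks overall.
three_compressed = first_block + 2 * payload_cost  # 2970
three_raw = 3 * raw_size                           # 3000
```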

In short, this change is a safe, strict improvement over the existing
language: it better reflects the reality of implementations and
removes a restriction that should never have been in the spec in the
first place.

We've already made this change to the Zstandard format document
maintained in the reference implementation repo[0].

[0] https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#blocks

===== Verifier Notes =====
All this is fine, but the document says exactly what it was meant to say when it was written; this is not an erratum. This is now on record for discussion if the document is updated.

Errata ID: 6303
Status: Held for Document Update
Type: Technical
Publication Format(s): TEXT

Reported By: Sean Bartell
Date Reported: 2020-10-07
Held for Document Update by: Barry Leiba
Date Held: 2020-10-08

Section 3.1.1.5 says:

The newest offset takes the lead in offset history, shifting others
back (up to its previous place if it was already present).  This
means that when Repeated_Offset1 (most recent) is used, history is
unmodified.  When Repeated_Offset2 is used, it is swapped with
Repeated_Offset1.  If any other offset is used, it becomes
Repeated_Offset1, and the rest are shifted back by 1.

It should say:

The newest offset takes the lead in offset history, shifting others
back (up to its previous place if the new offset is a repeat offset).
This means that when the new offset is a repeat offset referring to
Repeated_Offset1 (most recent), history is unmodified.
When the new offset is a repeat offset referring to Repeated_Offset2,
it is swapped with Repeated_Offset1.  In any other situation, the new
offset becomes Repeated_Offset1 and the rest are shifted back by 1.

Note that if a non-repeat offset happens to match one of the
Repeated_Offset values, it is treated just like any other non-repeat
offset; all the Repeated_Offset values are shifted back by 1.

The following code demonstrates how an offset_value is decoded into
a NewOffset and the Repeated_Offset values are updated.

if offset_value <= 3:
    if literal_length == 0:
        offset_value = offset_value + 1
    if offset_value == 1:
        NewOffset = Repeated_Offset1
    elif offset_value == 2:
        NewOffset = Repeated_Offset2
        Repeated_Offset2 = Repeated_Offset1
        Repeated_Offset1 = NewOffset
    elif offset_value == 3:
        NewOffset = Repeated_Offset3
        Repeated_Offset3 = Repeated_Offset2
        Repeated_Offset2 = Repeated_Offset1
        Repeated_Offset1 = NewOffset
    elif offset_value == 4:
        NewOffset = Repeated_Offset1 - 1
        if NewOffset == 0:
            # corrupted input
            NewOffset = 1
        Repeated_Offset3 = Repeated_Offset2
        Repeated_Offset2 = Repeated_Offset1
        Repeated_Offset1 = NewOffset
elif offset_value > 3:
    NewOffset = offset_value - 3
    Repeated_Offset3 = Repeated_Offset2
    Repeated_Offset2 = Repeated_Offset1
    Repeated_Offset1 = NewOffset

Notes:

Change the explanation of how Repeated_Offset values are updated in order to match the reference implementation. See https://github.com/facebook/zstd/issues/2346
