RFC Errata
RFC 8259, "The JavaScript Object Notation (JSON) Data Interchange Format", December 2017
Source of RFC: jsonbis (art)
Errata ID: 7603
Status: Reported
Type: Technical
Publication Format(s): TEXT
Reported By: Guillaume Fortin-Debigaré
Date Reported: 2023-08-13
Section 1 says:
A string is a sequence of zero or more Unicode characters [UNICODE].
It should say:
A string is a sequence of zero or more Unicode code points [UNICODE].
Notes:
Surrogate code points (U+D800 to U+DFFF) are not Unicode characters, as explained here: https://www.unicode.org/glossary/#surrogate_character
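A minimal sketch, assuming Python 3 and its standard unicodedata module, of how the surrogate range can be recognized. The helper name is_surrogate is hypothetical and used only for illustration. Surrogate code points carry the General_Category value Cs in the Unicode Character Database.

```python
import unicodedata


def is_surrogate(cp: int) -> bool:
    """Return True if cp is a surrogate code point (U+D800..U+DFFF).

    Hypothetical helper for illustration only.
    """
    return 0xD800 <= cp <= 0xDFFF


for cp in (0x0041, 0xD800, 0xDFFF, 0xE000):
    # unicodedata.category reports "Cs" for surrogate code points.
    print(f"U+{cp:04X}", is_surrogate(cp), unicodedata.category(chr(cp)))
```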
However, according to the ABNF grammar in Section 7 and the warning in Section 8.2, a surrogate code point outside of a surrogate pair (for example, \uDEAD) is allowed in JSON strings in both escaped and unescaped form, even though the unescaped form cannot be encoded as well-formed UTF-8.
In addition, the original text contradicts ECMA-404 Section 9, which states: "A string is a sequence of Unicode code points wrapped with quotation marks (U+0022). All code points may be placed within the quotation marks except for the code points that must be escaped: quotation mark (U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F."
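To make the lone-surrogate case concrete, the following sketch uses Python 3's standard json module (an assumption; other parsers may reject or replace lone surrogates). It decodes the escaped form "\uDEAD", then shows that writing the same code point unescaped produces text that cannot be encoded as UTF-8.

```python
import json

# Escaped form: "\uDEAD" is an unpaired (low) surrogate, which the
# Section 7 grammar accepts. Python's json module decodes it to a
# one-code-point str rather than rejecting it.
decoded = json.loads('"\\uDEAD"')
print(len(decoded), f"U+{ord(decoded):04X}")  # 1 U+DEAD

# Re-serializing with the default ensure_ascii=True keeps the escaped
# form, which is plain ASCII and therefore encodes cleanly as UTF-8.
escaped_text = json.dumps(decoded)
print(escaped_text)  # "\udead"

# Unescaped form: ensure_ascii=False emits the raw code point, and the
# resulting text cannot be encoded as well-formed UTF-8.
raw_text = json.dumps(decoded, ensure_ascii=False)
try:
    raw_text.encode("utf-8")
    utf8_ok = True
except UnicodeEncodeError:
    utf8_ok = False
print(utf8_ok)  # False
```

The string value here is a sequence of code points, not of characters, which is the distinction the corrected wording draws.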