RFC 8785: JSON Canonicalization Scheme (JCS)
- A. Rundgren,
- B. Jordan,
- S. Erdtman
Abstract
Cryptographic operations like hashing and signing need the data to be
expressed in an invariant format so that the operations are reliably
repeatable.
One way to address this is to create a canonical representation of
the data. Canonicalizatio
This document describes the JSON Canonicalizatio
Status of This Memo
This document is not an Internet Standards Track specification; it is published for informational purposes.
This is a contribution to the RFC Series, independently of any other RFC stream. The RFC Editor has chosen to publish this document at its discretion and makes no statement about its value for implementation or deployment. Documents approved for publication by the RFC Editor are not candidates for any level of Internet Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
https://
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://
1. Introduction
This document describes the JSON Canonicalizatio
Cryptographic operations like hashing and signing need the data to be expressed in an invariant format so that the operations are reliably repeatable. One way to accomplish this is to convert the data into a format that has a simple and fixed representation, like base64url [RFC4648]. This is how JSON Web Signature (JWS) [RFC7515] addressed this issue. Another solution is to create a canonical version of the data, similar to what was done for the XML signature [XMLDSIG] standard.
The primary advantage of a canonicalization scheme is that data can be kept in its original form. This is the core rationale behind JCS.
Put another way, using canonicalizatio
To avoid "reinventing the wheel", JCS relies on the serialization of JSON primitives (strings, numbers, and literals), as defined by ECMAScript (aka JavaScript) [ECMA-262] beginning with version 6.
Seasoned XML developers may recall difficulties getting XML signatures
to validate. This was usually due to different interpretations of the
quite intricate
XML canonicalizatio
JCS is compatible with some existing systems relying on JSON
canonicalizatio
For potential uses outside of cryptography, see [JSONCOMP].
The intended audiences of this document are JSON tool vendors as well as designers of JSON-based cryptographic solutions. The reader is assumed to be knowledgeable in ECMAScript, including the "JSON" object.
2. Terminology
Note that this document is not on the IETF standards track. However, a
conformant
implementation is supposed to adhere to the specified behavior for
security and interoperabilit
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
3. Detailed Operation
This section describes the details related to creating a canonical JSON representation and how they are addressed by JCS.
Appendix F describes the RECOMMENDED way of adding JCS support to existing JSON tools.
3.1. Creation of Input Data
Data to be canonically serialized is usually created by:
Irrespective of the method used, the data to be serialized MUST be adapted for I-JSON [RFC7493] formatting, which implies the following:
An additional constraint is that parsed JSON string data MUST NOT be altered during subsequent serializations. For more information, see Appendix E.
Note: Although the Unicode standard offers the possibility of rearranging certain character sequences, referred to as "Unicode Normalization" [UCNORM], JCS-compliant string processing does not take this into consideration. That is, all components involved in a scheme depending on JCS MUST preserve Unicode string data "as is".
3.2. Generation of Canonical JSON Data
The following subsections describe the steps required to create a canonical JSON representation of the data elaborated on in the previous section.
Appendix A shows sample code
for an ECMAScript
3.2.1. Whitespace
Whitespace between JSON tokens MUST NOT be emitted.
3.2.2. Serialization of Primitive Data Types
Assume the following JSON object is parsed:
If the parsed data is subsequently serialized using a serializer
compliant with ECMAScript's "JSON
The difference between the parsed data and its serialized counterpart is due to a wide tolerance on input data (as defined by JSON [RFC8259]), while output data (as defined by ECMAScript) has a fixed representation. As can be seen in the example, numbers are subject to rounding as well.
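The effect can be reproduced with a short round trip through the ECMAScript "JSON" object. The input string below is a hypothetical sample chosen for this illustration (it is not the sample object of this section), and the variable names are assumptions:

```javascript
// Hypothetical input using redundant whitespace, string escapes, and
// several equivalent number notations that JSON [RFC8259] tolerates.
const input = '{ "numbers": [1E30, 4.50, 2e-3], "string": "\\u0041\\/\\u000a" }';

// Parsing accepts the loose input; serializing emits the fixed
// ECMAScript representation without whitespace.
const output = JSON.stringify(JSON.parse(input));

console.log(output);
// prints {"numbers":[1e+30,4.5,0.002],"string":"A/\n"}
```

Note that the property order is untouched at this stage; sorting is a separate step.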
The following subsections describe the serialization of primitive JSON data types according to JCS. This part is identical to that of ECMAScript. In the (unlikely) event that a future version of ECMAScript would invalidate any of the following serialization methods, it will be up to the developer community to either stick to this specification or create a new specification.
3.2.2.1. Serialization of Literals
In accordance with JSON [RFC8259], the literals "null", "true", and "false" MUST be serialized as null, true, and false, respectively.
3.2.2.2. Serialization of Strings
For JSON string data (which includes JSON object property names as well), each Unicode code point MUST be serialized as described below (see Section 24.3.2.2 of [ECMA-262]):
Finally, the resulting sequence of Unicode code points MUST be enclosed in double quotes (").
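Since this part is identical to ECMAScript string serialization, the behavior can be observed directly through "JSON.stringify". The following is a minimal sketch; the helper name serializeString and the sample characters are assumptions made for this illustration:

```javascript
// Hypothetical helper: delegates string serialization to the
// ECMAScript "JSON" object, which also adds the enclosing quotes.
function serializeString(value) {
  return JSON.stringify(value);
}

// Control characters without a short form use lowercase \uhhhh.
console.log(serializeString("\u000f")); // prints "\u000f"
// Newline has a dedicated two-character escape.
console.log(serializeString("\n"));     // prints "\n"
// Double quote and backslash are escaped with a backslash.
console.log(serializeString('"\\'));    // prints "\"\\"
// Other characters, including "/" and non-ASCII, are emitted as is.
console.log(serializeString("\u20ac/")); // prints "€/"
```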
Note: Since invalid Unicode data like "lone surrogates" (e.g.,
U+DEAD)
may lead to interoperabilit
3.2.2.3. Serialization of Numbers
ECMAScript builds on the IEEE 754 [IEEE754] double
Due to the relative complexity of this part, the algorithm itself is not included in this document. For implementers of JCS-compliant number serialization, Google's implementation in V8 [V8] may serve as a reference. Another compatible number serialization reference implementation is Ryu [RYU], which is used by the JCS open-source Java implementation mentioned in Appendix G. Appendix B holds a set of IEEE 754 sample values and their corresponding JSON serialization.
Note: Since Not a Number (NaN) and Infinity are not permitted in JSON, occurrences of NaN or Infinity MUST cause a compliant JCS implementation to terminate with an appropriate error.
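In an ECMAScript environment, number serialization is already available, but "JSON.stringify" silently turns NaN and Infinity into null, so an explicit check is needed. The sketch below assumes a hypothetical helper named serializeNumber; the error type is likewise an assumption:

```javascript
// Hypothetical helper: rejects values that JSON cannot express and
// otherwise relies on the ECMAScript number-to-string conversion.
function serializeNumber(value) {
  if (typeof value !== "number" || !Number.isFinite(value)) {
    throw new RangeError("NaN and Infinity cannot be serialized as JSON");
  }
  return JSON.stringify(value);
}

console.log(serializeNumber(4.50));      // prints 4.5
console.log(serializeNumber(1e21));      // prints 1e+21
console.log(serializeNumber(0.000001));  // prints 0.000001
console.log(serializeNumber(1e-7));      // prints 1e-7
console.log(serializeNumber(-0));        // prints 0
```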
3.2.3. Sorting of Object Properties
Although the previous step normalized the representation of primitive JSON data types, the result would not yet qualify as "canonical" since JSON object properties are not in lexicographic (alphabetical) order.
Applied to the sample in Section 3.2.2, a properly canonicalized version should (with a line wrap added for display purposes only) read as:
The rules for lexicographic sorting of JSON object properties according to JCS are as follows:
When a JSON object is about to have its properties sorted, the following measures MUST be adhered to:
The rationale for basing the sorting algorithm on UTF-16 code units is that it maps directly to the string type in ECMAScript (featured in web browsers and Node.js), Java, and .NET. In addition, JSON only supports escape sequences expressed as UTF-16 code units, making knowledge and handling of such data a necessity anyway. Systems using another internal representation of string data will need to convert JSON property name strings into arrays of UTF-16 code units before sorting. The conversion from UTF-8 or UTF-32 to UTF-16 is defined by the Unicode [UNICODE] standard.
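To illustrate why the choice of code unit matters, the sketch below sorts three hypothetical property names both ways. In ECMAScript, the default Array sort compares strings by UTF-16 code units; the comparator compareByCodePoint is an assumption added here only to show the contrasting (non-JCS) order:

```javascript
// Hypothetical property names: U+FB33, U+1F600 (a surrogate pair in
// UTF-16), and U+20AC.
const keys = ["\ufb33", "\ud83d\ude00", "\u20ac"];

// Default sort compares UTF-16 code units, as JCS requires.
const utf16Order = [...keys].sort();

// Contrasting comparator that orders by Unicode code points instead,
// which matches what sorting UTF-32 (or UTF-8) data would produce.
function compareByCodePoint(a, b) {
  const ca = Array.from(a, (ch) => ch.codePointAt(0));
  const cb = Array.from(b, (ch) => ch.codePointAt(0));
  const n = Math.min(ca.length, cb.length);
  for (let i = 0; i < n; i++) {
    if (ca[i] !== cb[i]) return ca[i] - cb[i];
  }
  return ca.length - cb.length;
}
const codePointOrder = [...keys].sort(compareByCodePoint);

// UTF-16 order:     U+20AC, U+1F600, U+FB33 (0xD83D sorts before 0xFB33)
// Code point order: U+20AC, U+FB33, U+1F600
```

Only the first order is compatible with this specification.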
The following JSON test data can be used for verifying the correctness of the sorting scheme in a JCS implementation:
Expected argument order after sorting property strings:
Note: For the purpose of obtaining a deterministic property order, sorting of data encoded in UTF-8 or UTF-32 would also work, but the outcome for JSON data like above would differ and thus be incompatible with this specification. However, in practice, property names are rarely defined outside of 7-bit ASCII, making it possible to sort string data in UTF-8 or UTF-32 format without conversion to UTF-16 and still be compatible with JCS. Whether or not this is a viable option depends on the environment JCS is used in.
3.2.4. UTF-8 Generation
Finally, in order to create a platform
Applied to the sample in Section 3.2.3, this should yield the following bytes, here shown in hexadecimal notation:
This data is intended to be usable as input to cryptographic methods.
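As a small illustration of this final step, the sketch below encodes an already canonicalized string as UTF-8 using the standard TextEncoder API. The sample string is hypothetical and is not the sample from Section 3.2.3:

```javascript
// Hypothetical canonical JSON text containing one non-ASCII character.
const canonical = '{"a":"\u20ac"}';

// TextEncoder always produces UTF-8.
const bytes = new TextEncoder().encode(canonical);

// Hexadecimal rendering for display purposes only.
const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(hex);
// prints 7b 22 61 22 3a 22 e2 82 ac 22 7d
```

The euro sign occupies a single UTF-16 code unit in the string but three bytes (e2 82 ac) in the UTF-8 output.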
4. IANA Considerations
This document has no IANA actions.
5. Security Considerations
It is crucial to perform sanity checks on input data to avoid buffer overflows and similar issues that could affect the integrity of the system.
When JCS is applied to signature schemes like the one described in Appendix F, applications MUST perform the following operations before acting upon received data:
If any of these steps fail, the operation in progress MUST be aborted.
6. References
6.1. Normative References
- [ECMA-262]
- ECMA International, "ECMAScript 2019 Language Specification", Standard ECMA-262 10th Edition, <https://www.ecma-international.org/ecma-262/10.0/index.html>.
- [IEEE754]
- IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE 754-2019, DOI 10.1109/IEEESTD.2019.8766229, <https://ieeexplore.ieee.org/document/8766229>.
- [RFC2119]
- Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
- [RFC7493]
- Bray, T., Ed., "The I-JSON Message Format", RFC 7493, DOI 10.17487/RFC7493, <https://www.rfc-editor.org/info/rfc7493>.
- [RFC8174]
- Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
- [RFC8259]
- Bray, T., Ed., "The JavaScript Object Notation (JSON) Data Interchange Format", STD 90, RFC 8259, DOI 10.17487/RFC8259, <https://www.rfc-editor.org/info/rfc8259>.
- [UCNORM]
- The Unicode Consortium, "Unicode Normalization Forms", <https://www.unicode.org/reports/tr15/>.
- [UNICODE]
- The Unicode Consortium, "The Unicode Standard", <https://www.unicode.org/versions/latest/>.
6.2. Informative References
- [JSONCOMP]
- Rundgren, A., ""Comparable" JSON (JSONCOMP)", Work in Progress, Internet-Draft, draft-rundgren-comparable-json-04, <https://tools.ietf.org/html/draft-rundgren-comparable-json-04>.
- [KEYBASE]
- Keybase, "Canonical Packings for JSON and Msgpack", <https://keybase.io/docs/api/1.0/canonical_packings>.
- [NODEJS]
- OpenJS Foundation, "Node.js", <https://nodejs.org>.
- [OPENAPI]
- OpenAPI Initiative, "The OpenAPI Specification: a broadly adopted industry standard for describing modern APIs", <https://www.openapis.org/>.
- [RFC4648]
- Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, <https://www.rfc-editor.org/info/rfc4648>.
- [RFC7515]
- Jones, M., Bradley, J., and N. Sakimura, "JSON Web Signature (JWS)", RFC 7515, DOI 10.17487/RFC7515, <https://www.rfc-editor.org/info/rfc7515>.
- [RFC7638]
- Jones, M. and N. Sakimura, "JSON Web Key (JWK) Thumbprint", RFC 7638, DOI 10.17487/RFC7638, <https://www.rfc-editor.org/info/rfc7638>.
- [RYU]
- "Ryu floating point number serializing algorithm", commit 27d3c55, <https://github.com/ulfjack/ryu>.
- [V8]
- Google LLC, "What is V8?", <https://v8.dev/>.
- [XMLDSIG]
- W3C, "XML Signature Syntax and Processing Version 1.1", W3C Recommendation, <https://www.w3.org/TR/xmldsig-core1/>.
Appendix A. ECMAScript Sample Canonicalizer
Below is an example of a JCS canonicalizer for usage with
ECMAScript
Appendix B. Number Serialization Samples
The following table holds a set of ECMAScript
Notes:
- (1)
- For maximum compliance with the ECMAScript "JSON" object, values that are to be interpreted as true integers SHOULD be in the range -9007199254740991 to 9007199254740991. However, how numbers are used in applications does not affect the JCS algorithm.
- (2)
- Although a set of specific integers like 2**68 could be regarded as having extended precision, the JCS/ECMAScript number serialization algorithm does not take this into consideration.
- (3)
- Values out of range are not permitted in JSON. See Section 3.2.2.3.
- (4)
- This number is exactly 1424953923781206.25 but will, after the "Note 2" rule mentioned in Section 3.2.2.3, be truncated and rounded to the closest even value.
For a more exhaustive validation of a JCS number serializer, you may test against a file (currently) available in the development portal (see Appendix I) containing a large set of sample values. Another option is running V8 [V8] as a live reference together with a program generating a substantial amount of random IEEE 754 values.
Appendix C. Canonicalized JSON as "Wire Format"
Since the result from the canonicalizatio
In fact, the ECMAScript standard way of serializing objects using
"JSON
Using canonicalizatio
Appendix D. Dealing with Big Numbers
There are several issues associated with the JSON number type, here illustrated by the following sample object:
Although the sample above conforms to JSON [RFC8259], applications would normally use different native data types for storing "giantNumber" and "int64Max". In addition, monetary data like "payMeThis" would presumably not rely on floating-point data types due to rounding issues with respect to decimal arithmetic.
The established way of handling this kind of "overloading" of the JSON number type (at least in an extensible manner) is through mapping mechanisms, instructing parsers what to do with different properties based on their name. However, this greatly limits the value of using the JSON number type outside of its original, somewhat constrained JavaScript context. The ECMAScript "JSON" object does not support mappings to the JSON number type either.
Due to the above, numbers that do not have a natural place in the current JSON ecosystem MUST be wrapped using the JSON string type. This is close to a de facto standard for open systems. This is also applicable for other data types that do not have direct support in JSON, like "DateTime" objects as described in Appendix E.
Aided by a system using the JSON string type, be it programmatic like
or declarative schemes like OpenAPI [OPENAPI], JCS imposes no limits on applications, including when using ECMAScript.
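The following sketch contrasts the two representations for a 64-bit integer. The property names asNumber and asString are hypothetical, and the use of the ECMAScript "BigInt" type is an assumption about the target platform:

```javascript
// Hypothetical object carrying the same 64-bit value twice: once as a
// JSON number and once wrapped in a JSON string.
const parsed = JSON.parse(
  '{"asNumber": 9223372036854775807, "asString": "9223372036854775807"}');

// The JSON number is forced into an IEEE 754 double and loses precision.
console.log(String(parsed.asNumber)); // prints 9223372036854776000

// The string survives parsing unchanged and can be converted to a
// suitable native type in a separate step.
const exact = BigInt(parsed.asString);
console.log(exact.toString());        // prints 9223372036854775807
```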
Appendix E. String Subtype Handling
Due to the limited set of data types featured in JSON, the JSON string
type is commonly used for holding subtypes. This can, depending on
JSON parsing method, lead to interoperabilit
Assume you want to parse a JSON object where the schema designer assigned the property "big" for holding a "BigInt" subtype and "time" for holding a "DateTime" subtype, while "val" is supposed to be a JSON number compliant with JCS. The following example shows such an object:
Parsing of this object can be accomplished by the following ECMAScript statement:
After parsing, the actual data can be extracted, which for subtypes, also involves a conversion step using the result of the parsing process (an ECMAScript object) as input:
Note that the "BigInt" data type is currently only natively supported by V8 [V8].
Canonicalizatio
Although this is (with respect to JCS) technically correct, there is another way of parsing JSON data, which also can be used with ECMAScript as shown below:
If you now apply the canonicalizer in Appendix A to "object", the following string would be generated:
In this case, the string arguments for "big" and "time" have changed with respect to the original, presumably making an application depending on JCS fail.
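The problem can be reproduced for the "time" property with a reviver function passed to "JSON.parse". This is a minimal sketch using a hypothetical input string; it omits the "big" property, and the variable names are assumptions:

```javascript
// Hypothetical input that is already free of whitespace.
const text = '{"time":"2019-01-28T07:45:10Z","val":3.5}';

// Plain parsing keeps "time" as a string, so it round-trips unchanged.
const plain = JSON.parse(text);
const plainOut = JSON.stringify(plain);

// A reviver replaces the string with a native Date on the fly.
const revived = JSON.parse(text, (key, value) =>
  key === "time" ? new Date(value) : value);
const revivedOut = JSON.stringify(revived);

console.log(plainOut === text); // prints true
console.log(revivedOut);
// prints {"time":"2019-01-28T07:45:10.000Z","val":3.5}
```

The added ".000" comes from the native Date serialization, which is enough to invalidate a signature computed over the original data.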
The reason for the deviation is that in stream- and schema-based JSON
parsers,
the original string argument is typically replaced on the fly
by the native subtype that, when serialized, may exhibit a different
and platform
That is, stream- and schema-based parsing MUST treat subtypes as "pure" (immutable) JSON string types and perform the actual conversion to the designated native type in a subsequent step. In modern programming platforms like Go, Java, and C#, this can be achieved with moderate effort by combining annotations, getters, and setters. Below is an example in C#/Json.NET showing a part of a class that is serializable as a JSON object:
In an application, "Amount" can be accessed as any other property while it is actually represented by a quoted string in JSON contexts.
Note: The example above also addresses the constraints on numeric data implied by I-JSON (the C# "decimal" data type has quite different characteristics compared to IEEE 754 double precision).
E.1. Subtypes in Arrays
Since the JSON array construct permits mixing arbitrary JSON data types, custom parsing and serialization code may be required to cope with subtypes anyway.
Appendix F. Implementation Guidelines
The optimal solution is integrating support for JCS directly
in JSON serializers (parsers need no changes).
That is, canonicalizatio
The post processor concept enables signature creation schemes like the following:
A compatible signature verification scheme would then be as follows:
A canonicalizer like above is effectively only a "filter", potentially usable with a multitude of quite different cryptographic schemes.
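As a rough illustration of the "filter" idea, the sketch below pairs a minimal canonicalizer with an HMAC-SHA256 signature computed through the Node.js "crypto" module. The function names canonicalize, sign, and verify, the "signature" property name, the hexadecimal encoding, and the choice of HMAC are all assumptions made for this example; none of them is prescribed by this document:

```javascript
const crypto = require("crypto");

// Hypothetical canonicalizer following Section 3.2: no whitespace,
// ECMAScript primitive serialization, and object properties sorted
// by UTF-16 code units (the default ECMAScript string ordering).
function canonicalize(value) {
  if (value === null || typeof value !== "object") {
    if (typeof value === "number" && !Number.isFinite(value)) {
      throw new RangeError("NaN and Infinity are not permitted in JSON");
    }
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  const members = Object.keys(value).sort().map(
    (key) => JSON.stringify(key) + ":" + canonicalize(value[key]));
  return "{" + members.join(",") + "}";
}

// Signs the canonical UTF-8 form and returns a copy of the data with
// the signature attached as an additional property.
function sign(data, key) {
  const signature = crypto.createHmac("sha256", key)
    .update(canonicalize(data), "utf8").digest("hex");
  return Object.assign({}, data, { signature });
}

// Removes the signature property, recreates the canonical form, and
// compares the recomputed value with the received one.
function verify(signed, key) {
  const { signature, ...data } = signed;
  if (typeof signature !== "string") return false;
  const expected = crypto.createHmac("sha256", key)
    .update(canonicalize(data), "utf8").digest();
  const received = Buffer.from(signature, "hex");
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

const message = sign({ b: 1, a: "x" }, "demo-key");
console.log(verify(JSON.parse(JSON.stringify(message)), "demo-key")); // prints true
```

Because the signature covers the canonical form, the message can travel in any property order or with any whitespace and still verify, as long as the data itself is unchanged.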
Using a JSON serializer with integrated JCS support, the serialization
performed
before the canonicalizatio
Appendix G. Open-Source Implementations
The following open-source implementations have been verified to be compatible with JCS:
Appendix H. Other JSON Canonicalization Efforts
There are (and have been) other efforts creating "Canonical JSON". Below is a list of URLs to some of them:
The listed efforts all build on text-level JSON-to-JSON
transformations
Appendix I. Development Portal
The JCS specification is currently developed at:
<https://
JCS source code and extensive test data is available at:
<https://
Acknowledgements
Building on ECMAScript number serialization was originally proposed by James Manger. This ultimately led to the adoption of the entire ECMAScript serialization scheme for JSON primitives.
Other people who have contributed with valuable input to this specification include Scott Ananian, Tim Bray, Ben Campbell, Adrian Farell, Richard Gibson, Bron Gondwana, John-Mark Gurney, Mike Jones, John Levine, Mark Miller, Matthew Miller, Mark Nottingham, Mike Samuel, Jim Schaad, Robert Tupelo-Schneck, and Michal Wadas.
For carrying out real-world concept verification, the software and support for number serialization provided by Ulf Adams, Tanner Gooding, and Remy Oudompheng were very helpful.