RFC 4042

UTF-9 and UTF-18 Efficient Transformation Formats of Unicode, April 1 2005

Canonical URL:
https://www.rfc-editor.org/rfc/rfc4042.txt
File formats:
Plain TextPDF
Status:
INFORMATIONAL
Author:
M. Crispin
Stream:
INDEPENDENT

Cite this RFC: TXT  |  XML

DOI:  10.17487/RFC4042

Discuss this RFC: Send questions or comments to rfc-ise@rfc-editor.org

Other actions: Submit Errata  |  Find IPR Disclosures from the IETF


Abstract

ISO-10646 defines a large character set called the Universal Character Set (UCS), which encompasses most of the world's writing systems. The same set of codepoints is defined by Unicode, which further defines additional character properties and other implementation details. By policy of the relevant standardization committees, changes to Unicode and amendments and additions to ISO/IEC 10646 track each other, so that the character repertoires and code point assignments remain in synchronization. The current representation formats for Unicode (UTF-7, UTF-8, UTF-16) are not storage and computation efficient on platforms that utilize the 9 bit nonet as a natural storage unit instead of the 8 bit octet. This document describes a transformation format of Unicode that takes advantage of the nonet so that the format will be storage and computation efficient. This memo provides information for the Internet community.


For the definition of Status, see RFC 2026.

For the definition of Stream, see RFC 4844.


Download PDF Reader