[rfc-i] For v3: language tagging, but only where useful
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Tue Jan 14 01:43:40 PST 2014
On 2014/01/14 1:06, Julian Reschke wrote:
> On 2014-01-13 17:04, Julian Reschke wrote:
>> On 2014-01-13 16:53, Nico Williams wrote:
>>> Unicode does have language tag codepoints. Often their use is not
>>> appropriate, but here I think they would be (especially for names or
>>> addresses involving multiple languages, which is not something I'd
>>> expect frequently, but also not something I'd want to preclude out of
>> We already have xml:lang; we just need to allow it in more places. Isn't
>> that sufficient?
>> Best regards, Julian
> "The tag characters have become deprecated in Unicode 5.1 (2008)."
Yes. And there's even an RFC for that: RFC 6082.
To give a bit more history, these now-deprecated tag characters were
proposed by the Unicode Consortium as an alternative to MLSF, a proposal
related to ACAP that tried to squeeze language information into byte
sequences not used in UTF-8.
It was quickly realized that this was a very bad idea (see also
The language tag characters were deliberately exiled to plane 14, where
each character takes 4 bytes, so that a full language tag could easily
take 20 or more bytes. This was a clear hint saying "we don't really recommend
these, but we prefer them over even worse stuff (such as ACAP MLSF)".
If the Unicode Consortium had really thought that these were meant to
be used widely, it would have worked out a scheme needing fewer bytes.
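To make the byte counts concrete, here is a small Python sketch (not part of the original discussion). It assumes the RFC 2482 layout: U+E0001 LANGUAGE TAG followed by one tag character per character of an ASCII language tag, where each tag character sits at U+E0000 plus the ASCII code point. The helper name `tag_sequence` is hypothetical.

```python
# Sketch, assuming the RFC 2482 layout: U+E0001 LANGUAGE TAG, then one
# plane-14 tag character (U+E0000 + ASCII code) per character of the tag.
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG (deprecated)
TAG_BASE = 0xE0000      # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def tag_sequence(lang: str) -> str:
    """Hypothetical helper: build the plane-14 tag sequence for an ASCII tag."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("tag characters only mirror printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in lang)


if __name__ == "__main__":
    for lang in ["ja", "en-US", "zh-Hant-TW"]:
        seq = tag_sequence(lang)
        # Every plane-14 code point needs 4 bytes in UTF-8.
        print(lang, len(seq), "code points,",
              len(seq.encode("utf-8")), "bytes in UTF-8,",
              "vs", len(lang.encode("ascii")), "bytes as a plain xml:lang value")
```

Under these assumptions, "en-US" becomes 6 plane-14 code points, or 24 bytes in UTF-8, against 5 bytes for the same value in an xml:lang attribute.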
As time went by, it turned out that ACAP itself didn't go very far (I
still think this is a pity; I'd really have liked to use e.g. portable
keyboard layouts), and no other IETF protocol that we knew of picked
up these tags, so the Unicode Consortium decided to deprecate
them, and we issued RFC 6082 to bury RFC 2482 and make it obsolete.
If PDF (at least in some versions) is stuck with them, that isn't our problem.