[rfc-i] Bug in RFC Search page...

Matthew Kerwin matthew at kerwin.net.au
Tue Sep 26 15:32:15 PDT 2017


On 27 Sep. 2017 07:58, "Toerless Eckert" <tte at cs.fau.de> wrote:

On Tue, Sep 26, 2017 at 10:49:56PM +0200, Carsten Bormann wrote:


> (And no, we don't need a third category Plain-text-beyond-the-basic-multilingual-plane,
or Plain-text-with-astral for short.)

Not sure what you are referring to: "More UTF-8 characters than possible
with ISO 8859"?
Given that we should not publish text-only documents in anything other than
ASCII or UTF-8, I think we can happily ignore those legacy options.


Emoji are astral codepoints, aren't they? 🤞🏻
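
(For reference: an "astral" code point is one above U+FFFF, i.e. outside the
Basic Multilingual Plane. A minimal Python sketch of that check follows. The
helper names are made up for illustration, and the sample string is the emoji
above written out as escapes.)

```python
def is_astral(ch: str) -> bool:
    """Return True if a single code point lies outside the BMP (above U+FFFF)."""
    return ord(ch) > 0xFFFF


def astral_code_points(text: str) -> list:
    """List the astral code points in text, formatted as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text if is_astral(ch)]


# Hypothetical sample: the crossed-fingers emoji plus a skin-tone modifier.
sample = "\U0001F91E\U0001F3FB"
print(astral_code_points(sample))   # ['U+1F91E', 'U+1F3FB']
print(len(sample.encode("utf-8")))  # 8 (each astral code point is 4 bytes in UTF-8)
```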


> > HTML ASCII -> HTML
> > HTML UTF-8 -> HTML unicode
>
> I don't think that distinction ever needs to be made, because HTML
embeds metadata about its charset, and there are no real interop problems
when you do that.

Given how inconsistent browsers are in the character sets they load,
it would be nice to have an explicit indication of the character sets required
to render a document. I've seen PDF documents where I had to go to page 100
before finding that some crucial text was not rendered because it used some
character set not available to me.


If you mean font support (I'm not going to trip myself up over the
difference between character sets and encodings and all that, but I'm
pretty sure 'Unicode' has you covered for characters/codepoints) that's a
mostly-solved problem in the modern web, with webfonts and the like.

So the useful words to put in the table would be basic document types
(PDF/text/HTML). We can bicker over what "text" means WRT UTF-8 elsewhere.

Cheers
-- 
Matthew Kerwin