> On Wed, Sep 27, 2017 at 08:32:15AM +1000, Matthew Kerwin wrote:
> > Emoji are astral codepoints, aren't they?
> Thanks!
> > If you mean font support (I'm not going to trip myself up over the
> > difference between character sets and encodings and all that, but I'm
> > pretty sure 'Unicode' has you covered for characters/codepoints) that's a
> > mostly-solved problem in the modern web, with webfonts and the like.
> Sure, but picture sitting in an airplane, trying to read a document, and
> getting a pop-up window telling you to go on the internet and create, I
> think, an Apple ID to be able to download some Asian character set (if I
> remember correctly).
I don't know about PDFs, but if you download the HTML version of a page
there's usually an option to download all the linked resources (images,
CSS, etc.) at the same time, so it should continue to work offline.
I don't know whether that includes fonts linked from CSS, though.

> > So the useful words to put in the table would be basic document types
> > (PDF/text/HTML). We can bicker over what "text" means WRT UTF-8
> > elsewhere.
> I definitely would like to have an indication if it's "more than ASCII"
> text (e.g. foreign characters included).
Sure, I can understand the need for a multidimensional "requirements for
accurately viewing this resource" description.  Soon enough it will have to
be able to describe the basic format (PDF/HTML/plain text), the character
range (7-bit ASCII, Latin-1[*], BMP, Supplementary[†]), and whether it
includes embedded images.  A single-word description probably isn't enough.
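
To make the "character range" dimension concrete, here is a rough Python
sketch that reports the narrowest of those four ranges covering a piece of
text.  The function name and the labels are my own invention for
illustration, not anything defined by the RFC format documents.

```python
def character_range(text: str) -> str:
    """Return the narrowest of the four ranges that covers every character.

    Hypothetical helper for illustration only; the labels mirror the
    list in the paragraph above.
    """
    # Highest code point in the text; empty text counts as ASCII.
    highest = max(map(ord, text), default=0)
    if highest <= 0x7F:        # Basic Latin block
        return "7-bit ASCII"
    if highest <= 0xFF:        # Basic Latin + Latin-1 Supplement
        return "Latin-1"
    if highest <= 0xFFFF:      # Basic Multilingual Plane
        return "BMP"
    return "Supplementary"     # astral planes, U+10000 and up


print(character_range("plain text"))
print(character_range("caf\u00e9"))
print(character_range("\U0001F600"))
```

A table column could carry that label alongside the basic format, which is
roughly the multidimensional description I have in mind.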

Meanwhile, I figured the words in that column were basically representative
of the formats described in RFC 7990 (and its antecedents.)  In which case,
all the rest is implied.


[*] i.e. the Basic Latin + Latin-1 Supplement blocks; same as ISO-8859-1

[†] Some tools still have issues with characters that don't fit in UCS-2.
There's also "includes four-byte UTF-8 sequences", which is a different
thing (the same code points, but a separate failure mode) and has also
caused me issues in the past with some tools.
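
For anyone who wants to see the two failure modes side by side, a small
Python sketch (the helper name is made up for this example) that counts how
many UTF-16 code units and how many UTF-8 bytes a single character needs:

```python
def encoding_widths(ch: str) -> tuple:
    """Return (UTF-16 code units, UTF-8 bytes) for a single character.

    Hypothetical helper, for illustration only.
    """
    # utf-16-le omits the byte order mark, so every code unit is 2 bytes.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    utf8_bytes = len(ch.encode("utf-8"))
    return utf16_units, utf8_bytes


for ch in ("A", "\u00e9", "\u65e5", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoding_widths(ch))
```

Only the last character, from a supplementary plane, needs two UTF-16 code
units (a surrogate pair) and four UTF-8 bytes; a tool can cope with one of
those and still trip over the other.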
