[rfc-i] v3imp #5 Tag figs with filenames, Internet message data

Sean Leonard dev+ietf at seantek.com
Fri Jan 23 11:23:41 PST 2015


On 1/23/2015 3:14 AM, Julian Reschke wrote:
> On 2015-01-23 10:05, Sean Leonard wrote:
>> Improvement Need
>> #5 Tag figs with filenames, Internet message data
>>
>> This improvement calls for tagging figs with filenames and Internet
>> message data. The term "figs" is meant to include <figure> as well as
>> other non-spec-text data, including what is currently encompassed by
>> <sourcecode>.
>>
>> I note that <sourcecode> is a stab in the right direction.
>>
>> Many standards include figures that are better operated upon as files,
>> such as ASN.1 modules or Python source code. A big problem historically
>> has been that these data have been split apart by pagination and spacing
>> artifacts. When stitching them back together, transformation
>> (copy-and-paste type) errors have occurred with bad results.
>>
>> It makes sense to tag this data with filenames, so that users of the
>> standard can extract the file information as-is and operate upon it.
>>
>> For that matter, I also want to be able to tag a figure or blob of code
>> with a media type (*and* parameters--yes, we need parameters). Calling
>> something "foo.js" is one thing; labeling it as "application/javascript"
>> or "application/json" is more well-defined.
>> ...
>
> I note that we've had artwork/@name and artwork/@type for ages; what's 
> missing?

As evidenced by some comments on this list in response to this "v3imp 
blitz", the definitions of @name and @type are sloppy and amenable to 
differing interpretations.

I believe that computer-processable data in RFCs (i.e., in "figs", not 
spec-text/flowed text) should have the following properties:
• Be treated as "inline" or as "attachment". Inline data, like artwork, 
is generally displayed with the flowed text. Attachment data is complete 
enough to be extracted and used in running code; for example, an ASN.1 
module can be saved as-is and fed to an ASN.1 compiler to produce 
working C/C++ code. The meaning of "or" is inclusive, so a thing can 
have both inline AND attachment properties, much like a PDF in an e-mail 
can be displayed as an icon or have a preview of its first page rendered 
inline, depending on the capabilities of the mail client. I am not 
dictating presentation semantics.

• Have metadata that includes filenames, media types (including 
parameters), or informal classifications. The meaning of "or" is 
inclusive. "Informal" classifications means widely accepted conventions 
in the software industry, e.g., file extensions ".js", ".zip", 
".markdown", or informal lists that the RFC Editor maintains, e.g., 
"pseudocode", "asn.1".

• Be Unicode character-oriented XOR octet-oriented (see Improvement #6). 
The meaning of "xor" is exclusive. Because the octet-oriented data has 
metadata including a media type and parameters, it is acceptable to

Concrete proposals, in light of recent posts:

Right now we have <artwork> (which is "inline", more-or-less) and 
<sourcecode> (also "inline", more-or-less). I think we should have 
something like <file> or <content> or <msg> or <attachment> (which is 
"attachment", more-or-less).

<sourcecode> is character-oriented and textual.
<artwork> is currently character-oriented, but given that it encompasses 
SVG and hex-dumps, it is more accurate to say that it is flexible--the 
common denominator is that it is inline, and fundamentally not 
"textual". E.g., @type="ascii-art" is about artwork rather than the 
incidental fact that it uses ASCII characters as its paintbrush.

<attachment> should be octet-oriented.

At least <attachment> but probably also <artwork> and <sourcecode> 
should permit:
@name = filename if saved to a filesystem
@type = type information:
  (1) FORMAL: media type and parameters, such as
      "text/markdown; charset=iso-8859-1; variant=pandoc"
  (2) CONVENTIONAL: a value starting with ".", as in ".js", means the
      conventional file extension
  (3) INFORMAL: otherwise, a keyword from the informal list maintained
      by the RFC Editor (current draft-15)
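
To make the three-way split concrete, here is a minimal Python sketch of 
how a tool might classify an @type value. This is an illustration only, 
not part of the proposal: the function names classify_type and 
parse_media_type are hypothetical, and the "contains a slash before any 
semicolon" test is my assumption about how a FORMAL value would be 
recognized.

```python
from email.message import Message


def classify_type(value):
    """Return 'FORMAL', 'CONVENTIONAL', or 'INFORMAL' for an @type value."""
    value = value.strip()
    # (2) CONVENTIONAL: a leading "." marks a conventional file extension.
    if value.startswith("."):
        return "CONVENTIONAL"
    # (1) FORMAL: assumed to look like type/subtype, optionally followed
    # by ";"-separated parameters.
    essence = value.split(";", 1)[0].strip()
    if "/" in essence:
        return "FORMAL"
    # (3) INFORMAL: anything else is treated as a keyword.
    return "INFORMAL"


def parse_media_type(value):
    """Split a FORMAL value into (media type, parameter dict)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() lists the media type itself first; skip that entry.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params
```

For the example value above, classify_type reports FORMAL and 
parse_media_type separates "text/markdown" from its charset and variant 
parameters.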

If you want to include an Internet message with full-on headers, the 
right way would be <attachment> (or <artwork>) with 
@type="message/global" [RFC6532]. Similarly, if you want to 
include an HTTP message with full-on headers, the right way is 
@type="message/http" [RFC7230].

A processor should infer the type of data as follows:
If @type is present:
  (1) FORMAL -> done.
  (2) CONVENTIONAL -> done.
  (3) INFORMAL
    (a) recognized keyword -> done.
    (b) unrecognized keyword -> see below.

If @type is absent:
  infer the type from the filename in @name (i.e., "CONVENTIONAL" 
behavior). The full filename should be considered; for example, 
@name="Makefile" is a convention even though it is not a file 
extension.
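
The inference steps above could be sketched in Python roughly as 
follows. This is a sketch under stated assumptions, not a definitive 
implementation: infer_type, KNOWN_KEYWORDS, and WHOLE_NAME_CONVENTIONS 
are hypothetical names, the keyword set only repeats the two examples 
from this message rather than any actual RFC Editor list, and I am 
assuming that an unrecognized keyword falls through to the filename 
rule.

```python
import os

# Assumption: stand-in for the informal list the RFC Editor maintains.
KNOWN_KEYWORDS = {"pseudocode", "asn.1"}
# Assumption: whole filenames that are conventions without an extension.
WHOLE_NAME_CONVENTIONS = {"Makefile"}


def infer_type(name=None, type_=None):
    """Return (kind, value) for a fig, or None if nothing can be inferred."""
    if type_:
        t = type_.strip()
        if t.startswith("."):
            return ("CONVENTIONAL", t)          # (2) file extension
        if "/" in t.split(";", 1)[0]:
            return ("FORMAL", t)                # (1) media type + parameters
        if t.lower() in KNOWN_KEYWORDS:
            return ("INFORMAL", t.lower())      # (3a) recognized keyword
        # (3b) unrecognized keyword: fall through to the filename rule.
    if name:
        base = os.path.basename(name)
        if base in WHOLE_NAME_CONVENTIONS:
            return ("CONVENTIONAL", base)       # e.g. "Makefile"
        ext = os.path.splitext(base)[1]
        if ext:
            return ("CONVENTIONAL", ext)
    return None
```

Returning None when neither attribute helps leaves it to the caller to 
decide how to treat untyped data.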

If @name and @type are both present, the file extension of @name SHOULD 
be consistent with the @type. This should be enforced through the 
publication process, not any formal grammar.
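
Since this check would live in the publication process rather than in a 
grammar, it could be a simple lint step. Below is a minimal Python 
sketch. The name is_consistent and the EXTENSION_TO_MEDIA_TYPE table are 
hypothetical; a real tool would need a much fuller mapping, and what 
counts as "consistent" is my assumption.

```python
import os

# Assumption: tiny illustrative mapping, not an authoritative registry.
EXTENSION_TO_MEDIA_TYPE = {
    ".json": "application/json",
    ".md": "text/markdown",
    ".markdown": "text/markdown",
}


def is_consistent(name, type_):
    """Return True or False, or None when this sketch cannot judge."""
    ext = os.path.splitext(os.path.basename(name))[1].lower()
    t = type_.strip().lower()
    if t.startswith("."):
        # CONVENTIONAL @type: compare extensions directly.
        return ext == t if ext else None
    essence = t.split(";", 1)[0].strip()
    if "/" in essence:
        # FORMAL @type: compare against the table, ignoring parameters.
        expected = EXTENSION_TO_MEDIA_TYPE.get(ext)
        return None if expected is None else expected == essence
    # INFORMAL keywords carry no extension expectation in this sketch.
    return None
```

A None result is meant as "flag for a human to look at", which fits a 
SHOULD better than a hard failure.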

How is this? This proposal avoids creating new registries or attributes.

Sean


