[rfc-i] Metadata

Paul Hoffman paul.hoffman at vpnc.org
Tue Sep 25 10:46:25 PDT 2012


On Sep 25, 2012, at 10:12 AM, Heather Flanagan (RFC Series Editor) <rse at rfc-editor.org> wrote:

> On 9/21/12 11:40 AM, Paul Hoffman wrote:
>> On Sep 21, 2012, at 11:27 AM, John Levine <johnl at taugh.com> wrote:
>> 
>>> In article <3DD54EFB-5C2F-4A2B-BC5F-5D1BC1654AFE at vpnc.org> you write:
>>>> Greetings again. The -00 draft's discussion of metadata does not reflect the discussion to date. It says:
>>>> =====
>>>> RFC 2223 makes no mention of metadata, i.e. ways to indicate
>>>> particular parts of the text.
>>>> =====
>>>> 
>>>> Metadata is not a way to indicate parts of the text. Metadata is information that is derived
>>>> from, or extracted from, the document. The distinction is fairly important to this discussion in
>>>> that metadata is currently derived both by hand and by heuristic tools.
>>> 
>>> It's anything that's about the document as opposed to part of the
>>> document.  This doesn't exclude information that's also present in the
>>> document, but it's quite common to have metadata that's not in
>>> the document itself.
>> 
>> Exactly right. That's why the first part of my proposed definition is "derived from", not "extracted from".
>> 
>> --Paul Hoffman
> 
> We definitely struggled somewhat with the word "metadata" and coming to
> any kind of common understanding as to what it meant in this context.
> To me, there is a difference between the metadata for which the purpose
> is to describe the document itself and metadata that describes specific
> content within the document.  Both are metadata, but the intent is
> somewhat different.  

Fully agree, but that differentiation is irrelevant to 99% of the people who will use the metadata. They care about "can I use this metadata to find the RFC I want" and "can I use this metadata to find the information I want in this RFC".

> This is in line with Dave's pointer to the
> Wikipedia entry.  The distinction between structural metadata (e.g., the
> structure that holds a code component*) and descriptive metadata (e.g.,
> category of the RFC**) seems useful.

To whom? This is a serious question. The RFC Editor should produce metadata that is of the widest value. Producing two unrelated sets of metadata doesn't seem to meet that goal.

> Is that clear, and would that change your proposed definition?

Not yet, but I am interested in why you think that the difference is important for the use of the metadata.

--Paul Hoffman

