[rfc-i] Comment on Holistic archiving (section 2.4.3 of draft-flanagan-rfc-preservation-01)

Leonard Rosenthol lrosenth at adobe.com
Wed Oct 29 05:40:20 PDT 2014

Henning - Both OOXML (the modern Word format) and PDF are ISO standards and thus are fully documented.  In addition, these standards have themselves been appropriately archived by various international archives (US, EU, Germany, etc.) due to their being adopted as national standards in those countries.

Also, consider that in the case of PDF, there are NUMEROUS open source parsers in a plethora of languages.

I do, however, agree that XML is readable by technically oriented humans, which is a plus and is why it's included.


From: Henning G Schulzrinne <hgs at cs.columbia.edu>
Date: Tuesday, October 28, 2014 at 3:37 PM
To: Robert Sparks <rjsparks at nostrum.com>
Cc: "rfc-interest at rfc-editor.org" <rfc-interest at rfc-editor.org>
Subject: Re: [rfc-i] Comment on Holistic archiving (section 2.4.3 of draft-flanagan-rfc-preservation-01)

Also, unlike MS Word or PDF, the structure of XML is documented, in sufficient detail to write a parser, in hundreds of widely published books. (Leaving aside that modern Word files are also XML.) If all of these books were to disappear without a physical or electronic trace, I suspect that, at that point, deciphering old RFCs would not be exactly the highest priority for humanity and we'd be in a "The Road" scenario.

On Tue, Oct 28, 2014 at 3:00 PM, Robert Sparks <rjsparks at nostrum.com> wrote:
The paragraph that starts out with "Consider a future where XML has been obsoleted for half a century" has a rough edge I would like to try to make better.
It talks about the need to keep an environment (programs and OSes) around to be able to use whatever bits you have stored. While that's true for the general case, it overstates what would be needed to be able to read an XML file in the future.

Unlike some compressed or specialized binary format, the bare XML source is relatively directly accessible by humans. The environment you need to keep looks more like the environment you need to keep to be able to read the current ASCII RFC files than it does one you would need to keep to read, say, PDF, as long as you keep the XML files stored in a way that is amenable to letting people see the characters easily. That is, the target is more like 'cat' than it is 'adobe reader'.

We're choosing the element names in the XML to be relatively self-descriptive. In an archival-recovery situation, a person would have a relatively easy time determining what the bits mean, _especially_ if we keep the definition of the elements around in the same easy-to-get-to format.
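
To make the recovery scenario concrete, here is a minimal sketch in Python, assuming only that the archived file can be read as text. The sample fragment and its element names are hypothetical, chosen for illustration rather than taken from the actual RFC vocabulary, and the `element_names` helper is invented here. It tallies element names with a plain pattern scan instead of an XML parser, which is roughly the 'cat'-level tooling described above.

```python
import re
from collections import Counter

# Hypothetical fragment: the element names are illustrative only and are
# not claimed to match the real RFC XML vocabulary.
SAMPLE = """<rfc>
  <front>
    <title>Example Document</title>
    <author fullname="A. Person"/>
  </front>
  <middle>
    <section>
      <name>Introduction</name>
      <t>First paragraph.</t>
      <t>Second paragraph.</t>
    </section>
  </middle>
</rfc>"""


def element_names(text):
    """Count opening-tag names with a plain text scan (no XML parser).

    The pattern matches '<' followed by a name, so closing tags ('</...')
    are skipped and self-closing tags are counted once.
    """
    return Counter(re.findall(r"<([A-Za-z_][\w.-]*)", text))


counts = element_names(SAMPLE)
for name, n in sorted(counts.items()):
    print(name, n)
```

A reader with no schema at hand would still see names such as `title`, `author`, and `section` in that tally, which is the sense in which self-descriptive names help; a scan this naive would miscount on comments or CDATA sections, so it is a starting point for inspection rather than a parser.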

Please be careful with choosing what the archived bits really need to be - I think we have an opportunity to avoid much of the complexity the current text warns about in this section.

