[rfc-i] Digital Preservation Considerations for the RFC Series -- draft-flanagan-rfc-preservation-00.txt is posted

Julian Reschke julian.reschke at gmx.de
Wed Sep 10 00:27:58 PDT 2014

On 2014-09-10 05:04, John Levine wrote:
> ...
> Hi.  I think the analysis is fine as far as it goes, but it misses
> some stuff.  In particular, it seems to me that it's at least as
> important to save the format and tool documentation as the software
> because a significant way to recover ancient stuff is to reverse
> engineer the tools using more modern technology.
> To take an example in a different context, if you have a deck of punch
> cards, there are approximately no working card readers left to read
> what's on them.  But it is not hard to kludge something up to run them
> through an optical scanner or past a camera, nor is it hard using
> modern software to find the punched holes in the images and recover
> the text on the cards.  But that depends on knowing the specs for the
> cards, both the physical specs for the hole positions, and the logical
> specs for what combination of holes represents what character.
> I have written scripts in modern(ish) languages like Perl and Python
> to translate ancient markup languages into something I can run through
> current document formatters.  Hence I would encourage the archive to
> include the XML separate from the PDF/A, because XML would be a lot
> easier to deal with if you're reverse engineering from scratch, and to
> preserve the xml2rfc specs in various online and offline formats to
> help reconstruction efforts.
> ...


FWIW, this is what happened with the xml2rfc v2 processor, which was
rewritten in Python: partly by following the documentation (RFC 2629),
partly by reverse engineering. The latter was mainly needed where the
documentation was incomplete.
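
As a rough illustration of the quoted punch-card point (recovery is easy
once the logical spec is known), here is a minimal Python sketch. It
assumes the hole positions have already been extracted from a scan, and it
covers only the blank, digit, and uppercase-letter subset of the common
Hollerith card code. The names build_table and decode_card are made up for
this example.

```python
# Minimal sketch: decode punched columns into text using a subset of the
# Hollerith card code (blank, digits, uppercase letters only).
# Each column is given as the set of punched rows; rows are named
# 12, 11, 0, 1, ..., 9 from the top of the card to the bottom.

def build_table():
    """Map a set of punched rows to the character it represents."""
    table = {frozenset(): " "}                 # no punch = blank
    for d in range(10):
        table[frozenset([d])] = str(d)         # single punch = digit
    for i, ch in enumerate("ABCDEFGHI", start=1):
        table[frozenset([12, i])] = ch         # zone 12 + digit 1-9
    for i, ch in enumerate("JKLMNOPQR", start=1):
        table[frozenset([11, i])] = ch         # zone 11 + digit 1-9
    for i, ch in enumerate("STUVWXYZ", start=2):
        table[frozenset([0, i])] = ch          # zone 0 + digit 2-9
    return table


TABLE = build_table()


def decode_card(columns):
    """Decode an iterable of punched-row sets; unknown patterns become '?'."""
    return "".join(TABLE.get(frozenset(col), "?") for col in columns)


if __name__ == "__main__":
    # Hypothetical input, as it might come out of an image-analysis step.
    card = [{12, 8}, {12, 5}, {11, 3}, {11, 3}, {11, 6}, set(), {1}, {9}]
    print(decode_card(card))
```

Without the table (that is, without the spec), the same hole positions are
just a pattern of dots.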

Best regards, Julian

PS: reminder: we can embed XML into HTML as well; this in itself is no 
reason to favor PDF.
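
One possible way to do this (an assumption for illustration, not
necessarily the mechanism meant above) is to carry the XML source in a
script element with a non-JavaScript type such as "application/xml", which
browsers treat as an inert data block. The sketch below uses only the
Python standard library to pull such a block back out of an HTML file and
parse it; the class name, function name, and sample document are all
hypothetical.

```python
# Minimal sketch: recover XML embedded in HTML as a data block
# (<script type="application/xml">...</script>). The embedding convention
# is an assumption made for this example.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class XmlIslandExtractor(HTMLParser):
    """Collect the raw text of every application/xml script element."""

    def __init__(self):
        super().__init__()
        self._in_island = False
        self._chunks = []
        self.islands = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/xml":
            self._in_island = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_island:
            self.islands.append("".join(self._chunks))
            self._in_island = False

    def handle_data(self, data):
        # Script content is delivered as raw text, possibly in pieces.
        if self._in_island:
            self._chunks.append(data)


def extract_xml(html_text):
    """Return the parsed root element of each embedded XML block."""
    parser = XmlIslandExtractor()
    parser.feed(html_text)
    parser.close()
    return [ET.fromstring(text.strip()) for text in parser.islands]


# Hypothetical sample document with the XML source embedded in the head.
SAMPLE_HTML = """<!DOCTYPE html>
<html><head><title>Example</title>
<script type="application/xml" id="source">
<rfc><front><title>Example Title</title></front></rfc>
</script>
</head><body><p>Rendered text.</p></body></html>"""

if __name__ == "__main__":
    for root in extract_xml(SAMPLE_HTML):
        print(root.tag, root.findtext("front/title"))
```

The embedded XML must not itself contain the literal text "</script",
since that would end the data block early.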
