[rfc-i] Two issues with draft-flanagan-plaintext-00

Paul Kyzivat pkyzivat at alum.mit.edu
Sun Jun 29 16:30:53 PDT 2014


On 6/29/14 4:43 PM, Ted Lemon wrote:
> On Jun 29, 2014, at 2:38 PM, Tim Bray <tbray at textuality.com> wrote:
>> You know, I’ve been one of the louder voices in favor of more modern publishing formats, but I have to say I think the .txt is brilliant for diffs, and diffs are super-important.  Or am I missing something… Is there a good example of a diff using either XML or HTML?
>
> No, actually .txt is terrible for diffs, because it's so easy for a stupid difference algorithm to confuse two vaguely similar bits of text and wind up upchucking a huge diff that is completely useless.   We run into this very frequently with -bis document reviews in the IESG.
>
> The advantage of diffing the XML is that you can do a better job of saying "this paragraph is the same paragraph as this paragraph" and then diff those paragraphs, because you can see the structure of the document, which is invisible to a text differ.

I've been using document/code diffs for over 40 years, and in all that 
time the same problem has remained:

Reordering blocks of text usually screws the diff up badly. That is a 
much more frequent problem than the one you are describing, and diffing 
the XML doesn't make it any easier to solve.

It *can* be solved, though it may take a lot of computation to do so. 
(We rarely care about that any more.) But even if the underlying diff 
engine can figure it out, it is hard to present to a user. Has somebody 
solved *that*?
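
To make the detection half concrete, here is a minimal sketch in
Python. It is only an illustration under some loud assumptions:
paragraphs are separated by blank lines, a moved block is unchanged
verbatim, and the function names (split_paragraphs, diff_with_moves)
are made up for this example. It says nothing about the harder
question of how to present the result to a user.

```python
import difflib


def split_paragraphs(text):
    """Split text into paragraphs separated by blank lines (an assumption)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def diff_with_moves(old_text, new_text):
    """Classify paragraph-level changes as moved, removed, or added.

    Hypothetical helper: a paragraph that an ordinary diff reports as
    deleted in one place and inserted verbatim in another is reported
    as a single move instead.
    """
    old = split_paragraphs(old_text)
    new = split_paragraphs(new_text)

    # Ordinary sequence diff over whole paragraphs.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(range(i1, i2))
        if tag in ("insert", "replace"):
            inserted.extend(range(j1, j2))

    # Pair each deleted paragraph with an identical inserted one, if any.
    moved, removed = [], []
    for i in deleted:
        match = next((j for j in inserted if new[j] == old[i]), None)
        if match is None:
            removed.append(i)
        else:
            moved.append((i, match))
            inserted.remove(match)

    return {"moved": moved, "removed": removed, "added": inserted}
```

Given old text with paragraphs A, B, C and new text with C, A, B, a
plain diff shows C deleted at the end and inserted at the top; the
sketch reports it as one move from index 2 to index 0. Edited-and-moved
blocks would need fuzzy matching, which is where the computation cost
goes up.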

	Thanks,
	Paul

