[rfc-i] on tooling

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Wed Mar 28 01:21:37 PDT 2012

Hello Fred,

On 2012/03/28 15:55, Fred Baker wrote:

> I appreciated John Levine's comments on "what do we want to do with the document when we see them". We all said that we grepped documents; I have a personal tool that I find useful that can be thought of as a grep for a paragraph as opposed to a line, on which I build other tools. For example, I have a tool that I use to find a reference to an RFC or internet draft that starts from an index and tells me the citation. Here are some examples:
> -----------------------------------------------------------------------------------

[tool output examples deleted]
> -----------------------------------------------------------------------------------
> You can imagine I find these things useful. As I read about UTF-8, I wonder what the impact is.

Close to zero. If you have a really, really old Unix system, you might
have a version of grep that barks at 8-bit data. If you want to search
for non-ASCII characters with your tool, then I'd recommend setting the
locale (LANG environment variable) to something that uses a UTF-8
encoding, such as en_US.UTF-8. Most Linux distributions these days come
with that out of the box. The tools your script is built on might also
have some effect. If you tell me what your "paragraph grep" is
implemented with, I should be able to tell you more.
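
To make this concrete, here is a minimal sketch of what a "paragraph
grep" could look like. I don't know how your tool is actually written,
so everything here is an assumption: the function names are made up,
Python is just my choice for illustration, and "paragraph" is taken to
mean a run of lines separated by blank lines. The point is only that
once the input is decoded explicitly as UTF-8, rather than relying on
the locale default, non-ASCII patterns match like any other text.

```python
import re


def paragraph_grep(pattern, text):
    """Return the blank-line-separated paragraphs of `text` matching `pattern`."""
    regex = re.compile(pattern)
    # A paragraph boundary is a newline, optional whitespace, and another newline.
    paragraphs = re.split(r"\n\s*\n", text)
    return [p.strip() for p in paragraphs if regex.search(p)]


def paragraph_grep_bytes(pattern, data, encoding="utf-8"):
    """Decode raw bytes explicitly, then search paragraph by paragraph."""
    # Explicit decoding keeps the result independent of the LANG setting.
    return paragraph_grep(pattern, data.decode(encoding))


def paragraph_grep_file(pattern, path, encoding="utf-8"):
    """Same search, reading the text from a file (hypothetical helper)."""
    with open(path, encoding=encoding) as handle:
        return paragraph_grep(pattern, handle.read())


sample = (
    "RFC 3629 defines UTF-8.\n"
    "It is widely used.\n"
    "\n"
    "Martin J. Dürst wrote this.\n"
    "\n"
    "Nothing here."
)

for hit in paragraph_grep("Dürst", sample):
    print(hit)
```

With the sample above, searching for "UTF-8" returns the whole
two-line first paragraph, while searching for "Dürst" returns only the
second one, whether the text arrives as a string or as UTF-8 bytes.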

> My general comment is that I find tools like these that can analyze internet drafts and RFCs useful, and I'm not sure they exist, in at least this form, for formats like PDF or .doc.

Same doubt here.

> Something else I would like to facilitate, in whatever form we wind up working, is the improvement of grammar checking. One of the nice things about Word is that it has a grammar checker; there are other tools for basic ASCII text. Whatever input form we use, it would be really nice to have a high quality grammar checker that can not only detect actual grammatical errors ("please don't 'think different', 'think differently'"), but unusual word choices that may have semantic impact (use of "there", "their", and "they're" for example, or more generally catching things like "this feature should not be used in this scoped", where the author mistyped and as a result selected a valid word that was incorrect in context). I suspect that such tools could be helpful not only for English-as-a-Second-Language people, but folks like me that think more quickly than they type.

Lots of XML tools (and tools for other formats) come with spell checkers 
these days. Grammar checkers are still quite a bit rarer.

> More generally, I'd like both our input and output formats to facilitate the use of other related tools - rfcdiff and idnits, and language editing and analysis tools.

These tools are our own. It would of course be neat if we could reuse 
them on new formats, and that should be possible with some 
text-extraction preprocessing step even for some rather closed formats. 
But because these are our own, I expect that over time, we will have new 
tools if we accept new formats.

Regards,   Martin.
