RFC Errata



Found 4 records.

Status: Reported (4)

RFC 9309, "Robots Exclusion Protocol", September 2022

Source of RFC: IETF - NON WORKING GROUP
Area Assignment: art

Errata ID: 7124
Status: Reported
Type: Technical
Publication Format(s) : TEXT, PDF, HTML

Reported By: Samuel K. Lam
Date Reported: 2022-09-10

Section 5.2 says:

In the following case,
/example/page/disallowed.gif MUST be used for the URI
example.com/example/page/disallow.gif.

It should say:

In the following case,
/example/page/disallowed.gif MUST be used for the URI
example.com/example/page/disallowed.gif.

Notes:

The two file names in that sentence ("disallowed.gif" and "disallow.gif")
don't match, i.e., "ed" is missing from the second file name.

This error renders the example given in section 5.2 incorrect.
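A minimal Python sketch makes the consequence concrete. It assumes the rule pair shown in the Section 5.2 example (`Allow: /example/page/` and `Disallow: /example/page/disallowed.gif`), uses plain prefix matching without `*` or `$`, and breaks length ties in favour of Allow. `pick_rule` is a hypothetical helper, not an API defined by the RFC.

```python
from typing import Optional

# Assumed rule set from the Section 5.2 example: (kind, path prefix).
RULES = [
    ("allow", "/example/page/"),
    ("disallow", "/example/page/disallowed.gif"),
]


def pick_rule(rules: list[tuple[str, str]], path: str) -> Optional[tuple[str, str]]:
    """Return the matching rule with the longest path, or None if none match.

    Plain prefix matching only; "*" and "$" are not handled in this sketch.
    On a length tie, the allow rule wins.
    """
    matching = [r for r in rules if path.startswith(r[1])]
    if not matching:
        return None
    # Longer path first; on equal length, prefer "allow".
    return max(matching, key=lambda r: (len(r[1]), r[0] == "allow"))


print(pick_rule(RULES, "/example/page/disallowed.gif"))
# ('disallow', '/example/page/disallowed.gif')
print(pick_rule(RULES, "/example/page/disallow.gif"))
# ('allow', '/example/page/')
```

Under the original wording, the Disallow rule is not a candidate for `/example/page/disallow.gif` at all, because `/example/page/disallowed.gif` is not a prefix of it, so the example never exercises a choice between two matching rules.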

Errata ID: 7128
Status: Reported
Type: Technical
Publication Format(s) : TEXT

Reported By: Yoshiro Yoneya
Date Reported: 2022-09-13

Section 2.2.2 says:

   For example:

   +==================+=======================+=======================+
   | Path             | Encoded Path          | Path to Match         |
   +==================+=======================+=======================+
   | /foo/bar?baz=quz | /foo/bar?baz=quz      | /foo/bar?baz=quz      |
   +------------------+-----------------------+-----------------------+
   | /foo/bar?baz=    | /foo/bar?baz=         | /foo/bar?baz=         |
   | https://foo.bar  | https%3A%2F%2Ffoo.bar | https%3A%2F%2Ffoo.bar |
   +------------------+-----------------------+-----------------------+
   | /foo/bar/        | /foo/bar/%E3%83%84    | /foo/bar/%E3%83%84    |
   | U+E38384         |                       |                       |
   +------------------+-----------------------+-----------------------+
   | /foo/            | /foo/bar/%E3%83%84    | /foo/bar/%E3%83%84    |
   | bar/%E3%83%84    |                       |                       |
   +------------------+-----------------------+-----------------------+
   | /foo/            | /foo/bar/%62%61%7A    | /foo/bar/baz          |
   | bar/%62%61%7A    |                       |                       |
   +------------------+-----------------------+-----------------------+

It should say:

   For example:

   +==================+=======================+=======================+
   | Path             | Encoded Path          | Path to Match         |
   +==================+=======================+=======================+
   | /foo/bar?baz=quz | /foo/bar?baz=quz      | /foo/bar?baz=quz      |
   +------------------+-----------------------+-----------------------+
   | /foo/bar?baz=    | /foo/bar?baz=         | /foo/bar?baz=         |
   | https://foo.bar  | https%3A%2F%2Ffoo.bar | https%3A%2F%2Ffoo.bar |
   +------------------+-----------------------+-----------------------+
   | /foo/bar/        | /foo/bar/%E3%83%84    | /foo/bar/%E3%83%84    |
   | U+30C4           |                       |                       |
   +------------------+-----------------------+-----------------------+
   | /foo/            | /foo/bar/%E3%83%84    | /foo/bar/%E3%83%84    |
   | bar/%E3%83%84    |                       |                       |
   +------------------+-----------------------+-----------------------+
   | /foo/            | /foo/bar/%62%61%7A    | /foo/bar/baz          |
   | bar/%62%61%7A    |                       |                       |
   +------------------+-----------------------+-----------------------+

Notes:

The "Path" column of the third example appears to give a Unicode code point rather than UTF-8-encoded hexadecimal. If so, the correct code point for %E3%83%84 is U+30C4 (ツ).
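The relationship between the code point and the percent-encoded bytes can be checked with Python's standard library. This sketch only illustrates the encoding arithmetic behind the erratum; the variable names are illustrative.

```python
from urllib.parse import quote

TSU = "\u30c4"  # KATAKANA LETTER TU, "ツ"

# The code point is U+30C4; its UTF-8 encoding is the three bytes E3 83 84.
code_point = f"U+{ord(TSU):04X}"
utf8_hex = TSU.encode("utf-8").hex().upper()

# Percent-encoding the UTF-8 bytes yields the "Encoded Path" column's value.
encoded_path = quote("/foo/bar/" + TSU)

print(code_point)    # U+30C4
print(utf8_hex)      # E38384
print(encoded_path)  # /foo/bar/%E3%83%84

# "U+E38384" is not a valid code point at all: it exceeds U+10FFFF.
try:
    chr(0xE38384)
except ValueError:
    print("U+E38384 is out of range")
```

In other words, "E38384" is the UTF-8 byte sequence written as if it were a code point, which is why the published table's "U+E38384" cannot be right.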

Errata ID: 7995
Status: Reported
Type: Technical
Publication Format(s) : TEXT

Reported By: Shawn Tice
Date Reported: 2024-06-18

Section 2.2 says:

path-pattern = "/" *UTF8-char-noctl ; valid URI path pattern

It should say:

path-pattern = ("/" / "*") *UTF8-char-noctl ; valid URI path pattern

Notes:

The `path-pattern` rule requires that `/` be the first character, but the Simple Example in section 5.1 has `Disallow: *.gif$`, where the path pattern starts with a `*`. The notes preceding the example explicitly say: The "*" character designates any character, including the otherwise-required forward slash; see Section 2.2.

This seems to indicate that either the formal syntax is wrong or the guidance in section 5.1 is wrong. I assume the formal syntax is wrong.
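The reporter's reading can be checked with a small Python sketch. `is_valid_path_pattern` encodes the proposed `("/" / "*")` first character, approximating `UTF8-char-noctl` as "any non-ASCII character, or ASCII %x21, %x22, or %x24-7F" based on our reading of the RFC's ABNF (an assumption). `pattern_matches` applies the `*` and trailing `$` semantics that the Section 5.1 example relies on. Both function names are hypothetical, and the regex translation is one possible implementation, not the RFC's.

```python
import re


def is_valid_path_pattern(pattern: str) -> bool:
    """Simplified check against the path-pattern rule proposed in this erratum."""
    if not pattern or pattern[0] not in "/*":
        return False
    for ch in pattern[1:]:
        cp = ord(ch)
        # Reject ASCII controls, space, and "#"; allow all non-ASCII characters.
        if cp < 0x80 and not (cp in (0x21, 0x22) or 0x24 <= cp <= 0x7F):
            return False
    return True


def pattern_matches(pattern: str, path: str) -> bool:
    """Match a path against a pattern using "*" and a trailing "$".

    "*" matches any sequence of characters, including "/"; a final "$"
    anchors the end of the path. Matching always starts at the path's start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


print(is_valid_path_pattern("*.gif$"))                  # True
print(pattern_matches("*.gif$", "/images/logo.gif"))    # True
print(pattern_matches("*.gif$", "/images/logo.gif?x"))  # False
```

Under the ABNF as published (first character `/` only), `*.gif$` would be rejected even though Section 5.1 uses it; under the proposed rule it is accepted and matches any path ending in `.gif`.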

Errata ID: 8895
Status: Reported
Type: Technical
Publication Format(s) : TEXT

Reported By: Fabrice Canel
Date Reported: 2026-04-28

Section 2.3.1.5 says:

Crawlers MUST try to parse each line of the robots.txt file. Crawlers
MUST use the parseable rules.

It should say:

Crawlers MUST try to parse each line of the robots.txt file. Crawlers
MUST use the parseable rules.

To improve maintainability of robots.txt files and reduce duplication 
of rule groups, implementations MAY support a comma-separated list of 
user-agent tokens within a single "User-agent" field.

For example:
    User-agent: agent1, agent2, agent3

is interpreted as equivalent to:
    User-agent: agent1
    User-agent: agent2
    User-agent: agent3

When such a construct is encountered:

- Parsers that support this extension SHOULD split the value on comma
  separators, trim optional whitespace, and treat each token as an 
  independent User-agent field belonging to the same group.

- Parsers that do not support this extension will interpret the full 
  value as a single user-agent token. As per this specification, 
  unmatched user-agent tokens simply result in the group being ignored
  for that crawler, preserving backward compatibility.

- Implementations SHOULD NOT treat comma-separated user-agent values 
  as a parsing error.

This extension is OPTIONAL and intended as a best practice to reduce 
repetition of identical rule groups across multiple user-agents.

Notes:

About 0.3% of robots.txt files already have a comma in the user-agent field, including http://queue.tickets.fifa.com/robots.txt and http://www.rsn-msk.ru/robots.txt.
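The two parser behaviours described in this proposal can be compared with a short Python sketch. `split_user_agent_value` and `group_applies` are hypothetical helpers; dropping empty tokens (as in `agent1,,agent2`) is an assumption the proposal does not address, and the case-insensitive comparison reflects our reading of RFC 9309's user-agent matching rule.

```python
def split_user_agent_value(value: str, support_commas: bool) -> list[str]:
    """Return the product tokens carried by one User-agent value.

    With support_commas=True, split on "," and trim each token, as the
    proposed extension suggests (empty tokens are dropped: an assumption).
    Otherwise the whole trimmed value is a single token, as a parser
    without the extension would treat it.
    """
    if not support_commas:
        return [value.strip()]
    return [tok.strip() for tok in value.split(",") if tok.strip()]


def group_applies(tokens: list[str], crawler: str) -> bool:
    # Case-insensitive comparison of the crawler's product token.
    return any(tok.lower() == crawler.lower() for tok in tokens)


line_value = "agent1, agent2, agent3"
for support in (True, False):
    tokens = split_user_agent_value(line_value, support)
    print(support, tokens, group_applies(tokens, "agent2"))
# True ['agent1', 'agent2', 'agent3'] True
# False ['agent1, agent2, agent3'] False
```

A parser without the extension sees one unmatched token, so the group is simply skipped for `agent2`, which is the backward-compatible behaviour the proposal relies on.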
