RFC 6455, "The WebSocket Protocol", December 2011Source of RFC: hybi (app)
Errata ID: 4184
Publication Format(s) : TEXT
Reported By: Jonathan Hall
Date Reported: 2014-11-17
Rejected by: Barry Leiba
Date Rejected: 2014-12-04
Section 5.4 says:
5.4. Fragmentation The primary purpose of fragmentation is to allow sending a message that is of unknown size when the message is started without having to buffer that message. If messages couldn't be fragmented, then an endpoint would have to buffer the entire message so its length could be counted before the first byte is sent. With fragmentation, a server or intermediary may choose a reasonable size buffer and, when the buffer is full, write a fragment to the network. A secondary use-case for fragmentation is for multiplexing, where it is not desirable for a large message on one logical channel to monopolize the output channel, so the multiplexing needs to be free to split the message into smaller fragments to better share the output channel. (Note that the multiplexing extension is not described in this document.) Unless specified otherwise by an extension, frames have no semantic meaning. An intermediary might coalesce and/or split frames, if no extensions were negotiated by the client and the server or if some extensions were negotiated, but the intermediary understood all the extensions negotiated and knows how to coalesce and/or split frames in the presence of these extensions. One implication of this is that in absence of extensions, senders and receivers must not depend on the presence of specific frame boundaries. The following rules apply to fragmentation: o An unfragmented message consists of a single frame with the FIN bit set (Section 5.2) and an opcode other than 0. o A fragmented message consists of a single frame with the FIN bit clear and an opcode other than 0, followed by zero or more frames with the FIN bit clear and the opcode set to 0, and terminated by a single frame with the FIN bit set and an opcode of 0. A fragmented message is conceptually equivalent to a single larger message whose payload is equal to the concatenation of the payloads of the fragments in order; however, in the presence of extensions, this may not hold true as the extension defines the interpretation of the "Extension data" present. For instance, "Extension data" may only be present at the beginning of the first fragment and apply to subsequent fragments, or there may be "Extension data" present in each of the fragments that applies only to that particular fragment. In the absence of "Extension data", the following example demonstrates how fragmentation works. EXAMPLE: For a text message sent as three fragments, the first fragment would have an opcode of 0x1 and a FIN bit clear, the second fragment would have an opcode of 0x0 and a FIN bit clear, and the third fragment would have an opcode of 0x0 and a FIN bit that is set. o Control frames (see Section 5.5) MAY be injected in the middle of a fragmented message. Control frames themselves MUST NOT be fragmented. o Message fragments MUST be delivered to the recipient in the order sent by the sender. o The fragments of one message MUST NOT be interleaved between the fragments of another message unless an extension has been negotiated that can interpret the interleaving. o An endpoint MUST be capable of handling control frames in the middle of a fragmented message. o A sender MAY create fragments of any size for non-control messages. o Clients and servers MUST support receiving both fragmented and unfragmented messages. o As control frames cannot be fragmented, an intermediary MUST NOT attempt to change the fragmentation of a control frame. o An intermediary MUST NOT change the fragmentation of a message if any reserved bit values are used and the meaning of these values is not known to the intermediary. o An intermediary MUST NOT change the fragmentation of any message in the context of a connection where extensions have been negotiated and the intermediary is not aware of the semantics of the negotiated extensions. Similarly, an intermediary that didn't see the WebSocket handshake (and wasn't notified about its content) that resulted in a WebSocket connection MUST NOT change the fragmentation of any message of such connection. o As a consequence of these rules, all fragments of a message are of the same type, as set by the first fragment's opcode. Since control frames cannot be fragmented, the type for all fragments in a message MUST be either text, binary, or one of the reserved opcodes. NOTE: If control frames could not be interjected, the latency of a ping, for example, would be very long if behind a large message. Hence, the requirement of handling control frames in the middle of a fragmented message. IMPLEMENTATION NOTE: In the absence of any extension, a receiver doesn't have to buffer the whole frame in order to process it. For example, if a streaming API is used, a part of a frame can be delivered to the application. However, note that this assumption might not hold true for all future WebSocket extensions.
There is no indication or mention of the payload length of the frame with regards to fragmentation. In abstract, it's apparent that the payload length specified in the header of the frame corresponds to the actual frames payload length. However, implementations have been observed that allow the headers payload length to be specified as a higher length than the actual raw payload of that frame. In the event that this occurs, some implementations are reallocating memory to support the length of what is reported as the entire payload of all fragmented messages combined, and continue building the buffer off of each frame from that point forward by allocating enough memory for specified payload length. Obviously, this opens up the potential for a memory consumption DoS attack on an implementation that uses this method. A lack of specification with regards to the RFC allows for such a mistake to happen, as it is not clearly defined in the fragmentation section.
The RFC specifies that fragmented messages should be used to send messages of an unknown length, i.e. streaming media, and therefore should also specify that the frame length bit should be preserved to the length of only that current frame and not be used to specify the overall payload length of all fragmented messages combined. Implementations should be forced to work with them on a frame-by-frame basis and to drop the connection of any instances where the specified length of the frames payload does not match the length of the actual raw payload data for that one frame.
Thanks, Jonathan, for some useful comments. As we discussed, these aren't appropriate for the errata system, but should be discussed on the relevant mailing list for possibly inclusion in an update or follow-on to the document. That list is <firstname.lastname@example.org>.