RFC 8830: WebRTC MediaStream Identification in the Session Description Protocol
- H. Alvestrand
Abstract
This document specifies a Session Description Protocol (SDP) grouping mechanism for RTP media streams that can be used to specify relations between media streams.
This mechanism is used to signal the association between the SDP concept of "media description" and the Web Real-Time Communication (WebRTC) concept of MediaStream/MediaStreamTrack.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
https://
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://
1. Introduction
1.1. Terminology
This document uses terminology from [RFC8825]. In addition, the following terms are used as described below:
- RTP stream:
- A stream of RTP packets containing media data [RFC7656].
- MediaStream:
- An assembly of MediaStreamTracks [W3C.CR-mediacapture-streams]. One MediaStream can contain multiple MediaStreamTracks, of the same or different types.
- MediaStreamTrack:
- Defined in [W3C.CR-mediacapture-streams] as a unidirectional flow of media data (either audio or video, but not both). Corresponds to the [RFC7656] term "source stream". One MediaStreamTrack can be present in zero, one, or multiple MediaStreams.
- Media description:
- Defined in [RFC4566] as a set of fields starting with an "m=" field and terminated by either the next "m=" field or the end of the session description.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
1.2. Structure of This Document
This document adds a new Session Description Protocol (SDP) [RFC4566] mechanism that can attach identifiers to the RTP streams and attach identifiers to the groupings they form. It is designed for use with WebRTC [RFC8825].
Section 1.3 gives the background on why a new mechanism is needed.
Section 2 gives the definition of the new mechanism.
Section 3 gives the necessary semantic information and procedures for using the "msid" attribute to signal the association of MediaStreamTracks to MediaStreams.
1.3. Why a New Mechanism Is Needed
When media is carried by RTP [RFC3550], each RTP stream is distinguished inside an RTP session by its Synchronization Source (SSRC); each RTP session is distinguished from all other RTP sessions by being on a different transport association (strictly speaking, two transport associations, one used for RTP and one used for the RTP Control Protocol (RTCP), unless RTP/RTCP multiplexing [RFC5761] is used).
SDP [RFC4566] gives a format for describing an SDP session that can contain multiple media descriptions. According to the model used in [RFC8829], each media description describes exactly one media source. If multiple media sources are carried in an RTP session, this is signaled using BUNDLE [RFC8843]; if BUNDLE is not used, each media source is carried in its own RTP session.
The SDP Grouping Framework [RFC5888] can be used to
group media descriptions. However, for the use case of WebRTC, there
is the need for an application to specify some application
1.4. The WebRTC MediaStream
The W3C WebRTC API specification [W3C-WebRTC] specifies that communication between WebRTC entities is done via MediaStreams, which contain MediaStreamTracks.
In the RTP specification, RTP streams are identified using the SSRC field. Streams are grouped into RTP sessions and also carry a CNAME. Neither CNAME nor RTP session corresponds to a MediaStream. Therefore, the association of an RTP stream to MediaStreams needs to be explicitly signaled.
WebRTC defines a mapping (documented in [RFC8829]) where one SDP media description is used to describe each MediaStreamTrack.
2. The MSID Mechanism
This document defines a new SDP [RFC4566] media-level "msid" attribute. This new attribute allows endpoints to associate RTP streams that are described in separate media descriptions with the right MediaStreams, as defined in [W3C-WebRTC]. It also allows endpoints to carry an identifier for each MediaStreamTrack.
The value of the "msid" attribute consists of an identifier and an optional "appdata" field.
The name of the attribute is "msid".
The value of the attribute is specified by the following ABNF [RFC5234] grammar:
An example "msid" value for a group with the identifier "examplefoo" and application data "examplebar" might look like this:
The identifier is a string of ASCII characters that are legal in a "token", consisting of between 1 and 64 characters.
Application data (msid-appdata) is carried on the same line as the identifier, separated from the identifier by a space.
The identifier ("msid-id") uniquely identifies a group within the scope of an SDP description.
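The constraints stated in the prose above (an identifier made of 1 to 64 token characters, optionally followed by a single space and application data) can be sketched as a small Python validator. This is an illustrative sketch, not the normative grammar: the names `Msid` and `parse_msid_value` are hypothetical, the token character set is the "token-char" rule of [RFC4566], and applying the same 1 to 64 length limit to the application data is an assumption, since the text above states that limit only for the identifier.

```python
import re
from typing import NamedTuple, Optional

# token-char from RFC 4566: printable ASCII excluding separators such as
# space, double quote, parentheses, comma, slash, and the like.
TOKEN_CHAR = r"[\x21\x23-\x27\x2A\x2B\x2D\x2E\x30-\x39\x41-\x5A\x5E-\x7E]"

# msid-id, then optionally exactly one space and msid-appdata.
# Assumption: msid-appdata obeys the same 1..64 token-char limit as msid-id.
MSID_VALUE = re.compile(
    rf"({TOKEN_CHAR}{{1,64}})(?: ({TOKEN_CHAR}{{1,64}}))?"
)


class Msid(NamedTuple):
    """Hypothetical container for a parsed "msid" value."""
    msid_id: str
    appdata: Optional[str]


def parse_msid_value(value: str) -> Optional[Msid]:
    """Parse the value part of an "msid" attribute.

    Returns None when the value does not match the constraints, so a
    caller can simply ignore the attribute.
    """
    match = MSID_VALUE.fullmatch(value)
    if match is None:
        return None
    return Msid(match.group(1), match.group(2))


if __name__ == "__main__":
    print(parse_msid_value("examplefoo examplebar"))
    print(parse_msid_value("examplefoo"))
    print(parse_msid_value("bad  value"))  # two spaces: rejected
```

Returning `None` rather than raising keeps the caller free to skip a malformed attribute without aborting the rest of the SDP processing.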
There may be multiple "msid" attributes in a single media description. This represents the case where a single MediaStreamTrack is present in multiple MediaStreams.
Multiple media descriptions with the same value for "msid-id" and "msid-appdata" are not permitted.
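The uniqueness rule above lends itself to a simple check over a parsed session description. The following Python sketch assumes that each media description has already been reduced to a collection of (msid-id, msid-appdata) pairs; the function name `duplicated_msids` and that input shape are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

MsidPair = Tuple[str, Optional[str]]


def duplicated_msids(
    media_descriptions: Iterable[Iterable[MsidPair]],
) -> Set[MsidPair]:
    """Return the (msid-id, msid-appdata) pairs that appear in more than
    one media description, which the rule above does not permit."""
    counts: Counter = Counter()
    for msids in media_descriptions:
        # Count a pair at most once per media description.
        counts.update(set(msids))
    return {pair for pair, seen in counts.items() if seen > 1}


if __name__ == "__main__":
    session = [
        [("stream1", "trackA")],
        [("stream1", "trackB")],
        [("stream1", "trackA")],  # same pair as the first description
    ]
    print(duplicated_msids(session))
```

Note that the same "msid-id" may legitimately appear in several media descriptions (one per track of a MediaStream); only a repeated pair is flagged.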
Endpoints can update the associations between RTP streams as expressed by "msid" attributes at any time.
The "msid" attributes depend on the association of RTP streams with media descriptions but do not depend on the association of RTP streams with RTP transports. Therefore, their Mux Category (as defined in [RFC8859]) is NORMAL; the process of deciding on "msid" attributes doesn't have to take into consideration whether or not the RTP streams are bundled.
3. Procedures
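This section describes the procedures for associating media descriptions representing MediaStreamTracks within MediaStreams, as defined in [W3C.CR-mediacapture-streams].
In the JavaScript API described in that specification, each MediaStream and MediaStreamTrack has an "id" attribute.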
The value of the "msid-id" field in the MSID consists of the "id" attribute of a MediaStream, as defined in the MediaStream's WebIDL specification [WEBIDL]. The special value "-" indicates "no MediaStream".
The value of the "msid-appdata" field in the MSID, if present, consists of the "id" attribute of a MediaStreamTrack.
When an SDP session description is updated, a specific "msid-id" value continues to refer to the same MediaStream, and a specific "msid-appdata" to the same MediaStreamTrack.
If the "msid" attribute does not conform to the ABNF given here, it SHOULD be ignored.
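The mapping from API identifiers to attribute values described in this section can be sketched as follows. This is a minimal Python illustration, assuming the sender knows a track's "id" and the "id" of every MediaStream that contains it; the function name `msid_lines` is hypothetical, and the "a=msid:" prefix is the generic SDP attribute syntax of [RFC4566] applied to the "msid" attribute.

```python
from typing import List, Sequence

# Special "msid-id" value meaning "no MediaStream".
NO_MEDIA_STREAM = "-"


def msid_lines(track_id: str, stream_ids: Sequence[str]) -> List[str]:
    """Build the "a=msid" lines for one media description.

    One line is produced per MediaStream containing the track. A track
    that belongs to no MediaStream gets a single line whose "msid-id"
    is the special value "-".
    """
    ids = list(stream_ids) or [NO_MEDIA_STREAM]
    return [f"a=msid:{stream_id} {track_id}" for stream_id in ids]


if __name__ == "__main__":
    print(msid_lines("track1", ["streamA"]))
    print(msid_lines("track1", ["streamA", "streamB"]))
    print(msid_lines("track1", []))
```

The identifiers used in the example are placeholders; real values would come from the "id" attributes exposed by the API.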
The following is a high-level description of the rules for handling SDP updates. Detailed procedures are located in Section 3.2.
In addition to signaling that the track is ended when its "msid" attribute disappears from the SDP, the track will also be signaled as being ended when all associated SSRCs have disappeared by the rules of [RFC3550], Sections 6.3.4 (BYE packet received) and 6.3.5 (timeout), or when the corresponding media description is disabled by setting the port number to zero. Changing the direction of the media description (by setting "sendonly", "recvonly", or "inactive" attributes) will not end the MediaStreamTrack.
The association between SSRCs and media descriptions is specified in [RFC8829].
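The conditions under which a track is signaled as ended can be summarized as a predicate. The Python sketch below is one possible reading of the paragraph above, not a normative algorithm: the class `TrackSignalState`, its fields, and the function `track_is_ended` are hypothetical names, and the bookkeeping of which SSRCs have been seen and which have been removed (by BYE or timeout) is assumed to be done elsewhere.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TrackSignalState:
    """Hypothetical snapshot of what the receiver knows about one track."""
    msid_present: bool               # the track's "msid" is still in the SDP
    port: int                        # port of the media description
    direction: str = "sendrecv"      # sendrecv, sendonly, recvonly, inactive
    ssrcs_seen: FrozenSet[int] = frozenset()  # SSRCs ever associated
    ssrcs_gone: FrozenSet[int] = frozenset()  # removed by BYE or timeout


def track_is_ended(state: TrackSignalState) -> bool:
    """Return True when the track should be signaled as ended."""
    if not state.msid_present:
        return True  # the "msid" attribute disappeared from the SDP
    if state.port == 0:
        return True  # the media description is disabled
    if state.ssrcs_seen and state.ssrcs_seen <= state.ssrcs_gone:
        return True  # every associated SSRC has disappeared
    # The direction attribute is deliberately not consulted: changing it
    # does not end the track.
    return False


if __name__ == "__main__":
    live = TrackSignalState(True, 9, "inactive", frozenset({1}), frozenset())
    print(track_is_ended(live))
```

Treating a track with no SSRC seen so far as not ended is a design choice of this sketch, so that a freshly negotiated track is not ended before any packet arrives.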
3.1. Handling of Nonsignaled Tracks
Entities that do not use the mechanism described in this document will not send the "msid" attribute and thus will not send information allowing the mapping of RTP packets to MediaStreams. This means that there will be some incoming RTP packets for which the recipient has no predefined MediaStream ID value.
Note that the handling described below is triggered by incoming RTP packets, not SDP negotiation.
When communicating with entities that use the MSID mechanism, the only time incoming RTP packets
can be received without an associated MediaStream ID value is when, after the
initial negotiation, a negotiation is performed where the answerer
adds a Media
The recipient of those packets will perform the following steps:
The following steps are performed to assign ID values:
The process above may involve a considerable amount of buffering before the "stable" state is entered. If the implementation wishes to limit this buffering, it MUST signal to the user that media has been discarded.
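The requirement above, that limited buffering must be accompanied by a signal that media has been discarded, can be illustrated with a bounded buffer that reports every drop. This Python sketch makes several assumptions that are not taken from this document: the class name `BoundedPacketBuffer`, the packet-count limit, the choice to drop the oldest packet first, and the callback used as the "signal to the user" are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class BoundedPacketBuffer:
    """Hold packets for a not-yet-identified track, up to a fixed limit."""

    def __init__(self, max_packets: int,
                 on_discard: Callable[[int], None]) -> None:
        if max_packets < 1:
            raise ValueError("max_packets must be at least 1")
        self._packets: Deque[bytes] = deque()
        self._max_packets = max_packets
        self._on_discard = on_discard
        self.discarded = 0

    def push(self, packet: bytes) -> None:
        """Store a packet, dropping the oldest one when the buffer is full."""
        if len(self._packets) >= self._max_packets:
            self._packets.popleft()
            self.discarded += 1
            # Tell the application that media has been discarded,
            # passing the running total of dropped packets.
            self._on_discard(self.discarded)
        self._packets.append(packet)

    def drain(self) -> List[bytes]:
        """Return and clear the buffered packets, e.g. once IDs are known."""
        packets = list(self._packets)
        self._packets.clear()
        return packets


if __name__ == "__main__":
    buffer = BoundedPacketBuffer(
        2, lambda total: print("discarded so far:", total))
    for payload in (b"p1", b"p2", b"p3"):
        buffer.push(payload)
    print(buffer.drain())
```

Capping the buffer also addresses the memory exhaustion concern that unbounded buffering would raise.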
It follows from the above that Media
3.2. Detailed Offer/Answer Procedures
These procedures are given in terms of sections recommended by [RFC3264]. They describe the actions to be taken in terms of MediaStreams and MediaStreamTracks.
3.2.1. Generating the Initial Offer
For each media description in the offer, if there is an
associated outgoing Media
3.2.2. Answerer Processing of the Offer
For each media description in the offer and each "a=msid" attribute in the media description, the receiver of the offer will perform the following steps:
3.2.3. Generating the Answer
The answer is generated in exactly the same manner as the offer. "a=msid" values in the offer do not influence the answer.
3.2.4. Offerer Processing of the Answer
The answer is processed in exactly the same manner as the offer.
3.2.5. Modifying the Session
On subsequent exchanges, precisely the same procedure as for the initial offer/answer is followed, but with one additional step in the parsing of the offer and answer:
3.3. Example SDP Description
The following SDP description shows the representation of a WebRTC PeerConnection with two MediaStreams, each of which has one audio and one video track. Only the parts relevant to the MSID are shown.
Line wrapping, empty lines, and comments are added for clarity. They are not part of the SDP.
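To make the shape of such a description concrete, the Python sketch below groups the "a=msid" lines of an SDP fragment by MediaStream. The SDP text in `EXAMPLE_SDP` is a hypothetical fragment written for this illustration (placeholder identifiers, ports, and payload types), not this document's own example, and the function name `tracks_by_stream` is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment: two MediaStreams, each with one audio track and
# one video track. Only the lines relevant to MSID are included.
EXAMPLE_SDP = """\
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=msid:stream-one audio-track-one
m=video 9 UDP/TLS/RTP/SAVPF 96
a=msid:stream-one video-track-one
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=msid:stream-two audio-track-two
m=video 9 UDP/TLS/RTP/SAVPF 96
a=msid:stream-two video-track-two
"""


def tracks_by_stream(sdp: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each "msid-id" to the (media type, "msid-appdata") pairs that
    reference it. Lines without application data are skipped here."""
    streams: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    media_type: Optional[str] = None
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # An "m=" line starts a new media description.
            media_type = line[2:].split(" ", 1)[0]
        elif line.startswith("a=msid:") and media_type is not None:
            parts = line[len("a=msid:"):].split(" ")
            if len(parts) == 2:
                streams[parts[0]].append((media_type, parts[1]))
    return dict(streams)


if __name__ == "__main__":
    for stream_id, tracks in tracks_by_stream(EXAMPLE_SDP).items():
        print(stream_id, tracks)
```

The grouping shows how a receiver can rebuild the two MediaStreams from four separate media descriptions using nothing but the "msid" values.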
4. IANA Considerations
4.1. Attribute Registration in Existing Registries
IANA has registered the "msid" attribute in the "att-field" (media level only) registry within the "Session Description Protocol (SDP) Parameters" registry, according to the procedures of [RFC4566].
The "msid" registration information is as follows:
- Contact name, email:
- IETF, contacted via mmusic@ietf.org, or a successor address designated by IESG
- Attribute name:
- msid
- Attribute syntax:
-
- Attribute semantics:
- Described in RFC 8830
- Attribute value:
- msid-value
- Long-form attribute name:
- MediaStream Identifier
- Usage level:
- media
- Subject to charset:
- The attribute value contains only ASCII characters and is therefore not subject to the charset attribute.
- Purpose:
- The attribute can be used to signal the relationship between a WebRTC MediaStream and a set of media descriptions.
- O/A Procedures:
- Described in RFC 8830
- Appropriate values:
- The details of appropriate values are given in RFC 8830 (this document).
- Mux Category:
- NORMAL
5. Security Considerations
An adversary with the ability to modify SDP descriptions has the ability to switch around tracks between MediaStreams. This is a special case of the general security consideration that modification of SDP descriptions needs to be confined to entities trusted by the application.
If implementing buffering as mentioned in Section 3.1, the amount of buffering should be limited to avoid memory exhaustion attacks.
Careless generation of identifiers can leak privacy-sensitive information.
No other attacks have been identified that depend on this mechanism.
6. References
6.1. Normative References
- [RFC2119]
- Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
- [RFC3550]
- Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, <https://www.rfc-editor.org/info/rfc3550>.
- [RFC4566]
- Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, DOI 10.17487/RFC4566, <https://www.rfc-editor.org/info/rfc4566>.
- [RFC5234]
- Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, <https://www.rfc-editor.org/info/rfc5234>.
- [RFC8174]
- Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
- [RFC8829]
- Uberti, J., Jennings, C., and E. Rescorla, Ed., "JavaScript Session Establishment Protocol (JSEP)", RFC 8829, DOI 10.17487/RFC8829, <https://www.rfc-editor.org/info/rfc8829>.
- [RFC8859]
- Nandakumar, S., "A Framework for Session Description Protocol (SDP) Attributes When Multiplexing", RFC 8859, DOI 10.17487/RFC8859, <https://www.rfc-editor.org/info/rfc8859>.
- [W3C-WebRTC]
- Jennings, C., Boström, H., and J-I. Bruaroey, "WebRTC 1.0: Real-time Communication Between Browsers", W3C Proposed Recommendation, <https://www.w3.org/TR/webrtc/>.
- [W3C.CR-mediacapture-streams]
- Jennings, C., Aboba, B., Bruaroey, J.-I., and H. Boström, "Media Capture and Streams", W3C Candidate Recommendation, <https://www.w3.org/TR/mediacapture-streams/>.
6.2. Informative References
- [RFC3264]
- Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, DOI 10.17487/RFC3264, <https://www.rfc-editor.org/info/rfc3264>.
- [RFC5761]
- Perkins, C. and M. Westerlund, "Multiplexing RTP Data and Control Packets on a Single Port", RFC 5761, DOI 10.17487/RFC5761, <https://www.rfc-editor.org/info/rfc5761>.
- [RFC5888]
- Camarillo, G. and H. Schulzrinne, "The Session Description Protocol (SDP) Grouping Framework", RFC 5888, DOI 10.17487/RFC5888, <https://www.rfc-editor.org/info/rfc5888>.
- [RFC7656]
- Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources", RFC 7656, DOI 10.17487/RFC7656, <https://www.rfc-editor.org/info/rfc7656>.
- [RFC8825]
- Alvestrand, H., "Overview: Real-Time Protocols for Browser-Based Applications", RFC 8825, DOI 10.17487/RFC8825, <https://www.rfc-editor.org/info/rfc8825>.
- [RFC8843]
- Holmberg, C., Alvestrand, H., and C. Jennings, "Negotiating Media Multiplexing Using the Session Description Protocol (SDP)", RFC 8843, DOI 10.17487/RFC8843, <https://www.rfc-editor.org/info/rfc8843>.
- [WEBIDL]
- Chen, E. and T. Gu, "Web IDL", W3C Editor's Draft, <https://heycam.github.io/webidl/>.
Appendix A. Design Considerations, Rejected Alternatives
One suggested mechanism has been to use CNAME instead of a new attribute. This was abandoned because CNAME identifies a synchronization context; one can imagine both wanting to have tracks from the same synchronization context in multiple MediaStreams and wanting to have tracks from multiple synchronization contexts within one MediaStream (but the latter is impossible, since a MediaStream is defined to impose synchronization on its members).
Another suggestion has been to put the "msid" value within an attribute of RTCP SR (sender report) packets. This doesn't offer the ability to know that you have seen all the tracks currently configured for a MediaStream.
A suggestion that survived for a number of drafts of this document was to define
MSID as a generic mechanism, where the particular semantics of this
usage of the mechanism would be defined by an "a
Acknowledgements
This note is based on sketches from, among others, Justin Uberti and Cullen Jennings.
Special thanks to Flemming Andreasen, Ben Campbell, Miguel Garcia, Martin Thomson, Ted Hardie, Adam Roach, Magnus Westerlund, Alissa Cooper, Sue Hares, and Paul Kyzivat for their work in reviewing this document, with many specific language suggestions.