RFC 8829: JavaScript Session Establishment Protocol (JSEP)
- J. Uberti
- C. Jennings
- E. Rescorla, Ed.
This RFC is now obsolete
Abstract
This document describes the mechanisms for allowing a JavaScript application to control the signaling plane of a multimedia session via the interface specified in the W3C RTCPeer
Status of This Memo
This is an Internet Standards Track document.¶
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.¶
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://
1. Introduction
This document describes how the W3C Web Real-Time Communication (WebRTC) RTCPeer
1.1. General Design of JSEP
WebRTC call setup has been designed to focus on controlling the media plane, leaving signaling-plane behavior up to the application as much as possible. The rationale is that different applications may prefer to use different protocols, such as the existing SIP call signaling protocol, or something custom to the particular application, perhaps for a novel use case. In this approach, the key information that needs to be exchanged is the multimedia session description, which specifies the transport and media configuration information necessary to establish the media plane.¶
With these considerations in mind, this document describes the JavaScript Session Establishment Protocol (JSEP), which allows for full control of the signaling state machine from JavaScript. As described above, JSEP assumes a model in which a JavaScript application executes inside a runtime containing WebRTC APIs (the "JSEP implementation"
In this document, the use of JSEP is described as if it always occurs between two JSEP endpoints. Note, though, that in many cases it will actually be between a JSEP endpoint and some kind of server, such as a gateway or Multipoint Control Unit (MCU). This distinction is invisible to the JSEP endpoint; it just follows the instructions it is given via the API.¶
JSEP's handling of session descriptions is simple and straightforward
To complete the offer/answer exchange, the remote party uses the createAnswer API to generate an appropriate answer, applies it using the set
Regarding ICE [RFC8445], JSEP decouples the ICE state machine from the overall signaling state machine. The ICE state machine must remain in the JSEP implementation because only the implementation has the necessary knowledge of candidates and other transport information. Performing this separation provides additional flexibility in protocols that decouple session descriptions from transport. For instance, in traditional SIP, each offer or answer is self-contained, including both the session descriptions and the transport information. However, [RFC8840] allows SIP to be used with Trickle ICE [RFC8838], in which the session description can be sent immediately and the transport information can be sent when available. Sending transport information separately can allow for faster ICE and DTLS startup, since ICE checks can start as soon as any transport information is available rather than waiting for all of it. JSEP's decoupling of the ICE and signaling state machines allows it to accommodate either model.¶
Although it abstracts signaling, the JSEP approach requires that the application be aware of the signaling process. While the application does not need to understand the contents of session descriptions to set up a call, the application must call the right APIs at the right times, convert the session descriptions and ICE information into the defined messages of its chosen signaling protocol, and perform the reverse conversion on the messages it receives from the other side.¶
One way to make life easier for the application is to provide a JavaScript library that hides this complexity from the developer; such a library would implement a given signaling protocol along with its state machine and serialization code, presenting a higher-level call-oriented interface to the application developer. For example, libraries exist to provide implementations of the SIP [RFC3261] and Extensible Messaging and Presence Protocol (XMPP) [RFC6120] signaling protocols atop the JSEP API. Thus, JSEP provides greater control for the experienced developer without forcing any additional complexity on the novice developer.¶
1.2. Other Approaches Considered
One approach that was considered instead of JSEP was to include a lightweight signaling protocol. Instead of providing session descriptions to the API, the API would produce and consume messages from this protocol. While this would have provided a higher-level API, it would have put more control of signaling within the JSEP implementation, forcing it to understand and handle concepts like signaling glare (see [RFC3264], Section 4).¶
A second approach that was considered but not chosen was to decouple the management of the media control objects from session descriptions, instead offering APIs that would control each component directly. This was rejected based on the argument that requiring exposure of this level of complexity to the application programmer would not be beneficial; it would (1) result in an API where even a simple example would require a significant amount of code to orchestrate all the needed interactions and (2) create a large API surface that would need to be agreed upon and documented. In addition, these API points could be called in any order, resulting in a more complex set of interactions with the media subsystem than the JSEP approach, which specifies how session descriptions are to be evaluated and applied.¶
One variation on JSEP that was considered was to keep the basic session
1.3. Contradiction regarding bundle-only "m=" sections
Since the approval of the WebRTC specification documents, the IETF has become aware of an inconsistency between the document specifying JSEP and the document specifying BUNDLE (this RFC and [RFC8843], respectively). Rather than delaying publication further to come to a resolution, the documents are being published as they were originally approved. The IETF intends to restart work on these technologies, and revised versions of these documents will be published as soon as a resolution becomes available.¶
The specific issue involves the handling of "m=" sections that are designated as bundle-only, as discussed in Section 4.1.1 of this RFC. Currently, there is divergence between JSEP and BUNDLE, as well as between these specifications and existing browser implementations
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
3. Semantics and Syntax
3.1. Signaling Model
JSEP does not specify a particular signaling model or state machine, other than the generic need to exchange session descriptions in the fashion described by [RFC3264] (offer/answer) in order for both sides of the session to know how to conduct the session. JSEP provides mechanisms to create offers and answers, as well as to apply them to a session. However, the JSEP implementation is totally decoupled from the actual mechanism by which these offers and answers are communicated to the remote side, including addressing, retransmission, forking, and glare handling. These issues are left entirely up to the application; the application has complete control over which offers and answers get handed to the implementation, and when.¶
3.2. Session Descriptions and State Machine
In order to establish the media plane, the JSEP implementation needs specific parameters to indicate what to transmit to the remote side, as well as how to handle the media that is received. These parameters are determined by the exchange of session descriptions in offers and answers, and there are certain details to this process that must be handled in the JSEP APIs.¶
Whether a session description applies to the local side or the remote side affects the meaning of that description. For example, the list of codecs sent to a remote party indicates what the local side is willing to receive, which, when intersected with the set of codecs the remote side supports, specifies what the remote side should send. However, not all parameters follow this rule; some parameters are declarative, and the remote side must either accept them or reject them altogether. An example of such a parameter is the TLS fingerprints [RFC8122] as used in the context of DTLS [RFC6347]; these fingerprints are calculated based on the local certificate(s) offered and are not subject to negotiation.¶
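The receive-side rule for codecs can be sketched as a plain intersection. This is a minimal sketch under stated assumptions: codecs are identified by encoding name only (real SDP matching also compares clock rate, channel count, and format parameters), the helper name `sendableCodecs` is hypothetical, and the result keeps the order in which the remote side listed its codecs.

```typescript
// Hypothetical helper: codecs are identified by encoding name only.
// remoteWillReceive: codecs listed in the remote description, i.e., what the
//   remote side is willing to receive.
// localCanSend: codecs the local side supports for sending.
// Result: codecs the local side may send, in the remote side's listing order.
function sendableCodecs(remoteWillReceive: string[], localCanSend: string[]): string[] {
  const local = localCanSend.map((c) => c.toLowerCase());
  return remoteWillReceive.filter((c) => local.indexOf(c.toLowerCase()) !== -1);
}

// Declarative parameters such as a DTLS fingerprint are NOT intersected:
// the value is taken as-is, or the description is rejected altogether.
const usable = sendableCodecs(["opus", "G722", "PCMU"], ["PCMU", "opus"]); // ["opus", "PCMU"]
```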
In addition, various RFCs put different conditions on the format of offers versus answers. For example, an offer may propose an arbitrary number of "m=" sections (i.e., media descriptions as described in [RFC4566], Section 5.14), but an answer must contain the exact same number as the offer.¶
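The section-count constraint on answers can be checked mechanically. The sketch below counts "m=" lines in each description and compares them. The function names are hypothetical, and the check covers only the count, not any other offer/answer rule.

```typescript
// Counts "m=" lines in an SDP string. RFC 4566 lines end in CRLF; bare LF
// is also accepted here for convenience.
function countMediaSections(sdp: string): number {
  return sdp.split(/\r?\n/).filter((line) => line.slice(0, 2) === "m=").length;
}

// An answer must have exactly as many "m=" sections as the offer it answers.
// A rejected stream keeps its "m=" line (with port 0) instead of being removed.
function answerHasSameSectionCount(offer: string, answer: string): boolean {
  return countMediaSections(offer) === countMediaSections(answer);
}
```

Rejecting a stream therefore never changes the count. The answerer sets that section's port to zero, and the line stays where it is.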
Lastly, while the exact media parameters are known only after an offer and an answer have been exchanged, the offerer may receive ICE checks, and possibly media (e.g., in the case of a re-offer after a connection has been established), before it receives an answer. To properly process incoming media in this case, the offerer's media handler must be aware of the details of the offer before the answer arrives.¶
Therefore, in order to handle session descriptions properly, the JSEP implementation needs:¶
JSEP addresses this by adding both set
During the offer/answer exchange, the outstanding offer is considered to be "pending" at the offerer and the answerer, as it may be either accepted or rejected. If this is a re-offer, each side will also have "current" local and remote descriptions, which reflect the result of the last offer/answer exchange. Sections 4.1.14, 4.1.16, 4.1.13, and 4.1.15 provide more detail on pending and current descriptions.¶
JSEP also allows for an answer to be treated as provisional by the application. Provisional answers provide a way for an answerer to communicate initial session parameters back to the offerer, in order to allow the session to begin, while allowing a final answer to be specified later. This concept of a final answer is important to the offer/answer model; when such an answer is received, any extra resources allocated by the caller can be released, now that the exact session configuration is known. These "resources" can include things like extra ICE components, Traversal Using Relays around NAT (TURN) candidates, or video decoders. Provisional answers, on the other hand, do no such deallocation; as a result, multiple dissimilar provisional answers, with their own codec choices, transport parameters, etc., can be received and applied during call setup. Note that the final answer itself may be different from any received provisional answers.¶
In [RFC3264], the constraint at the signaling level is that only one offer can be outstanding for a given session, but at the JSEP level, a new offer can be generated at any point. For example, when using SIP for signaling, if one offer is sent and is then canceled using a SIP CANCEL, another offer can be generated even though no answer was received for the first offer. To support this, the JSEP media layer can provide an offer via the createOffer method whenever the JavaScript application needs one for the signaling. The answerer can send back zero or more provisional answers and then finally end the offer/answer exchange by sending a final answer. The state machine for this is as follows:¶
Aside from these state transitions, there is no other difference between the handling of provisional ("pranswer") and final ("answer") answers.¶
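The offer, provisional-answer, and final-answer transitions described above can be sketched as a small transition function. The state names follow the W3C `signalingState` values. The function name and the `Side`/`SdpType` types are hypothetical, and rollback is not modeled.

```typescript
// State names follow the W3C RTCSignalingState values ("closed" omitted).
type SignalingState =
  | "stable"
  | "have-local-offer"
  | "have-remote-offer"
  | "have-local-pranswer"
  | "have-remote-pranswer";

type Side = "local" | "remote"; // setLocal(...) vs. setRemote(...)
type SdpType = "offer" | "pranswer" | "answer";

// Hypothetical helper: returns the state after applying a description of
// `type` on `side`, or null if that operation is not allowed in `state`.
function nextSignalingState(state: SignalingState, side: Side, type: SdpType): SignalingState | null {
  switch (state) {
    case "stable":
      // Only an offer (local or remote) can start an exchange.
      if (type !== "offer") return null;
      return side === "local" ? "have-local-offer" : "have-remote-offer";
    case "have-local-offer":
      // A new local offer may replace the outstanding one.
      if (side === "local") return type === "offer" ? "have-local-offer" : null;
      if (type === "pranswer") return "have-remote-pranswer";
      return type === "answer" ? "stable" : null;
    case "have-remote-pranswer":
      // Zero or more further provisional answers, then one final answer.
      if (side !== "remote") return null;
      if (type === "pranswer") return "have-remote-pranswer";
      return type === "answer" ? "stable" : null;
    case "have-remote-offer":
      // A new remote offer may replace the outstanding one.
      if (side === "remote") return type === "offer" ? "have-remote-offer" : null;
      if (type === "pranswer") return "have-local-pranswer";
      return type === "answer" ? "stable" : null;
    case "have-local-pranswer":
      if (side !== "local") return null;
      if (type === "pranswer") return "have-local-pranswer";
      return type === "answer" ? "stable" : null;
    default:
      return null;
  }
}
```

For example, an offerer that receives two provisional answers and then a final one moves from "stable" to "have-local-offer", stays in "have-remote-pranswer" for both provisional answers, and returns to "stable". The handling of "pranswer" differs from "answer" only in which state it leads to.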
3.3. Session Description Format
JSEP's session descriptions use Session Description Protocol (SDP) syntax for their internal representation. While this format is not optimal for manipulation from JavaScript, it is widely accepted and is frequently updated with new features; any alternate encoding of session descriptions would have to keep pace with the changes to SDP, at least until the time that this new encoding eclipsed SDP in popularity.¶
However, to provide for future flexibility, the SDP syntax is encapsulated within a Session
As detailed below, most applications should be able to treat the Session
3.4. Session Description Control
In order to give the application control over various common session parameters, JSEP provides control surfaces that tell the JSEP implementation how to generate session descriptions. This avoids the need for JavaScript to modify session descriptions in most cases.¶
Changes to these objects result in changes to the session descriptions generated by subsequent create
3.4.1. RtpTransceivers
RtpTransceivers allow the application to control the RTP media associated with one "m=" section. Each RtpTransceiver has an RtpSender and an RtpReceiver, which an application can use to control the sending and receiving of RTP media. The application may also modify the RtpTransceiver directly, for instance, by stopping it.¶
RtpTransceivers generally have a 1:1 mapping with "m=" sections, although there may be more RtpTransceivers than "m=" sections when RtpTransceivers are created but not yet associated with an "m=" section, or if RtpTransceivers have been stopped and disassociated from "m=" sections. An RtpTransceiver is said to be associated with an "m=" section if its media identification (mid) property is non-null; otherwise, it is said to be disassociated. The associated "m=" section is determined using a mapping between transceivers and "m=" section indices, formed when creating an offer or applying a remote offer.¶
An RtpTransceiver is never associated with more than one "m=" section, and once a session description is applied, an "m=" section is always associated with exactly one RtpTransceiver. However, in certain cases where an "m=" section has been rejected, as discussed in Section 5.2.2 below, that "m=" section will be "recycled" and associated with a new RtpTransceiver with a new MID value.¶
RtpTransceivers can be created explicitly by the application or implicitly by calling set
3.4.2. RtpSenders
RtpSenders allow the application to control how RTP media is sent. An RtpSender is conceptually responsible for the outgoing RTP stream(s) described by an "m=" section. This includes encoding the attached Media
3.4.3. RtpReceivers
RtpReceivers allow the application to inspect how RTP media is received. An RtpReceiver is conceptually responsible for the incoming RTP stream(s) described by an "m=" section. This includes processing received RTP media packets, decoding the incoming stream(s) to produce a remote Media
3.5. ICE
3.5.1. ICE Gathering Overview
JSEP gathers ICE candidates as needed by the application. Collection of ICE candidates is referred to as a gathering phase, and this is triggered either by the addition of a new or recycled "m=" section to the local session description or by new ICE credentials in the description, indicating an ICE restart. Use of new ICE credentials can be triggered explicitly by the application or implicitly by the JSEP implementation in response to changes in the ICE configuration.¶
When the ICE configuration changes in a way that requires a new gathering phase, a 'needs
When a new gathering phase starts, the ICE agent will notify the application that gathering is occurring through a state change event. Then, when each new ICE candidate becomes available, the ICE agent will supply it to the application via an onicecandidate event; these candidates will also automatically be added to the current and/or pending local session description. Finally, when all candidates have been gathered, a final onicecandidate event will be dispatched to signal that the gathering process is complete.¶
Note that gathering phases only gather the candidates needed by new
3.5.2. ICE Candidate Trickling
Candidate trickling is a technique through which a caller may incrementally provide candidates to the callee after the initial offer has been dispatched; the semantics of "Trickle ICE" are defined in [RFC8838]. This process allows the callee to begin acting upon the call and setting up the ICE (and perhaps DTLS) connections immediately, without having to wait for the caller to gather all possible candidates. This results in faster media setup in cases where gathering is not performed prior to initiating the call.¶
JSEP supports optional candidate trickling by providing APIs, as described above, that provide control and feedback on the ICE candidate gathering process. Applications that support candidate trickling can send the initial offer immediately and send individual candidates when they get notified of a new candidate; applications that do not support this feature can simply wait for the indication that gathering is complete, and then create and send their offer, with all the candidates, at that time.¶
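The two application strategies above can be sketched as follows. This is a minimal sketch with several assumptions, all hypothetical: the `SignalMessage` wire format, the `OfferSender` class, and the `localSdp` callback (assumed to return the pending local description, which the text says accumulates gathered candidates). A `null` candidate stands in for the final onicecandidate event that marks the end of gathering.

```typescript
// Hypothetical signaling messages; the wire format is up to the application.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "candidate"; candidate: string }
  | { kind: "end-of-candidates" };

// Application-side handling of gathering. `candidate === null` stands in for
// the final onicecandidate event that marks the end of a gathering phase.
class OfferSender {
  private offerSent = false;

  constructor(
    private readonly trickle: boolean,
    private readonly send: (msg: SignalMessage) => void,
    // Returns the pending local description, which accumulates gathered candidates.
    private readonly localSdp: () => string
  ) {}

  // Call after the local offer has been applied (this starts gathering).
  onLocalOfferApplied(): void {
    if (this.trickle) this.sendOffer(); // trickling: send the offer right away
  }

  onIceCandidate(candidate: string | null): void {
    if (this.trickle) {
      this.send(candidate === null ? { kind: "end-of-candidates" } : { kind: "candidate", candidate });
    } else if (candidate === null) {
      // No trickling: wait for gathering to finish, then send one complete offer.
      this.sendOffer();
    }
  }

  private sendOffer(): void {
    if (this.offerSent) return;
    this.offerSent = true;
    this.send({ kind: "offer", sdp: this.localSdp() });
  }
}
```

With trickling, the peer receives the offer first and then one message per candidate. Without it, the peer receives a single offer once gathering completes.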
Upon receipt of trickled candidates, the receiving application will supply them to its ICE agent. This triggers the ICE agent to start using the new remote candidates for connectivity checks.¶
3.5.2.1. ICE Candidate Format
In JSEP, ICE candidates are abstracted by an IceCandidate object, and as with session descriptions, SDP syntax is used for the internal representation.¶
The candidate details are specified in an IceCandidate field, using the same SDP syntax as the "candidate
The IceCandidate object contains a field to indicate which ICE username fragment (ufrag) it is associated with, as defined in [RFC8839], Section 5.4. This value is used to determine which session description (and thereby which gathering phase) this IceCandidate belongs to, which helps resolve ambiguities during ICE restarts. If this field is absent in a received IceCandidate (perhaps when communicating with a non-JSEP endpoint), the most recently received session description is assumed.¶
The IceCandidate object also contains fields to indicate which "m=" section it is associated with, which can be identified in one of two ways: either by an "m=" section index or by a MID. The "m=" section index is a zero-based index, with index N referring to the (N+1)th "m=" section in the session description referenced by this IceCandidate. The MID is a "media stream identification" value, as defined in [RFC5888], Section 4, which provides a more robust way to identify the "m=" section in the session description, using the MID of the associated RtpTransceiver object (which may have been locally generated by the answerer when interacting with a non-JSEP endpoint that does not support the MID attribute, as discussed in Section 5.10 below). If the MID field is present in a received IceCandidate, it MUST be used for identification; otherwise, the "m=" section index is used instead.¶
Implementations MUST be prepared to receive objects with some fields missing, as mentioned above.¶
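The lookup rules for a received IceCandidate can be sketched as a single function. The candidate field names are borrowed from the W3C `RTCIceCandidateInit` dictionary. `RemoteDescriptionInfo`, `locateCandidate`, and the one-ufrag-per-description simplification are hypothetical. Discarding a candidate whose ufrag matches no applied description is also a choice made for this sketch.

```typescript
// Field names borrowed from the W3C RTCIceCandidateInit dictionary; any of
// the identification fields may be missing in a received candidate.
interface ReceivedCandidate {
  candidate: string;               // SDP "candidate" attribute syntax
  sdpMid?: string | null;          // MID of the target "m=" section
  sdpMLineIndex?: number | null;   // zero-based "m=" section index
  usernameFragment?: string | null;
}

// Hypothetical, simplified view of an applied remote description.
interface RemoteDescriptionInfo {
  ufrag: string;            // one ufrag per description (simplification)
  mids: (string | null)[];  // MID of each "m=" section, in order
}

// `history` is ordered oldest to newest. Returns the target, or null if the
// candidate cannot be matched to any applied description and section.
function locateCandidate(
  c: ReceivedCandidate,
  history: RemoteDescriptionInfo[]
): { ufrag: string; sectionIndex: number } | null {
  if (history.length === 0) return null;

  // 1. Pick the description (gathering phase) by ufrag; if the field is
  //    absent, assume the most recently received description.
  let desc: RemoteDescriptionInfo | null = null;
  if (c.usernameFragment != null) {
    for (const d of history) {
      if (d.ufrag === c.usernameFragment) desc = d;
    }
    if (desc === null) return null;
  } else {
    desc = history[history.length - 1];
  }

  // 2. A present MID MUST be used; otherwise fall back to the index.
  if (c.sdpMid != null) {
    const i = desc.mids.indexOf(c.sdpMid);
    return i === -1 ? null : { ufrag: desc.ufrag, sectionIndex: i };
  }
  const idx = c.sdpMLineIndex;
  if (idx != null && idx >= 0 && idx < desc.mids.length) {
    return { ufrag: desc.ufrag, sectionIndex: idx };
  }
  return null;
}
```

Because the MID takes precedence, a candidate that carries both a MID and a stale index is still routed to the right "m=" section.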
3.5.3. ICE Candidate Policy
Typically, when gathering ICE candidates, the JSEP implementation will gather all possible forms of initial candidates -- host, server
There may also be cases where the application wants to change which types of candidates are used while the session is active. A prime example is where a callee may initially want to use only relay candidates, to avoid leaking location information to an arbitrary caller, but then change to use all candidates (for lower operational cost) once the user has indicated that they want to take the call. For this scenario, the JSEP implementation MUST allow the candidate policy to be changed in mid-session, subject to the aforementioned interactions with local policy.¶
To administer the ICE candidate policy, the JSEP implementation will determine the current setting at the start of each gathering phase. Then, during the gathering phase, the implementation MUST NOT expose candidates disallowed by the current policy to the application, use them as the source of connectivity checks, or indirectly expose them via other fields, such as the raddr/rport attributes for other ICE candidates. Later, if a different policy is specified by the application, the application can apply it by kicking off a new gathering phase via an ICE restart.¶
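A minimal sketch of applying the candidate policy within one gathering phase follows. The policy is read once at the start of the phase, so a later change only takes effect after an ICE restart. The `LocalCandidate` shape and `GatheringPhaseFilter` class are hypothetical, and implementation-imposed local policy is not modeled. Stripping raddr/rport from relay candidates is one way (assumed here) to avoid exposing disallowed addresses indirectly.

```typescript
type CandidatePolicy = "all" | "relay";

interface LocalCandidate {
  type: "host" | "srflx" | "prflx" | "relay";
  address: string;
  port: number;
  raddr?: string; // related address, e.g., the mapped address behind a relay
  rport?: number;
}

// The policy is captured once, when a gathering phase starts; a later change
// only takes effect in the next phase (e.g., after an ICE restart).
class GatheringPhaseFilter {
  constructor(private readonly policy: CandidatePolicy) {}

  // Returns what may be exposed to the application and used for checks,
  // or null if the candidate is disallowed.
  expose(c: LocalCandidate): LocalCandidate | null {
    if (this.policy === "all") return c;
    if (c.type !== "relay") return null;
    // Drop raddr/rport so that disallowed addresses are not leaked indirectly.
    return { type: c.type, address: c.address, port: c.port };
  }
}
```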
3.5.4. ICE Candidate Pool
JSEP applications typically inform the JSEP implementation to begin ICE gathering via the information supplied to set
When set
One example of where this concept is useful is an application that expects an incoming call at some point in the future, and wants to minimize the time it takes to establish connectivity, to avoid clipping of initial media. By pre-gathering candidates into the pool, it can exchange and start sending connectivity checks from these candidates almost immediately upon receipt of a call. Note, though, that by holding on to these pre-gathered candidates, which will be kept alive as long as they may be needed, the application will consume resources on the STUN/TURN servers it is using. ("STUN" stands for "Session Traversal Utilities for NAT".)¶
3.5.5. ICE Versions
While this specification formally relies on [RFC8445], at the time of its publication, the majority of WebRTC implementations support the version of ICE described in [RFC5245]. The "ice2" attribute defined in [RFC8445] can be used to detect the version in use by a remote endpoint and to provide a smooth transition from the older specification to the newer one. Implementations MUST be able to accept remote descriptions that do not have the "ice2" attribute.¶
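Detecting the newer ICE version in a remote description can be sketched as below. In [RFC8445], "ice2" is an ICE option token, and it is carried in an "a=ice-options" line at session or media level. The helper name `hasIce2` is hypothetical, and the sketch only reports presence. A description without the token is still accepted, as the text requires.

```typescript
// True if the description carries the "ice2" option in any session- or
// media-level "a=ice-options" line (option tags are space-separated).
function hasIce2(sdp: string): boolean {
  const prefix = "a=ice-options:";
  return sdp.split(/\r?\n/).some((line) =>
    line.slice(0, prefix.length) === prefix &&
    line.slice(prefix.length).trim().split(/\s+/).indexOf("ice2") !== -1
  );
}

// A description without "ice2" is not rejected; the peer is simply treated
// as possibly implementing the older [RFC5245] procedures.
const peerUsesIce2 = hasIce2("v=0\r\na=ice-options:trickle ice2\r\n"); // true
```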
3.6. Video Size Negotiation
Video size negotiation is the process through which a receiver can use the "a=imageattr" SDP attribute [RFC6236] to indicate what video frame sizes it is capable of receiving. A receiver may have hard limits on what its video decoder can process, or it may have some maximum set by policy. By specifying these limits in an "a=imageattr" attribute, JSEP endpoints can attempt to ensure that the remote sender transmits video at an acceptable resolution. However, when communicating with a non-JSEP endpoint that does not understand this attribute, any signaled limits may be exceeded, and the JSEP implementation MUST handle this gracefully, e.g., by discarding the video.¶
Note that certain codecs support transmission of samples with aspect ratios other than 1.0 (i.e., non-square pixels). JSEP implementations will not transmit non-square pixels but SHOULD receive and render such video with the correct aspect ratio. However, sample aspect ratio has no impact on the size negotiation described below; all dimensions are measured in pixels, whether square or not.¶
3.6.1. Creating an imageattr Attribute
The receiver will first combine any known local limits (e.g., hardware decoder capabilities or local policy) to determine the absolute minimum and maximum sizes it can receive. If there are no known local limits, the "a=imageattr" attribute SHOULD be omitted. If these local limits preclude receiving any video, i.e., the degenerate case of no permitted resolutions, the "a=imageattr" attribute MUST be omitted, and the "m=" section MUST be marked as sendonly
Otherwise, an "a=imageattr" attribute is created with a "recv" direction, and the resulting resolution space formed from the aforementioned intersection is used to specify its minimum and maximum "x=" and "y=" values.¶
The rules here express a single set of preferences, and therefore, the "a=imageattr" "q=" value is not important. It SHOULD be set to "1.0".¶
The "a=imageattr" field is payload type specific. When all video codecs supported have the same capabilities, use of a single attribute, with the wildcard payload type (*), is RECOMMENDED. However, when the supported video codecs have different limitations, specific "a=imageattr" attributes MUST be inserted for each payload type.¶
As an example, consider a system with a multiformat video decoder, which is capable of decoding any resolution from 48x48 to 720p. In this case, the implementation would generate this attribute:¶
a=imageattr:* recv [x=[48:1280],y=[48:720],q=1.0]
This declaration indicates that the receiver is capable of decoding any image resolution from 48x48 up to 1280x720 pixels.¶
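The generation rules in this section can be sketched as one function. It omits the attribute when there are no known limits or no receivable size. It emits a single wildcard attribute when every payload type shares the same limits, and one attribute per payload type otherwise, always with "q=1.0". The `SizeLimits` shape and function names are hypothetical. The sketch also assumes the caller has already removed any payload type that cannot receive any size, unless all of them are in that state. For the 48x48-to-720p decoder described above, it yields a single wildcard "recv" attribute.

```typescript
// Combined local receive limits for one payload type, in pixels.
interface SizeLimits {
  minX: number;
  maxX: number;
  minY: number;
  maxY: number;
}

interface ImageattrResult {
  lines: string[];     // "a=imageattr" lines to add to the "m=" section
  canReceive: boolean; // false: no resolution can be received at all
}

// q is fixed at 1.0 because only one set of preferences is expressed.
function recvSet(l: SizeLimits): string {
  return `[x=[${l.minX}:${l.maxX}],y=[${l.minY}:${l.maxY}],q=1.0]`;
}

function isDegenerate(l: SizeLimits): boolean {
  return l.minX > l.maxX || l.minY > l.maxY;
}

function sameLimits(a: SizeLimits, b: SizeLimits): boolean {
  return a.minX === b.minX && a.maxX === b.maxX && a.minY === b.minY && a.maxY === b.maxY;
}

// limitsByPt maps payload type -> limits; an empty map means "no known limits".
function buildImageattr(limitsByPt: { [pt: string]: SizeLimits }): ImageattrResult {
  const pts = Object.keys(limitsByPt);
  if (pts.length === 0) return { lines: [], canReceive: true }; // SHOULD omit
  const usable = pts.filter((pt) => !isDegenerate(limitsByPt[pt]));
  if (usable.length === 0) return { lines: [], canReceive: false }; // MUST omit
  const first = limitsByPt[usable[0]];
  if (usable.every((pt) => sameLimits(limitsByPt[pt], first))) {
    // Same capabilities for every codec: one wildcard attribute (RECOMMENDED).
    return { lines: [`a=imageattr:* recv ${recvSet(first)}`], canReceive: true };
  }
  // Different limitations: one attribute per payload type (MUST).
  return {
    lines: usable.map((pt) => `a=imageattr:${pt} recv ${recvSet(limitsByPt[pt])}`),
    canReceive: true,
  };
}
```

When `canReceive` is false, the caller still has to adjust the "m=" section's direction as the rules above require; the sketch only reports the condition.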
3.6.2. Interpreting imageattr Attributes
[RFC6236] defines "a=imageattr" to be an advisory field. This means that it does not absolutely constrain the video formats that the sender can use but gives an indication of the preferred values.¶
This specification prescribes behavior that is more specific. When a Media
Depending on how the RtpSender is configured, it may be producing a single encoding at a certain resolution or, if simulcast (Section 3.7) has been negotiated, multiple encodings, each at their own specific resolution. In addition, depending on the configuration, each encoding may have the flexibility to reduce resolution when needed or may be locked to a specific output resolution.¶
For each encoding being produced by the RtpSender, the set of "a=imageattr recv" attributes in the corresponding "m=" section of the remote description is processed to determine what should be transmitted. Only attributes that reference the media format selected for the encoding are considered; each such attribute is evaluated individually, starting with the attribute with the highest "q=" value. If multiple attributes have the same "q=" value, they are evaluated in the order they appear in their containing "m=" section. Note that while JSEP endpoints will include at most one "a=imageattr recv" attribute per media format, JSEP endpoints may receive session descriptions from non-JSEP endpoints with "m=" sections that contain multiple such attributes.¶
For each "a=imageattr recv" attribute, the following rules are applied. If this processing is successful, the encoding is transmitted accordingly, and no further attributes are considered for that encoding. Otherwise, the next attribute is evaluated, in the aforementioned order. If none of the supplied attributes can be processed successfully, the encoding MUST NOT be transmitted, and an error SHOULD be raised to the application.¶
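The ordering and fallback logic for "a=imageattr recv" attributes can be sketched independently of the per-attribute rules. Those rules are not reproduced in this section, so they are passed in as a hypothetical `tryApply` callback. The sketch assumes a wildcard ("*") attribute references every media format. It keeps only attributes for the encoding's format, orders them by descending "q=" with ties in document order, and returns the first successful result. A `null` result means the encoding must not be transmitted. The `ImageattrRecv` shape (with minimum sizes omitted) and `chooseImageattr` are hypothetical.

```typescript
// Hypothetical parsed form of one "a=imageattr recv" attribute
// (minimum sizes omitted for brevity).
interface ImageattrRecv {
  pt: string;   // payload type, or "*" for all media formats
  q: number;    // preference value from "q="
  maxX: number;
  maxY: number;
}

function chooseImageattr<R>(
  attrs: ImageattrRecv[],
  encodingPt: string,
  tryApply: (a: ImageattrRecv) => R | null
): R | null {
  const ordered = attrs
    .map((a, i) => ({ a, i }))                               // remember document order
    .filter((x) => x.a.pt === encodingPt || x.a.pt === "*")  // only this media format
    .sort((x, y) => (y.a.q - x.a.q) || (x.i - y.i));         // highest q first, ties in order
  for (const { a } of ordered) {
    const result = tryApply(a);
    if (result !== null) return result; // first success wins; stop here
  }
  return null; // MUST NOT transmit this encoding; SHOULD raise an error
}
```

The tie-break uses the original position explicitly instead of relying on sort stability, so the result does not depend on the JavaScript engine.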
3.7. Simulcast
JSEP supports simulcast transmission of a Media
Applications request support for simulcast by configuring multiple encodings on an RtpSender. Upon generation of an offer or answer, these encodings are indicated via SDP markings on the corresponding "m=" section, as described below. Receivers that understand simulcast and are willing to receive it will also include SDP markings to indicate their support, and JSEP endpoints will use these markings to determine whether simulcast is permitted for a given RtpSender. If simulcast support is not negotiated, the RtpSender will only use the first configured encoding.¶
Note that the exact simulcast parameters are up to the sending application. While the aforementioned SDP markings are provided to ensure that the remote side can receive and demux multiple simulcast encodings, the specific resolutions and bitrates to be used for each encoding are purely a send-side decision in JSEP.¶
JSEP currently does not provide a mechanism to configure receipt of simulcast. This means that if simulcast is offered by the remote endpoint, the answer generated by a JSEP endpoint will not indicate support for receipt of simulcast, and as such the remote endpoint will only send a single encoding per "m=" section.¶
In addition, JSEP does not provide a mechanism to handle an incoming offer requesting simulcast from the JSEP endpoint. This means that setting up simulcast in the case where the JSEP endpoint receives the initial offer requires out-of-band signaling or SDP inspection. However, in the case where the JSEP endpoint sets up simulcast in its initial offer, any established simulcast streams will continue to work upon receipt of an incoming re-offer. Future versions of this specification may add additional APIs to handle the incoming initial offer scenario.¶
When using JSEP to transmit multiple encodings from an RtpSender, the techniques from [RFC8853] and [RFC8851] are used. Specifically, when multiple encodings have been configured for an RtpSender, the "m=" section for the RtpSender will include an "a=simulcast" attribute, as defined in [RFC8853], Section 5.1, with a "send" simulcast stream description that lists each desired encoding, and no "recv" simulcast stream description. The "m=" section will also include an "a=rid" attribute for each encoding, as specified in [RFC8851], Section 4; the use of Restriction Identifiers (RIDs, also called rid-ids or RtpStreamIds) allows the individual encodings to be disambiguated even though they are all part of the same "m=" section.¶
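The SDP markings for a multi-encoding RtpSender can be sketched as follows. There is one "a=rid" line per encoding, plus an "a=simulcast" attribute with only a "send" stream list, where ";" separates the simulcast streams. The sketch also shows the fallback to the first configured encoding when simulcast is not negotiated. The `EncodingConfig` shape and function names are hypothetical, and RID restrictions (such as "pt=" or "max-width=") are omitted.

```typescript
// Hypothetical encoding configuration; only the RID matters for SDP here.
interface EncodingConfig {
  rid: string; // RFC 8851 rid-id: letters, digits, "-" or "_"
}

// SDP lines for the sender's "m=" section: one "a=rid" per encoding plus a
// send-only "a=simulcast" stream list. Nothing is added for a single encoding.
function simulcastSdpLines(encodings: EncodingConfig[]): string[] {
  if (encodings.length < 2) return [];
  const lines = encodings.map((e) => `a=rid:${e.rid} send`);
  lines.push(`a=simulcast:send ${encodings.map((e) => e.rid).join(";")}`);
  return lines;
}

// Without negotiated simulcast, only the first configured encoding is used.
function encodingsToSend(configured: EncodingConfig[], simulcastNegotiated: boolean): EncodingConfig[] {
  return simulcastNegotiated ? configured : configured.slice(0, 1);
}
```

For three encodings with RIDs "hi", "mid", and "lo", this produces "a=rid:hi send", "a=rid:mid send", "a=rid:lo send", and "a=simulcast:send hi;mid;lo".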
3.8. Interactions with Forking
Some call signaling systems allow various types of forking where an SDP Offer may be provided to more than one device. For example, SIP [RFC3261] defines both a "parallel search" and "sequential search". Although these are primarily signaling-level issues that are outside the scope of JSEP, they do have some impact on the configuration of the media plane that is relevant. When forking happens at the signaling layer, the JavaScript application responsible for the signaling needs to make the decisions about what media should be sent or received at any point in time, as well as which remote endpoint it should communicate with; JSEP is used to make sure the media engine can make the RTP and media perform as required by the application. The basic operations that the applications can have the media engine do are as follows:¶
3.8.1. Sequential Forking
Sequential forking involves a call being dispatched to multiple remote callees, where each callee can accept the call, but only one active session ever exists at a time; no mixing of received media is performed.¶
JSEP handles sequential forking well, allowing the application to easily control the policy for selecting the desired remote endpoint. When an answer arrives from one of the callees, the application can choose to apply it as either (1) a provisional answer, leaving open the possibility of using a different answer in the future or (2) a final answer, ending the setup flow.¶
In a "first
In a "last-one-wins" situation, all answers would be applied as provisional answers, and any previous call leg will be terminated. At some point, the application will end the setup process, perhaps with a timer; at this point, the application could reapply the pending remote description as a final answer.¶
3.8.2. Parallel Forking
Parallel forking involves a call being dispatched to multiple remote callees, where each callee can accept the call and multiple simultaneous active signaling sessions can be established as a result. If multiple callees send media at the same time, the possibilities for handling this are described in [RFC3960], Section 3.1. Most SIP devices today only support exchanging media with a single device at a time and do not try to mix multiple early media audio sources, as that could result in a confusing situation. For example, consider having a European ringback tone mixed together with the North American ringback tone -- the resulting sound would not be like either tone and would confuse the user. If the signaling application wishes to only exchange media with one of the remote endpoints at a time, then from a media engine point of view, this is exactly like the sequential forking case.¶
In the parallel forking case where the JavaScript application wishes to simultaneously exchange media with multiple peers, the flow is slightly more complex, but the JavaScript application can follow the strategy that [RFC3960] describes, using UPDATE. The UPDATE approach allows the signaling to set up a separate media flow for each peer that it wishes to exchange media with. In JSEP, this offer used in the UPDATE would be formed by simply creating a new PeerConnection (see Section 4.1) and making sure that the same local media streams have been added into this new PeerConnection. Then the new PeerConnection object would produce an SDP offer that could be used by the signaling to perform the UPDATE strategy discussed in [RFC3960].¶
As a result of sharing the media streams, the application will end up with N parallel PeerConnection sessions, each with a local and remote description and its own local and remote addresses. The media flow from these sessions can be managed using setDirection (see Section 4.2.3), or the application can choose to play out the media from all sessions mixed together. Of course, if the application wants to only keep a single session, it can simply terminate the sessions that it no longer needs.¶
4. Interface
This section details the basic operations that must be present to implement JSEP functionality. The actual API exposed in the W3C API may have somewhat different syntax but should map easily to these concepts.¶
4.1. PeerConnection
4.1.1. Constructor
The PeerConnection constructor allows the application to specify global parameters for the media session, such as the STUN/TURN servers and credentials to use when gathering candidates, as well as the initial ICE candidate policy and pool size, and also the bundle policy to use.¶
If an ICE candidate policy is specified, it functions as
described in
Section 3.5.3, causing the JSEP
implementation to only surface the permitted candidates
(including any implementation
- all:
- All candidates permitted by implementation policy will be gathered and used.¶
- relay:
- All candidates except relay candidates will be filtered out. This obfuscates the location information that might be ascertained by the remote peer from the received candidates. Depending on how the application deploys and chooses relay servers, this could obfuscate location to a metro or possibly even global level.¶
The default ICE candidate policy MUST be set to "all", as this is generally the desired policy and also typically reduces the use of application TURN server resources significantly.¶
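A minimal sketch of how the two policies might be applied to gathered candidates, assuming each candidate is available as an SDP "candidate" attribute value in which the candidate type follows the "typ" token (host, srflx, prflx, or relay). The function names are hypothetical, and the sketch assumes its input has already been restricted by implementation policy.

```typescript
type IceCandidatePolicy = "all" | "relay";

// Extracts the candidate type from a "candidate" attribute value, e.g.
// "candidate:1 1 udp 2122260223 192.0.2.1 54400 typ host" -> "host".
function candidateType(candidate: string): string | null {
  const tokens = candidate.trim().split(/\s+/);
  const idx = tokens.indexOf("typ");
  return idx >= 0 && idx + 1 < tokens.length ? tokens[idx + 1] : null;
}

// Returns the candidates that the policy allows to be surfaced to the
// application and used for connectivity checks.
function applyCandidatePolicy(
  candidates: string[],
  policy: IceCandidatePolicy = "all" // "all" is the required default
): string[] {
  if (policy === "all") {
    return candidates.slice();
  }
  // "relay": every non-relay candidate is filtered out, so host and
  // server-reflexive addresses are never exposed to the remote peer.
  return candidates.filter((c) => candidateType(c) === "relay");
}
```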
If a size is specified for the ICE candidate pool, this indicates the number of ICE components to pre-gather candidates for. Because pre-gathering results in utilizing STUN/TURN server resources for potentially long periods of time, this MUST only occur upon application request, and therefore the default candidate pool size MUST be zero.¶
The application can specify its preferred policy regarding
use of bundle, the multiplexing mechanism defined in
[RFC8843]. Regardless of policy, the application will always
try to negotiate bundle onto a single transport and will
offer a single bundle group across all "m=" sections; use of
this single transport is contingent upon the answerer
accepting bundle. However, by specifying a policy from the
list below, the application can control exactly how
aggressively it will try to bundle media streams together,
which affects how it will interoperate with a
non
The set of available policies is as follows:¶
- balanced:
- The first "m=" section of each type (audio, video, or application) will contain transport parameters, which will allow an answerer to unbundle that section. The second and any subsequent "m=" sections of each type will be marked bundle-only. The result is that if there are N distinct media types, then candidates will be gathered for N media streams. This policy balances desire to multiplex with the need to ensure that basic audio and video can still be negotiated in legacy cases. When acting as answerer, if there is no bundle group in the offer, the implementation will reject all but the first "m=" section of each type.¶
- max-compat:
- All "m=" sections will contain transport parameters; none will be marked as bundle-only. This policy will allow all streams to be received by non-bundle-aware endpoints but will require separate candidates to be gathered for each media stream.¶
- max-bundle:
- Only the first "m=" section will contain transport parameters; all streams other than the first will be marked as bundle-only. This policy aims to minimize candidate gathering and maximize multiplexing, at the cost of less compatibility with legacy endpoints. When acting as answerer, the implementation will reject any "m=" sections other than the first "m=" section, unless they are in the same bundle group as that "m=" section.¶
As it provides the best trade-off between performance and compatibility with legacy endpoints, the default bundle policy MUST be set to "balanced".¶
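The three policies amount to a rule for which "m=" sections of an offer are marked bundle-only. The following sketch (function name hypothetical) takes the media type of each "m=" section in offer order and returns one flag per section; it does not model the answerer-side rejection behavior described above.

```typescript
type BundlePolicy = "balanced" | "max-compat" | "max-bundle";
type MediaType = "audio" | "video" | "application";

// For each "m=" section (in offer order), returns true if it is marked
// bundle-only, i.e., it carries no transport parameters of its own.
function bundleOnlyFlags(
  sections: MediaType[],
  policy: BundlePolicy = "balanced" // required default
): boolean[] {
  const seenKinds: MediaType[] = [];
  return sections.map((kind, index) => {
    if (policy === "max-compat") {
      return false; // every section carries transport parameters
    }
    if (policy === "max-bundle") {
      return index > 0; // only the first section carries them
    }
    // "balanced": the first section of each media type carries transport
    // parameters; later sections of that type are bundle-only.
    const repeat = seenKinds.indexOf(kind) >= 0;
    if (!repeat) {
      seenKinds.push(kind);
    }
    return repeat;
  });
}
```

For the sequence audio, video, video, application, "balanced" marks only the second video section as bundle-only, so candidates are gathered for three media streams, one per distinct media type.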
The application can specify its preferred policy regarding use of RTP/RTCP multiplexing [RFC5761] using one of the following policies:¶
- negotiate:
- The JSEP implementation will gather both RTP and RTCP candidates but also will offer "a=rtcp-mux", thus allowing for compatibility with either multiplexing or non-multiplexing endpoints.¶
- require:
- The JSEP implementation will only gather RTP candidates and will insert an "a=rtcp-mux-only" indication into any new "m=" sections in offers it generates. This halves the number of candidates that the offerer needs to gather. Applying a description with an "m=" section that does not contain an "a=rtcp-mux" attribute will cause an error to be returned.¶
The default multiplexing policy MUST be set to "require". Implementations MAY choose to reject attempts by the application to set the multiplexing policy to "negotiate".¶
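The constructor defaults and the "require" check can be sketched as follows. The `JsepConfiguration` shape is hypothetical; its member names resemble those of the W3C API's configuration dictionary, but this is not a statement of that API. The `allowNegotiateMux` flag stands in for an implementation's choice to reject "negotiate".

```typescript
type RtcpMuxPolicy = "negotiate" | "require";

// Hypothetical configuration shape; every member is optional.
interface JsepConfiguration {
  iceTransportPolicy?: "all" | "relay";
  bundlePolicy?: "balanced" | "max-compat" | "max-bundle";
  rtcpMuxPolicy?: RtcpMuxPolicy;
  iceCandidatePoolSize?: number;
}

// Fills in the constructor defaults required by the text above.
function resolveConfiguration(
  config: JsepConfiguration,
  allowNegotiateMux: boolean = true // implementations MAY reject "negotiate"
): Required<JsepConfiguration> {
  const rtcpMuxPolicy: RtcpMuxPolicy = config.rtcpMuxPolicy ?? "require";
  if (rtcpMuxPolicy === "negotiate" && !allowNegotiateMux) {
    throw new Error('rtcpMuxPolicy "negotiate" is not supported');
  }
  const poolSize = config.iceCandidatePoolSize ?? 0; // no pre-gathering by default
  if (poolSize < 0 || Math.floor(poolSize) !== poolSize) {
    throw new Error("iceCandidatePoolSize must be a non-negative integer");
  }
  return {
    iceTransportPolicy: config.iceTransportPolicy ?? "all",
    bundlePolicy: config.bundlePolicy ?? "balanced",
    rtcpMuxPolicy: rtcpMuxPolicy,
    iceCandidatePoolSize: poolSize,
  };
}

// Under "require", applying a description whose "m=" section lacks an
// "a=rtcp-mux" attribute is an error.
function checkRtcpMux(policy: RtcpMuxPolicy, mediaSectionLines: string[]): void {
  const hasMux = mediaSectionLines.some((line) => line.trim() === "a=rtcp-mux");
  if (policy === "require" && !hasMux) {
    throw new Error('"m=" section has no a=rtcp-mux under the "require" policy');
  }
}
```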
4.1.2. addTrack
The addTrack method adds a Media
4.1.3. removeTrack
The removeTrack method removes a Media
4.1.4. addTransceiver
The addTransceiver method adds a new RtpTransceiver to the
PeerConnection. If a Media
At the time of creation, the application can also specify a transceiver direction attribute, a set of MediaStreams that the transceiver is associated with (allowing "LS" group assignments), and a set of encodings for the media (used for simulcast as described in Section 3.7).¶
4.1.5. onaddtrack Event
The onaddtrack event is dispatched to the application when a new
remote track has been signaled as a result of a set
4.1.6. createDataChannel
The create
The create
4.1.7. ondatachannel Event
The ondatachannel event is dispatched to the application when a new data channel has been negotiated by the remote side, which can occur at any time after the underlying SCTP/DTLS association has been established. The new data channel object is supplied in the event.¶
4.1.8. createOffer
The createOffer method generates a blob of SDP that contains an offer per [RFC3264] with the supported configurations for the session, including descriptions of the media added to this PeerConnection, the codec, RTP, and RTCP options supported by this implementation, and any candidates that have been gathered by the ICE agent. An options parameter may be supplied to provide additional control over the generated offer. This options parameter allows an application to trigger an ICE restart, for the purpose of reestablishing connectivity.¶
In the initial offer, the generated SDP will contain all desired functionality for the session (functionality that is supported but not desired by default may be omitted); for each SDP line, the generation of the SDP will follow the process defined for generating an initial offer from the specification that defines the given SDP line. The exact handling of initial offer generation is detailed in Section 5.2.1 below.¶
In the event createOffer is called after the session is
established, createOffer will generate an offer to modify the
current session based on any changes that have been made to
the session, e.g., adding or stopping Rtp
Session descriptions generated by createOffer MUST be
immediately usable by set
Calling this method may do things such as generating new
ICE credentials, but it does not change the PeerConnection
state, trigger candidate gathering, or cause media to start
or stop flowing. Specifically, the offer is not applied, and
does not become the pending local description, until
set
4.1.9. createAnswer
The createAnswer method generates a blob of SDP that
contains an SDP answer per [RFC3264] with the supported
configuration for the session that is compatible with the
parameters supplied in the most recent call to
set
As an answer, the generated SDP will contain a specific configuration that specifies how the media plane should be established; for each SDP line, the generation of the SDP MUST follow the process defined for generating an answer from the specification that defines the given SDP line. The exact handling of answer generation is detailed in Section 5.3 below.¶
Session descriptions generated by createAnswer MUST be
immediately usable by set
Calling this method may do things such as generating new
ICE credentials, but it does not change the PeerConnection
state, trigger candidate gathering, or cause a media state
change. Specifically, the answer is not applied, and does not
become the current local description, until
set
4.1.10. SessionDescriptionType
Session description objects
"offer" indicates that a description MUST be parsed as an offer; said description may include many possible media configurations. A description used as an "offer" may be applied any time the PeerConnection is in a "stable" state or applied as an update to a previously supplied but unanswered "offer".¶
"pranswer" indicates that a description MUST be parsed as an answer, but not a final answer, and so MUST NOT result in the freeing of allocated resources. It may result in the start of media transmission, if the answer does not specify an inactive media direction. A description used as a "pranswer" may be applied as a response to an "offer" or as an update to a previously sent "pranswer".¶
"answer" indicates that a description MUST be parsed as an answer, the offer/answer exchange MUST be considered complete, and any resources (decoders, candidates) that are no longer needed SHOULD be released. A description used as an "answer" may be applied as a response to an "offer" or as an update to a previously sent "pranswer".¶
The only difference between a provisional and final answer is that the final answer results in the freeing of any unused resources that were allocated as a result of the offer. As such, the application can use some discretion on whether an answer should be applied as provisional or final and can change the type of the session description as needed. For example, in a serial forking scenario, an application may receive multiple "final" answers, one from each remote endpoint. The application could choose to accept the initial answers as provisional answers and only apply an answer as final when it receives one that meets its criteria (e.g., a live user instead of voicemail).¶
"rollback" is a special session description type indicating that the state machine MUST be rolled back to the previous "stable" state, as described in Section 4.1.10.2. The contents MUST be empty.¶
4.1.10.1. Use of Provisional Answers
Most applications will not need to create answers using
the "pranswer" type. While it is good practice to send an
immediate response to an offer, in order to warm up the
session transport and prevent media clipping, the preferred
handling for a JSEP application is to create and send a
"sendonly" final answer with a null Media
As an example, consider a typical JSEP application that
wants to set up audio and video as quickly as possible.
When the callee receives an offer with audio and video
Media
Of course, some applications may not be able to perform this double offer/answer exchange, particularly ones that are attempting to gateway to legacy signaling protocols. In these cases, pranswer can still provide the application with a mechanism to warm up the transport.¶
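The description types defined in Section 4.1.10 can only be applied in certain states, and the resulting rules form a small state machine. This sketch assumes JSEP's signaling state names ("stable", "have-local-offer", "have-remote-offer", "have-local-pranswer", "have-remote-pranswer") and tracks state only; it does not model media or resource handling, and it simplifies rollback to "valid in any state other than stable".

```typescript
type SignalingState =
  | "stable"
  | "have-local-offer"
  | "have-remote-offer"
  | "have-local-pranswer"
  | "have-remote-pranswer";
type SdpType = "offer" | "pranswer" | "answer" | "rollback";
type Side = "local" | "remote";

// Returns the next signaling state, or throws if a description of this
// type may not be applied from this side in the current state.
function nextState(state: SignalingState, side: Side, sdpType: SdpType): SignalingState {
  if (sdpType === "rollback") {
    // Rollback only cancels proposed changes; it cannot leave "stable".
    if (state === "stable") throw new Error("nothing to roll back");
    return "stable";
  }
  const ownOffer = side === "local" ? "have-local-offer" : "have-remote-offer";
  const peerOffer = side === "local" ? "have-remote-offer" : "have-local-offer";
  const ownPranswer = side === "local" ? "have-local-pranswer" : "have-remote-pranswer";
  if (sdpType === "offer") {
    // Allowed in "stable" or as an update to our own unanswered offer.
    if (state === "stable" || state === ownOffer) return ownOffer;
  } else {
    // "pranswer"/"answer" respond to the peer's offer or update our pranswer;
    // only a final "answer" completes the exchange.
    if (state === peerOffer || state === ownPranswer) {
      return sdpType === "answer" ? "stable" : ownPranswer;
    }
  }
  throw new Error(`cannot apply ${side} ${sdpType} in state ${state}`);
}
```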
4.1.10.2. Rollback
In certain situations, it may be desirable to "undo" a
change made to set
4.1.11. setLocalDescription
The set
This API changes the local media state; among other
things, it sets up local resources for receiving and decoding
media. In order to successfully handle scenarios where the
application wants to offer to change from one media format to
a different, incompatible format, the PeerConnection MUST be
able to simultaneously support use of both the current and
pending local descriptions (e.g., support the codecs that
exist in either description). This dual processing begins
when the PeerConnection enters the "have
This API indirectly controls the candidate gathering process. When a local description is supplied and the number of transports currently in use does not match the number of transports needed by the local description, the PeerConnection will create transports as needed and begin gathering candidates for each transport, using ones from the candidate pool if available.¶
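The transport bookkeeping described above might be sketched as follows. It assumes RTP/RTCP multiplexing (so each transport has a single ICE component) and counts one transport per "m=" section that is not bundle-only; both function names are hypothetical.

```typescript
interface TransportPlan {
  toCreate: number;       // new transports the local description requires
  fromPool: number;       // of those, how many reuse pre-gathered candidates
  freshGathering: number; // of those, how many start gathering from scratch
}

// One transport per "m=" section that is not bundle-only (simplified:
// rejected sections are not considered).
function transportsNeeded(bundleOnly: boolean[]): number {
  return bundleOnly.filter((flag) => !flag).length;
}

function planTransports(needed: number, inUse: number, pooled: number): TransportPlan {
  const toCreate = Math.max(0, needed - inUse);
  const fromPool = Math.min(toCreate, pooled); // prefer the candidate pool
  return { toCreate: toCreate, fromPool: fromPool, freshGathering: toCreate - fromPool };
}
```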
If (1) set
4.1.12. setRemoteDescription
The set
This API changes the local media state; among other things, it sets up local resources for sending and encoding media.¶
If (1) set
4.1.13. currentLocalDescription
The current
A null object will be returned if an offer/answer exchange has not yet been completed.¶
4.1.14. pendingLocalDescription
The pending
A null object will be returned if the state of the
PeerConnection is "stable" or "have
4.1.15. currentRemoteDescription
The current
A null object will be returned if an offer/answer exchange has not yet been completed.¶
4.1.16. pendingRemoteDescription
The pending
A null object will be returned if the state of the
PeerConnection is "stable" or "have
4.1.17. canTrickleIceCandidates
The can
- null:
- No SDP has been received from the other side, so it is not known if it can handle trickle. This is the initial value before setRemoteDescription is called.¶
- true:
- SDP has been received from the other side indicating that it can support trickle.¶
- false:
- SDP has been received from the other side indicating that it cannot support trickle.¶
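One way to derive this three-valued property is to look for the "trickle" option on an "a=ice-options" line in the remote SDP. A sketch, assuming the remote description is available as a raw SDP string (or null before setRemoteDescription has been called); the function name is hypothetical.

```typescript
// null before any remote SDP, otherwise whether some "a=ice-options" line
// (session or media level) lists the "trickle" option.
function canTrickleIceCandidates(remoteSdp: string | null): boolean | null {
  if (remoteSdp === null) {
    return null;
  }
  const prefix = "a=ice-options:";
  return remoteSdp.split(/\r?\n/).some((line) => {
    if (line.indexOf(prefix) !== 0) return false;
    const options = line.slice(prefix.length).trim().split(/\s+/);
    return options.indexOf("trickle") >= 0;
  });
}
```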
As described in
Section 3.5.2, JSEP
implementations always provide candidates to the application
individually, consistent with what is needed for Trickle ICE.
However, applications can use the can
4.1.18. setConfiguration
The set
Calling this method may result in a change to the state of the ICE agent.¶
4.1.19. addIceCandidate
The addIceCandidate method provides an update to the ICE
agent via an IceCandidate object
(Section 3.5.2.1). If the
IceCandidate's candidate field is non-null, the IceCandidate
is treated as a new remote ICE candidate, which will be added
to the current and/or pending remote description according to
the rules defined for Trickle ICE. Otherwise, the
IceCandidate is treated as an end
In either case, the "m=" section index, MID, and ufrag
fields from the supplied IceCandidate are used to determine
which "m=" section and ICE candidate generation the
IceCandidate belongs to, as described in
Section 3.5.2.1 above. In the case
of an end
If any IceCandidate fields contain invalid values or an error occurs during the processing of the IceCandidate object, the supplied IceCandidate MUST be ignored and an error MUST be returned.¶
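The matching and error rules above could be sketched as follows. The field names (`candidate`, `mLineIndex`, `mid`, `ufrag`) and the `MediaSection` type are hypothetical, and so are two policy choices: the MID takes precedence over the "m=" section index, and a ufrag that does not match a section's current generation is treated as an error. Validation happens before any state changes, so a rejected IceCandidate leaves the sections untouched.

```typescript
// Hypothetical shape of an IceCandidate as described in the text.
interface IceCandidateInfo {
  candidate: string | null;  // null means end-of-candidates
  mLineIndex: number | null; // "m=" section index
  mid: string | null;        // MID
  ufrag: string | null;      // identifies the ICE candidate generation
}

interface MediaSection {
  mid: string;
  ufrag: string; // ufrag of the current ICE candidate generation
  remoteCandidates: string[];
  endOfCandidates: boolean;
}

// Adds a remote candidate (or an end-of-candidates indication) to the
// matching "m=" section(s); throws before mutating anything on error.
function addIceCandidate(sections: MediaSection[], c: IceCandidateInfo): void {
  let targets: MediaSection[];
  if (c.mid !== null) {
    targets = sections.filter((s) => s.mid === c.mid); // MID wins (assumption)
  } else if (c.mLineIndex !== null) {
    const section = sections[c.mLineIndex];
    targets = section ? [section] : [];
  } else if (c.candidate === null) {
    targets = sections.slice(); // end-of-candidates for all sections
  } else {
    targets = []; // a candidate needs a MID or an index
  }
  if (targets.length === 0) {
    throw new Error("IceCandidate does not match any m= section");
  }
  if (c.ufrag !== null && targets.some((s) => s.ufrag !== c.ufrag)) {
    throw new Error("IceCandidate ufrag does not match the current generation");
  }
  for (const s of targets) {
    if (c.candidate === null) {
      s.endOfCandidates = true;
    } else {
      s.remoteCandidates.push(c.candidate);
    }
  }
}
```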
Otherwise, the new remote candidate or end
4.1.20. onicecandidate Event
The onicecandidate event is dispatched to the application in two situations: (1) when the ICE agent has discovered a new allowed local ICE candidate during ICE gathering, as outlined in Section 3.5.1 and subject to the restrictions discussed in Section 3.5.3, or (2) when an ICE gathering phase completes. The event contains a single IceCandidate object, as defined in Section 3.5.2.1.¶
In the first case, the newly discovered candidate is reflected in the IceCandidate object, and all of its fields MUST be non-null. This candidate will also be added to the current and/or pending local description according to the rules defined for Trickle ICE.¶
In the second case, the event's IceCandidate object
MUST have its candidate field set to null to indicate
that the current gathering phase is complete, i.e., there will be no
further onicecandidate events in this phase. However, the
IceCandidate's ufrag field MUST be specified to
indicate which ICE candidate generation is ending. The IceCandidate's
"m=" section index and MID fields MAY be specified to indicate that
the event applies to a specific "m=" section, or set to null to
indicate it applies to all "m=" sections in the current ICE candidate
generation. This event can be used by the application to generate an
end
4.2. RtpTransceiver
4.2.1. stop
The stop method stops an RtpTransceiver. This will cause future calls to createOffer to generate a zero port for the associated "m=" section. See below for more details.¶
4.2.2. stopped
The stopped property indicates whether the transceiver has been stopped, either by a call to stop or by applying an answer that rejects the associated "m=" section. In either of these cases, it is set to "true" and otherwise will be set to "false".¶
A stopped RtpTransceiver does not send any outgoing RTP or RTCP or process any incoming RTP or RTCP. It cannot be restarted.¶
4.2.3. setDirection
The setDirection method sets the direction of a transceiver, which affects the direction property of the associated "m=" section on future calls to createOffer and createAnswer. The permitted values for direction are "recvonly", "sendrecv", "sendonly", and "inactive", mirroring the identically named direction attributes defined in [RFC4566], Section 6.¶
When creating offers, the transceiver direction is directly reflected in the output, even for re-offers. When creating answers, the transceiver direction is intersected with the offered direction, as explained in Section 5.3 below.¶
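Treating each direction as a pair of send and receive flags makes the intersection mechanical: the answerer can send only if the offerer is willing to receive, and can receive only if the offerer is willing to send. A sketch with hypothetical helper names:

```typescript
type Direction = "sendrecv" | "sendonly" | "recvonly" | "inactive";

function toFlags(d: Direction): { send: boolean; recv: boolean } {
  return {
    send: d === "sendrecv" || d === "sendonly",
    recv: d === "sendrecv" || d === "recvonly",
  };
}

function fromFlags(send: boolean, recv: boolean): Direction {
  if (send && recv) return "sendrecv";
  if (send) return "sendonly";
  if (recv) return "recvonly";
  return "inactive";
}

// Direction for the answer's "m=" section: the transceiver direction
// intersected with the offered direction, seen from the answerer's side.
function answerDirection(transceiver: Direction, offered: Direction): Direction {
  const local = toFlags(transceiver);
  const remote = toFlags(offered);
  return fromFlags(local.send && remote.recv, local.recv && remote.send);
}
```

For example, a "sendrecv" transceiver answering a "sendonly" offer yields "recvonly".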
Note that while setDirection sets the direction property
of the transceiver immediately (Section 4.2.4), this property
does not immediately affect whether the transceiver's
RtpSender will send or its RtpReceiver will receive. The
direction in effect is represented by the current
4.2.4. direction
The direction property indicates the last value passed into setDirection. If setDirection has never been called, it is set to the direction the transceiver was initialized with.¶
4.2.5. currentDirection
The current
If an answer that references this transceiver has not yet
been applied or if the transceiver is stopped,
current
4.2.6. setCodecPreferences
The set
The codec preferences of an RtpTransceiver can cause codecs to be excluded by subsequent calls to createOffer and createAnswer, in which case the corresponding media formats in the associated "m=" section will be excluded. The codec preferences cannot add media formats that would otherwise not be present.¶
The codec preferences of an RtpTransceiver can also determine the order of codecs in subsequent calls to createOffer and createAnswer, in which case the order of the media formats in the associated "m=" section will follow the specified preferences.¶
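Exclusion and reordering can both be expressed as one filter over the preference list. In this sketch, codecs are identified by plain name strings such as "opus", which is a simplification (real codec matching compares more than names), and an empty preference list is assumed to mean that no preferences are set. The function name is hypothetical.

```typescript
// Media formats to place in the "m=" section: codecs present in both
// lists, in preference order. Iterating over `preferences` gives the order;
// checking against `available` ensures preferences cannot add formats.
function applyCodecPreferences(available: string[], preferences: string[]): string[] {
  if (preferences.length === 0) {
    return available.slice(); // assumption: no preferences set
  }
  return preferences.filter(
    (codec, i) => available.indexOf(codec) >= 0 && preferences.indexOf(codec) === i
  );
}
```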
5. SDP Interaction Procedures
This section describes the specific procedures to be followed when creating and parsing SDP objects.¶
5.1. Requirements Overview
JSEP implementations MUST comply with the specifications listed below that govern the creation and processing of offers and answers.¶
5.1.1. Usage Requirements
All session descriptions handled by JSEP implementations
The SDP security descriptions mechanism for SRTP keying [RFC4568] MUST NOT be used, as discussed in [RFC8827].¶
5.1.2. Profile Names and Interoperability
For media "m=" sections, JSEP implementations MUST support
the "UDP
Unfortunately, in an attempt at compatibility, some
endpoints generate other profile strings even when they mean
to support one of these profiles. For instance, an endpoint
might generate "RTP/AVP" but supply "a=fingerprint" and
"a=rtcp-fb" attributes, indicating its willingness to support
"UDP
Note that re-offers by JSEP implementations MUST use the correct profile strings even if the initial offer/answer exchange used an (incorrect) older profile string. This simplifies JSEP behavior, with minimal downside, as any remote endpoint that fails to handle such a re-offer will also fail to handle a JSEP endpoint's initial offer.¶
5.2. Constructing an Offer
When createOffer is called, a new SDP description MUST be created that includes the functionality specified in [RFC8834]. The exact details of this process are explained below.¶
5.2.1. Initial Offers
When createOffer is called for the first time, the result is known as the initial offer.¶
The first step in generating an initial offer is to generate session-level attributes, as specified in [RFC4566], Section 5. Specifically:¶
The next step is to generate "m=" sections, as specified in
[RFC4566], Section 5.14. An "m=" section is
generated for each RtpTransceiver that has been added to the
PeerConnection, excluding any stopped Rtp
For each "m=" section generated for an RtpTransceiver, establish a mapping between the transceiver and the index of the generated "m=" section.¶
Each "m=" section, provided it is not marked as bundle-only, MUST contain a unique set of ICE credentials and a unique set of ICE candidates. Bundle-only "m=" sections MUST NOT contain any ICE credentials and MUST NOT gather any candidates.¶
For DTLS, all "m=" sections MUST use any and all certificates that have been specified for the PeerConnection; as a result, they MUST all have the same fingerprint value or values [RFC8122], or these values MUST be session-level attributes.¶
Each "m=" section MUST be generated as specified in [RFC4566], Section 5.14. For the "m=" line itself, the following rules MUST be followed:¶
The "m=" line MUST be followed immediately by a "c=" line, as specified in [RFC4566], Section 5.7. Again, as no candidates are available yet, the "c=" line MUST contain the default value "IN IP4 0.0.0.0", as defined in [RFC8840], Section 4.1.1.¶
[RFC8859] groups SDP attributes into different categories. To avoid unnecessary duplication when bundling, attributes of category IDENTICAL or TRANSPORT MUST NOT be repeated in bundled "m=" sections, repeating the guidance from [RFC8843], Section 7.1.3. This includes "m=" sections for which bundling has been negotiated and is still desired, as well as "m=" sections marked as bundle-only.¶
The following attributes, which are of a category other than IDENTICAL or TRANSPORT, MUST be included in each "m=" section:¶
The following attributes, which are of category IDENTICAL or TRANSPORT, MUST appear only in "m=" sections that either have a unique address or are associated with the BUNDLE-tag. (In initial offers, this means those "m=" sections that do not contain an "a=bundle-only" attribute.)¶
Lastly, if a data channel has been created, an "m=" section
MUST be generated for data. The <media> field MUST be
set to "application", and the <proto> field MUST be set
to "UDP/DTLS/SCTP"
[RFC8841]. The <fmt>
value MUST be set to "webrtc
Within the data "m=" section, an "a=mid" line MUST be
generated and included as described above, along with an
"a=sctp-port" line referencing the SCTP port number, as
defined in
[RFC8841], Section 5.1;
and, if appropriate, an "a
As discussed above, the following attributes of category IDENTICAL or TRANSPORT are included only if the data "m=" section either has a unique address or is associated with the BUNDLE-tag (e.g., if it is the only "m=" section):¶
Once all "m=" sections have been generated, a session-level "a=group" attribute MUST be added as specified in [RFC5888]. This attribute MUST have semantics "BUNDLE" and MUST include the mid identifiers of each "m=" section. The effect of this is that the JSEP implementation offers all "m=" sections as one bundle group. However, whether the "m=" sections are bundle-only or not depends on the bundle policy.¶
The next step is to generate session-level lip sync groups as defined in [RFC5888], Section 7. For each MediaStream referenced by more than one RtpTransceiver (by passing those MediaStreams as arguments to the addTrack and addTransceiver methods), a group of type "LS" MUST be added that contains the MID values for each RtpTransceiver.¶
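Both session-level groups can be derived from a per-section summary. This sketch assumes one entry per "m=" section, recording its MID and the IDs of the MediaStreams passed to addTrack or addTransceiver (a data section simply has none); the `SectionInfo` type is hypothetical, and stopped transceivers are assumed to have been excluded already.

```typescript
interface SectionInfo {
  mid: string;
  streamIds: string[]; // MediaStreams passed to addTrack/addTransceiver
}

// Returns the session-level "a=group" lines for an initial offer.
function sessionGroupLines(sections: SectionInfo[]): string[] {
  // One bundle group containing the MID of every "m=" section.
  const lines: string[] = ["a=group:BUNDLE " + sections.map((s) => s.mid).join(" ")];
  // Collect each distinct MediaStream ID in first-seen order.
  const streamIds: string[] = [];
  for (const s of sections) {
    for (const id of s.streamIds) {
      if (streamIds.indexOf(id) < 0) streamIds.push(id);
    }
  }
  // One "LS" group per MediaStream referenced by more than one section.
  for (const id of streamIds) {
    const mids = sections.filter((s) => s.streamIds.indexOf(id) >= 0).map((s) => s.mid);
    if (mids.length > 1) lines.push("a=group:LS " + mids.join(" "));
  }
  return lines;
}
```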
Attributes that SDP permits to be at either the session level or the media level SHOULD generally be at the media level even if they are identical. This assists development and debugging by making it easier to understand individual media sections, especially if one of a set of initially identical attributes is subsequently changed. However, implementations MAY choose to aggregate attributes at the session level, and JSEP implementations MUST be prepared to receive attributes in either location.¶
Attributes other than the ones specified above MAY be included, except for the following attributes, which are specifically incompatible with the requirements of [RFC8834] and MUST NOT be included:¶
Note that when bundle is used, any additional attributes that are added MUST follow the advice in [RFC8859] on how those attributes interact with bundle.¶
Note that these requirements are in some cases stricter than those of SDP. Implementations MUST be prepared to accept compliant SDP even if it would not conform to the requirements for generating SDP in this specification.¶
5.2.2. Subsequent Offers
When createOffer is called a second (or later) time or is called after a local description has already been installed, the processing is somewhat different than for an initial offer.¶
If the previous offer was not applied using
set
Note that if the application creates an offer by reading
current
If the previous offer was applied using
set
If the previous offer was applied using
set
In addition, for each existing, non-recycled, non-rejected "m=" section in the new offer, the following adjustments are made based on the contents of the corresponding "m=" section in the current local or remote description, as appropriate:¶
The "a
"a=group:LS" attributes are generated in the same way as for initial offers, with the additional stipulation that any lip sync groups that were present in the most recent answer MUST continue to exist and MUST contain any previously existing MID identifiers, as long as the identified "m=" sections still exist and are not rejected, and the group still contains at least two MID identifiers. This ensures that any synchronized "recvonly" "m=" sections continue to be synchronized in the new offer.¶
5.2.3. Options Handling
The createOffer method takes as a parameter an RTCOfferOptions object. Special processing is performed when generating an SDP description if the following options are present.¶
5.2.3.1. IceRestart
If the IceRestart option is specified, with a value of "true", the offer MUST indicate an ICE restart by generating new ICE ufrag and pwd attributes, as specified in [RFC8839], Section 4.4.3.1.1. If this option is specified on an initial offer, it has no effect (since a new ICE ufrag and pwd are already generated). Similarly, if the ICE configuration has changed, this option has no effect, since new ufrag and pwd attributes will be generated automatically. This option is primarily useful for reestablishing connectivity in cases where failures are detected by the application.¶
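The conditions under which an offer carries a new ICE ufrag and pwd reduce to a small predicate. The `OfferContext` type is hypothetical, and generating the credential values themselves is out of scope for this sketch.

```typescript
// Hypothetical description of the situation in which createOffer runs.
interface OfferContext {
  isInitialOffer: boolean;
  iceConfigChanged: boolean;    // ICE configuration changed since the last offer
  iceRestartRequested: boolean; // IceRestart option set to true
}

// Returns true if the offer must carry a new ICE ufrag and pwd.
function needsNewIceCredentials(ctx: OfferContext): boolean {
  if (ctx.isInitialOffer || ctx.iceConfigChanged) {
    // New credentials are generated anyway; IceRestart has no extra effect.
    return true;
  }
  // Otherwise, only an explicit IceRestart triggers an ICE restart.
  return ctx.iceRestartRequested;
}
```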
5.2.3.2. VoiceActivityDetection
Silence suppression, also known as discontinuous transmission ("DTX"), can reduce the bandwidth used for audio by switching to a special encoding when voice activity is not detected, at the cost of some fidelity.¶
If the "Voice
If the "Voice
The "Voice
5.3. Generating an Answer
When createAnswer is called, a new SDP description MUST be created that is compatible with the supplied remote description as well as the requirements specified in [RFC8834]. The exact details of this process are explained below.¶
5.3.1. Initial Answers
When createAnswer is called for the first time after a remote description has been provided, the result is known as the initial answer. If no remote description has been installed, an answer cannot be generated, and an error MUST be returned.¶
Note that the remote description SDP may not have been
created by a JSEP endpoint and may not conform to all the
requirements listed in
Section 5.2. For many cases, this
is not a problem. However, if any mandatory SDP attributes
are missing or functionality listed as mandatory
The first step in generating an initial answer is to generate session-level attributes. The process here is identical to that indicated in Section 5.2.1 above, except that the "a=ice-options" line, with the "trickle" option as specified in [RFC8840], Section 4.1.3 and the "ice2" option as specified in [RFC8445], Section 10, is only included if such an option was present in the offer.¶
The next step is to generate session-level lip sync
groups, as defined in
[RFC5888], Section 7. For each group of type
"LS" present in the offer, select the local RtpTransceivers
that are referenced by the MID values in the specified group,
and determine which of them either reference a common local
MediaStream (specified in the calls to
add
As a simple example, consider the following offer of a single audio and single video track contained in the same MediaStream. SDP lines not relevant to this example have been removed for clarity. As explained in Section 5.2, a group of type "LS" has been added that references each track's RtpTransceiver.¶
If the answerer uses a single MediaStream when it adds its tracks, both of its transceivers will reference this stream, and so the subsequent answer will contain a "LS" group identical to that in the offer, as shown below:¶
However, if the answerer groups its tracks into separate MediaStreams, its transceivers will reference different streams, and so the subsequent answer will not contain a "LS" group.¶
Finally, if the answerer does not add any tracks, its transceivers will not reference any MediaStreams, causing the preferences of the offerer to be maintained, and so the subsequent answer will contain an identical "LS" group.¶
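A simplified sketch of the answerer's decision that reproduces the three cases above: transceivers referencing the same local MediaStream stay grouped, and transceivers referencing no MediaStream join that group. It assumes each local transceiver references at most one stream and produces at most one group per offered group; the `LocalTransceiver` type and the function name are hypothetical.

```typescript
interface LocalTransceiver {
  mid: string;
  streamId: string | null; // the local MediaStream it references, or null
}

// Given the MIDs of one offered "LS" group, returns the MIDs for the
// answer's "LS" group, or null if no group should be generated.
function answerLsGroup(offeredMids: string[], local: LocalTransceiver[]): string[] | null {
  const members = local.filter((t) => offeredMids.indexOf(t.mid) >= 0);
  // Stream-less transceivers keep the offerer's preference, so they can
  // join any group; start with them alone.
  let best = members.filter((t) => t.streamId === null);
  for (const t of members) {
    if (t.streamId === null) continue;
    const grouped = members.filter((m) => m.streamId === t.streamId || m.streamId === null);
    if (grouped.length > best.length) best = grouped;
  }
  if (best.length < 2) return null;
  // Keep the offer's MID order.
  return offeredMids.filter((mid) => best.some((t) => t.mid === mid));
}
```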
The example in Section 7.2 shows a more involved case of "LS" group generation.¶
The next step is to generate an "m=" section for each "m=" section that is present in the remote offer, as specified in [RFC3264], Section 6. For the purposes of this discussion, any session-level attributes in the offer that are also valid as media-level attributes are considered to be present in each "m=" section. Each offered "m=" section will have an associated RtpTransceiver, as described in Section 5.10. If there are more RtpTransceivers than there are "m=" sections, the unmatched RtpTransceivers will need to be associated in a subsequent offer.¶
For each offered "m=" section, if any of the following conditions are true, the corresponding "m=" section in the answer MUST be marked as rejected by setting the <port> in the "m=" line to zero, as indicated in [RFC3264], Section 6, and further processing for this "m=" section can be skipped:¶
Otherwise, each "m=" section in the answer MUST then be generated as specified in [RFC3264], Section 6.1. For the "m=" line itself, the following rules MUST be followed:¶
The "m=" line MUST be followed immediately by a "c=" line, as specified in [RFC4566], Section 5.7. Again, as no candidates are available yet, the "c=" line MUST contain the default value "IN IP4 0.0.0.0", as defined in [RFC8840], Section 4.1.3.¶
If the offer supports bundle, all "m=" sections to be bundled MUST use the same ICE credentials and candidates; all "m=" sections not being bundled MUST use unique ICE credentials and candidates. Each "m=" section MUST contain the following attributes (which are of attribute types other than IDENTICAL or TRANSPORT):¶
Each "m=" section that is not bundled into another "m=" section MUST contain the following attributes (which are of category IDENTICAL or TRANSPORT):¶
If a data channel "m=" section has been offered, an "m=" section MUST also be generated for data. The <media> field MUST be set to "application", and the <proto> and <fmt> fields MUST be set to exactly match the fields in the offer.¶
Within the data "m=" section, an "a=mid" line MUST be
generated and included as described above, along with an
"a=sctp-port" line referencing the SCTP port number, as
defined in
[RFC8841], Section 5.1;
and, if appropriate, an "a
As discussed above, the following attributes of category IDENTICAL or TRANSPORT are included only if the data "m=" section is not bundled into another "m=" section:¶
Note that if media "m=" sections are bundled into a data "m=" section, then certain TRANSPORT and IDENTICAL attributes may also appear in the data "m=" section even if they would otherwise only be appropriate for a media "m=" section (e.g., "a=rtcp-mux").¶
If "a=group" attributes with semantics of "BUNDLE" are offered, corresponding session-level "a=group" attributes MUST be added as specified in [RFC5888]. These attributes MUST have semantics "BUNDLE" and MUST include all mid identifiers from the offered bundle groups that have not been rejected. Note that regardless of the presence of "a=bundle-only" in the offer, all "m=" sections in the answer MUST NOT have an "a=bundle-only" line.¶
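The BUNDLE rule for answers can be sketched as follows. Each offered group is kept with its rejected mids filtered out, and any "a=bundle-only" line is dropped from the answer. The function names are hypothetical. Skipping a group whose mids were all rejected is a choice made by this sketch, not a rule taken from the text above.

```python
from typing import List, Set


def answer_bundle_groups(offered_groups: List[List[str]],
                         rejected: Set[str]) -> List[str]:
    """Return session-level "a=group:BUNDLE" lines for the answer."""
    lines = []
    for group in offered_groups:
        kept = [mid for mid in group if mid not in rejected]
        if kept:  # assumption: omit groups with no surviving mids
            lines.append("a=group:BUNDLE " + " ".join(kept))
    return lines


def strip_bundle_only(section_lines: List[str]) -> List[str]:
    """Answers never carry "a=bundle-only", whatever the offer contained."""
    return [line for line in section_lines if line != "a=bundle-only"]
```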
Attributes that are common between all "m=" sections MAY be moved to the session level if explicitly defined to be valid at the session level.¶
The attributes prohibited in the creation of offers are also prohibited in the creation of answers.¶
5.3.2. Subsequent Answers
When createAnswer is called a second (or later) time or is called after a local description has already been installed, the processing is somewhat different than for an initial answer.¶
If the previous answer was not applied using set
If any session description was previously supplied to set
5.3.3. Options Handling
The createAnswer method takes as a parameter an RTCAnswer
The following options are supported in RTCAnswer
5.3.3.1. VoiceActivityDetection
Silence suppression in the answer is handled as described in Section 5.2.3.2, with one exception: if support for silence suppression was not indicated in the offer, the Voice
5.4. Modifying an Offer or Answer
The SDP returned from createOffer or createAnswer MUST NOT be changed before passing it to set
After calling set
As always, the application is solely responsible for what it sends to the other party, and all incoming SDP will be processed by the JSEP implementation to the extent of its capabilities. It is an error to assume that all SDP is well formed; however, one should be able to assume that any implementation of this specification will be able to process, as a remote offer or answer, unmodified SDP coming from any other implementation of this specification.¶
5.5. Processing a Local Description
When a Session
5.6. Processing a Remote Description
When a Session
5.7. Processing a Rollback
A rollback may be performed if the PeerConnection is in any state except for "stable". This means that both offers and provisional answers can be rolled back. Rollback can only be used to cancel proposed changes; there is no support for rolling back from a "stable" state to a previous "stable" state. If a rollback is attempted in the "stable" state, processing MUST stop and an error MUST be returned. Note that this implies that once the answerer has performed set
The effect of rollback MUST be the same regardless of whether set
In order to process rollback, a JSEP implementation abandons the current offer/answer transaction, sets the signaling state to "stable", and sets the pending local and/or remote description (see Sections 4.1.14 and 4.1.16) to "null". Any resources or candidates that were allocated by the abandoned local description are discarded; any media that is received is processed according to the previous local and remote descriptions.¶
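The rollback steps can be sketched as a small state holder. Rollback fails in "stable", and otherwise clears the pending descriptions and returns to "stable". The current descriptions are left untouched so that media keeps being processed against them. The class and exception names are hypothetical stand-ins, and resource and candidate cleanup is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


class InvalidStateError(Exception):
    """Hypothetical error raised when rollback is attempted in "stable"."""


@dataclass
class SignalingSession:
    signaling_state: str = "stable"
    current_local: Optional[str] = None
    current_remote: Optional[str] = None
    pending_local: Optional[str] = None
    pending_remote: Optional[str] = None

    def rollback(self) -> None:
        if self.signaling_state == "stable":
            # Rollback only cancels proposed changes; nothing is pending.
            raise InvalidStateError("cannot roll back in the stable state")
        # Abandon the current offer/answer transaction.
        self.pending_local = None
        self.pending_remote = None
        self.signaling_state = "stable"
        # current_local/current_remote still govern received media.
```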
A rollback disassociates any RtpTransceivers that were associated with "m=" sections by the application of the rolled-back session description (see Sections 5.10 and 5.9). This means that some RtpTransceivers that were previously associated will no longer be associated with any "m=" section; in such cases, the value of the Rtp
5.8. Parsing a Session Description
The SDP contained in the session description object consists of a sequence of text lines, each containing a key-value expression, as described in [RFC4566], Section 5. The SDP is read, line by line, and converted to a data structure that contains the deserialized information. However, SDP allows many types of lines, not all of which are relevant to JSEP applications. For each line, the implementation will first ensure that it is syntactically correct according to its defining ABNF, check that it conforms to the semantics used in [RFC4566] and [RFC3264], and then either parse and store or discard the provided value, as described below.¶
If any line is not well formed or cannot be parsed as described, the parser MUST stop with an error and reject the session description, even if the value is to be discarded. This ensures that implementations do not accidentally misinterpret ambiguous SDP.¶
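The sketch below covers only the first, purely syntactic step of this strict parsing rule. Each line must have the form "<type>=<value>", with a single lowercase letter and no whitespace around "=". One malformed line rejects the whole description. Per-type ABNF checks are not modeled, and the names are hypothetical.

```python
import re
from typing import List, Tuple

# One lowercase type letter, "=", then a value without CR or LF.
_LINE = re.compile(r"([a-z])=([^\r\n]*)")


class SdpParseError(ValueError):
    """Hypothetical error that rejects the entire session description."""


def split_sdp_lines(sdp: str) -> List[Tuple[str, str]]:
    lines = sdp.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final line is terminated by CRLF
    parsed = []
    for number, line in enumerate(lines, start=1):
        match = _LINE.fullmatch(line)
        if match is None:
            # Do not skip bad lines: stop and reject the description.
            raise SdpParseError("line %d is not well formed: %r" % (number, line))
        parsed.append((match.group(1), match.group(2)))
    return parsed
```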
5.8.1. Session-Level Parsing
First, the session-level lines are checked and parsed. These lines MUST occur in a specific order, and with a specific syntax, as defined in [RFC4566], Section 5. Note that while the specific line types (e.g., "v=", "c=") MUST occur in the defined order, lines of the same type (typically "a=") can occur in any order.¶
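The ordering rule can be checked by giving each session-level line type a rank that follows [RFC4566], Section 5, and requiring the ranks to never decrease. Repeated "a=" lines therefore pass in any order. "t=" and "r=" share one rank because time descriptions repeat as a pair. This sketch checks ordering only, not the required or at-most-once constraints.

```python
from typing import List, Tuple

# Session-level line types in RFC 4566 order; t= and r= share a rank.
_RANK = {"v": 0, "o": 1, "s": 2, "i": 3, "u": 4, "e": 5, "p": 6,
         "c": 7, "b": 8, "t": 9, "r": 9, "z": 10, "k": 11, "a": 12}


def check_session_order(lines: List[Tuple[str, str]]) -> None:
    last = -1
    for kind, _value in lines:
        if kind == "m":
            break  # media sections are checked separately
        rank = _RANK.get(kind)
        if rank is None:
            raise ValueError("unexpected session-level line %s=" % kind)
        if rank < last:
            raise ValueError("%s= line is out of order" % kind)
        last = rank
```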
The following non-attribute lines are not meaningful in the JSEP context and MAY be discarded once they have been checked.¶
The remaining non-attribute lines are processed as follows:¶
Finally, the attribute lines are processed. Specific processing MUST be applied for the following session-level attribute ("a=") lines:¶
Other attributes that are not relevant to JSEP may also be present, and implementations SHOULD process any that they recognize. As required by [RFC4566], Section 5.13, unknown attribute lines MUST be ignored.¶
Once all the session-level lines have been parsed, processing continues with the lines in "m=" sections.¶
5.8.2. Media Section Parsing
Like the session-level lines, the media section lines MUST occur in the specific order and with the specific syntax defined in [RFC4566], Section 5.¶
The "m=" line itself MUST be parsed as described in [RFC4566], Section 5.14, and the <media>, <port>, <proto>, and <fmt> values stored.¶
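A minimal sketch of parsing the value after "m=" into the four stored fields follows. The optional "/<number of ports>" suffix is ignored, and a port outside 0-65535 is treated as a parse error. The `MLine` type and function name are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class MLine:
    media: str
    port: int
    proto: str
    fmts: List[str]


def parse_m_line(value: str) -> MLine:
    """Parse the text after "m=": <media> <port>[/<n>] <proto> <fmt> ..."""
    fields = value.split(" ")
    if len(fields) < 4 or "" in fields:
        raise ValueError("malformed m= line: %r" % value)
    media, port_field, proto, *fmts = fields
    port_text = port_field.split("/", 1)[0]  # ignore optional /<number of ports>
    if not re.fullmatch(r"[0-9]{1,5}", port_text) or int(port_text) > 65535:
        raise ValueError("bad port: %r" % port_field)
    return MLine(media, int(port_text), proto, fmts)
```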
Following the "m=" line, specific processing MUST be applied for the following non-attribute lines:¶
Specific processing MUST also be applied for the following attribute lines:¶
If the "m=" <proto> value indicates use of RTP, as described in Section 5.1.2 above, the following attribute lines MUST be processed:¶
Otherwise, if the "m=" <proto> value indicates use of SCTP, the following attribute lines MUST be processed:¶
Other attributes that are not relevant to JSEP may also be present, and implementations SHOULD process any that they recognize. As required by [RFC4566], Section 5.13, unknown attribute lines MUST be ignored.¶
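The choice between the RTP and SCTP attribute sets depends only on the "m=" <proto> value, which can be sketched as a lookup. The profile lists below reflect a reading of Sections 5.1.2 and 5.1.3 and are an assumption of this sketch; check them against the normative text before relying on them.

```python
# Assumed <proto> values for RTP-based and SCTP-based "m=" sections.
RTP_PROTOS = frozenset({
    "RTP/AVP", "RTP/AVPF", "RTP/SAVP", "RTP/SAVPF",
    "TCP/DTLS/RTP/SAVP", "TCP/DTLS/RTP/SAVPF",
    "UDP/TLS/RTP/SAVP", "UDP/TLS/RTP/SAVPF",
})
SCTP_PROTOS = frozenset({"UDP/DTLS/SCTP", "TCP/DTLS/SCTP", "DTLS/SCTP"})


def attribute_set_for(proto: str) -> str:
    """Return which attribute-processing branch applies to an "m=" section."""
    if proto in RTP_PROTOS:
        return "rtp"
    if proto in SCTP_PROTOS:
        return "sctp"
    return "other"  # only the generic attribute handling applies
```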
5.8.3. Semantics Verification
Assuming that parsing completes successfully, the parsed description is then evaluated to ensure internal consistency as well as proper support for mandatory features. Specifically, the following checks are performed:¶
If this session description is of type "pranswer" or "answer", the following additional checks are applied:¶
If any of the preceding checks failed, processing MUST stop and an error MUST be returned.¶
5.9. Applying a Local Description
The following steps are performed at the media engine level to apply a local description. If an error is returned, the session MUST be restored to the state it was in before performing these steps.¶
First, "m=" sections are processed. For each "m=" section, the following steps MUST be performed; if any parameters are out of bounds or cannot be applied, processing MUST stop and an error MUST be returned.¶
Finally, if this description is of type "pranswer" or "answer", follow the processing defined in Section 5.11 below.¶
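The requirement to restore the prior state on error can be sketched as a snapshot-and-restore wrapper around the individual application steps. The session is modeled here as a plain dictionary. The function name and the step callables are hypothetical, and a real media engine would need a cheaper mechanism than a deep copy.

```python
import copy
from typing import Any, Callable, Dict, List

Step = Callable[[Dict[str, Any]], None]


def apply_transactionally(state: Dict[str, Any], steps: List[Step]) -> None:
    """Run every step, or leave state exactly as it was if any step fails."""
    snapshot = copy.deepcopy(state)
    try:
        for step in steps:
            step(state)  # may raise if a parameter is out of bounds
    except Exception:
        state.clear()
        state.update(snapshot)  # restore the pre-application session
        raise
```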
5.10. Applying a Remote Description
The following steps are performed to apply a remote description. If an error is returned, the session MUST be restored to the state it was in before performing these steps.¶
If the answer contains any "a=ice-options" attributes where "trickle" is listed as an attribute, update the PeerConnection can
The following steps MUST be performed for attributes at the session level; if any parameters are out of bounds or cannot be applied, processing MUST stop and an error MUST be returned.¶
For each "m=" section, the following steps MUST be performed; if any parameters are out of bounds or cannot be applied, processing MUST stop and an error MUST be returned.¶
Finally, if this description is of type "pranswer" or "answer", follow the processing defined in Section 5.11 below.¶
5.11. Applying an Answer
In addition to the steps mentioned above for processing a local or remote description, the following steps are performed when processing a description of type "pranswer" or "answer".¶
For each "m=" section, the following steps MUST be performed:¶
If the answer contains valid bundle groups, discard any ICE components for the "m=" sections that will be bundled onto the primary ICE components in each bundle, and begin muxing these "m=" sections accordingly, as described in [RFC8843], Section 7.4.¶
If the description is of type "answer" and there are still remaining candidates in the ICE candidate pool, discard them.¶
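The last two steps of applying an answer can be sketched as follows. Bundled "m=" sections are pointed at one shared ICE transport and their own transports are discarded. Pooled candidates are dropped only for a final "answer", not for a "pranswer". The assumption that the first mid in each answer group owns the shared transport is this sketch's reading of [RFC8843]. Transports are represented by opaque ids, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def apply_bundle(transports: Dict[str, Optional[str]],
                 groups: List[List[str]]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Return mid -> transport after bundling, plus the transports to discard."""
    in_use = dict(transports)
    discarded: List[str] = []
    for group in groups:
        # Assumption: the first mid in the answer's group owns the transport.
        tagged = group[0]
        for mid in group[1:]:
            own = in_use.get(mid)
            if own is not None and own != in_use[tagged]:
                discarded.append(own)
            in_use[mid] = in_use[tagged]
    return in_use, discarded


def drop_pooled_candidates(description_type: str, pool: List[str]) -> None:
    """Discard unused pre-gathered candidates once a final answer is applied."""
    if description_type == "answer":
        pool.clear()
```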
6. Processing RTP/RTCP
When bundling, associating incoming RTP/RTCP with the proper "m=" section is defined in [RFC8843], Section 9.2. When not bundling, the proper "m=" section is clear from the ICE component over which the RTP/RTCP is received.¶
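A simplified sketch of bundled demultiplexing follows. It is not the complete procedure in [RFC8843], Section 9.2. A packet is routed by its MID header extension when present, then by a previously learned SSRC, and then by a payload type unique to one "m=" section. Otherwise it is dropped. RID handling and "a=ssrc" signaling are omitted, and the class name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BundleDemuxer:
    mids: List[str]
    payload_types: Dict[str, List[int]]  # mid -> negotiated payload types
    ssrc_to_mid: Dict[int, str] = field(default_factory=dict)

    def route(self, ssrc: int, payload_type: int,
              mid_ext: Optional[str] = None) -> Optional[str]:
        """Return the mid for an incoming packet, or None to drop it."""
        if mid_ext is not None and mid_ext in self.mids:
            self.ssrc_to_mid[ssrc] = mid_ext  # learn SSRC from MID extension
            return mid_ext
        if ssrc in self.ssrc_to_mid:
            return self.ssrc_to_mid[ssrc]
        owners = [m for m in self.mids if payload_type in self.payload_types.get(m, [])]
        if len(owners) == 1:
            self.ssrc_to_mid[ssrc] = owners[0]
            return owners[0]
        return None  # ambiguous or unknown: cannot associate the packet
```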
Once the proper "m=" section or sections are known, RTP/RTCP is delivered to the Rtp
7. Examples
Note that this example section shows several SDP fragments. To accommodate RFC line-length restrictions, some of the SDP lines have been split into multiple lines, where leading whitespace indicates that a line is a continuation of the previous line. In addition, some blank lines have been added to improve readability but are not valid in SDP.¶
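To turn the example fragments back into SDP, remove the blank lines and fold each indented continuation line into the line before it. The sketch below assumes the folded pieces are rejoined with a single space, which suits split "a=candidate" lines but may not match every split in the examples. It emits CRLF line endings.

```python
from typing import List


def unfold_example_sdp(text: str) -> str:
    lines: List[str] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines were added for readability only
        if raw[0] in " \t":
            if not lines:
                raise ValueError("continuation line with nothing to continue")
            # Assumption: continuation pieces are rejoined with one space.
            lines[-1] += " " + raw.strip()
        else:
            lines.append(raw.rstrip())
    return "\r\n".join(lines) + "\r\n"
```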
More examples of SDP for WebRTC call flows, including examples with IPv6 addresses, can be found in [SDP4WebRTC].¶
7.1. Simple Example
This section shows a very simple example that sets up a minimal audio/video call between two JSEP endpoints without using Trickle ICE. The example in the following section provides a more detailed example of what could happen in a JSEP session.¶
The code flow below shows Alice's endpoint initiating the session to Bob's endpoint. The messages from the JavaScript application in Alice's browser to the JavaScript in Bob's browser, abbreviated as "AliceJS" and "BobJS", respectively, are assumed to flow over some signaling protocol via a web server. The JavaScript on both Alice's side and Bob's side waits for all candidates before sending the offer or answer, so the offers and answers are complete; Trickle ICE is not used. The user agents (JSEP implementations
The SDP for |offer-A1| looks like:¶
The SDP for |answer-A1| looks like:¶
7.2. Detailed Example
This section shows a more involved example of a session between two JSEP endpoints. Trickle ICE is used in full trickle mode, with a bundle policy of "max-bundle", an RTCP mux policy of "require", and a single TURN server. Initially, both Alice and Bob establish an audio channel and a data channel. Later, Bob adds two video flows -- one for his video feed and one for screen sharing, both supporting FEC -- with the video feed configured for simulcast. Alice accepts these video flows but does not add video flows of her own, so they are handled as recvonly. Alice also specifies a maximum video decoder resolution.¶
The SDP for |offer-B1| looks like:¶
|offer
|offer
|offer
The SDP for |answer-B1| looks like:¶
|answer
|answer
|answer
The SDP for |offer-B2| is shown below. In addition to the new "m=" sections for video, both of which are offering FEC and one of which is offering simulcast, note the increment of the version number in the "o=" line; changes to the "c=" line, indicating the local candidate that was selected; and the inclusion of gathered candidates as a=candidate lines.¶
The SDP for |answer-B2| is shown below. In addition to the acceptance of the video "m=" sections, the use of a=recvonly to indicate one-way video, and the use of a=imageattr to limit the received resolution, note the use of setup:passive to maintain the existing DTLS roles.¶
7.3. Early Transport Warmup Example
This example demonstrates the early-warmup technique described in Section 4.1.10.1. Here, Alice's endpoint sends an offer to Bob's endpoint to start an audio/video call. Bob immediately responds with an answer that accepts the audio/video "m=" sections but marks them as sendonly (from his perspective), meaning that Alice will not yet send media. This allows the JSEP implementation to start negotiating ICE and DTLS immediately. Bob's endpoint then prompts him to answer the call, and when he does, his endpoint sends a second offer, which enables the audio and video "m=" sections, and thereby bidirectional media transmission. The advantage of such a flow is that as soon as the first answer is received, the implementation can proceed with ICE and DTLS negotiation and establish the session transport. If the transport setup completes before the second offer is sent, then media can be transmitted by the callee immediately upon answering the call, minimizing perceived post-dial delay. The second offer/answer exchange can also change the preferred codecs or other session parameters.¶
This example also makes use of the "relay" ICE candidate policy described in Section 3.5.3 to minimize the ICE gathering and checking needed.¶
The SDP for |offer-C1| looks like:¶
|offer
The SDP for |answer-C1| looks like:¶
|answer
The SDP for |offer-C2| looks like:¶
The SDP for |answer-C2| looks like:¶
8. Security Considerations
The IETF has published separate documents [RFC8827] [RFC8826] describing the security architecture for WebRTC as a whole. The remainder of this section describes security considerations for this document.¶
While formally the JSEP interface is an API, it is better to think of it as an Internet protocol, with the application JavaScript being untrustworthy from the perspective of the JSEP implementation. Thus, the threat model of [RFC3552] applies. In particular, JavaScript can call the API in any order and with any inputs, including malicious ones. This is particularly relevant when we consider the SDP that is passed to set
Conversely, the application programmer needs to be aware that the JavaScript does not have complete control of endpoint behavior. One case that bears particular mention is that editing ICE candidates out of the SDP or suppressing trickled candidates does not have the expected behavior: implementations will still perform checks from those candidates even if they are not sent to the other side. Thus, for instance, it is not possible to prevent the remote peer from learning your public IP address by removing server
9. IANA Considerations
This document has no IANA actions.¶
10. References
10.1. Normative References
- [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
- [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, DOI 10.17487/RFC3261, <https://www.rfc-editor.org/info/rfc3261>.
- [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, DOI 10.17487/RFC3264, <https://www.rfc-editor.org/info/rfc3264>.
- [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, DOI 10.17487/RFC3552, <https://www.rfc-editor.org/info/rfc3552>.
- [RFC3605] Huitema, C., "Real Time Control Protocol (RTCP) attribute in Session Description Protocol (SDP)", RFC 3605, DOI 10.17487/RFC3605, <https://www.rfc-editor.org/info/rfc3605>.
- [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, DOI 10.17487/RFC3711, <https://www.rfc-editor.org/info/rfc3711>.
- [RFC3890] Westerlund, M., "A Transport Independent Bandwidth Modifier for the Session Description Protocol (SDP)", RFC 3890, DOI 10.17487/RFC3890, <https://www.rfc-editor.org/info/rfc3890>.
- [RFC4145] Yon, D. and G. Camarillo, "TCP-Based Media Transport in the Session Description Protocol (SDP)", RFC 4145, DOI 10.17487/RFC4145, <https://www.rfc-editor.org/info/rfc4145>.
- [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, DOI 10.17487/RFC4566, <https://www.rfc-editor.org/info/rfc4566>.
- [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, DOI 10.17487/RFC4585, <https://www.rfc-editor.org/info/rfc4585>.
- [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, <https://www.rfc-editor.org/info/rfc5124>.
- [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP Header Extensions", RFC 5285, DOI 10.17487/RFC5285, <https://www.rfc-editor.org/info/rfc5285>.
- [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and Control Packets on a Single Port", RFC 5761, DOI 10.17487/RFC5761, <https://www.rfc-editor.org/info/rfc5761>.
- [RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description Protocol (SDP) Grouping Framework", RFC 5888, DOI 10.17487/RFC5888, <https://www.rfc-editor.org/info/rfc5888>.
- [RFC6236] Johansson, I. and K. Jung, "Negotiation of Generic Image Attributes in the Session Description Protocol (SDP)", RFC 6236, DOI 10.17487/RFC6236, <https://www.rfc-editor.org/info/rfc6236>.
- [RFC6347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer Security Version 1.2", RFC 6347, DOI 10.17487/RFC6347, <https://www.rfc-editor.org/info/rfc6347>.
- [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, <https://www.rfc-editor.org/info/rfc6716>.
- [RFC6904] Lennox, J., "Encryption of Header Extensions in the Secure Real-time Transport Protocol (SRTP)", RFC 6904, DOI 10.17487/RFC6904, <https://www.rfc-editor.org/info/rfc6904>.
- [RFC7160] Petit-Huguenin, M. and G. Zorn, Ed., "Support for Multiple Clock Rates in an RTP Session", RFC 7160, DOI 10.17487/RFC7160, <https://www.rfc-editor.org/info/rfc7160>.
- [RFC7587] Spittka, J., Vos, K., and JM. Valin, "RTP Payload Format for the Opus Speech and Audio Codec", RFC 7587, DOI 10.17487/RFC7587, <https://www.rfc-editor.org/info/rfc7587>.
- [RFC7742] Roach, A.B., "WebRTC Video Processing and Codec Requirements", RFC 7742, DOI 10.17487/RFC7742, <https://www.rfc-editor.org/info/rfc7742>.
- [RFC7850] Nandakumar, S., "Registering Values of the SDP 'proto' Field for Transporting RTP Media over TCP under Various RTP Profiles", RFC 7850, DOI 10.17487/RFC7850, <https://www.rfc-editor.org/info/rfc7850>.
- [RFC7874] Valin, JM. and C. Bran, "WebRTC Audio Codec and Processing Requirements", RFC 7874, DOI 10.17487/RFC7874, <https://www.rfc-editor.org/info/rfc7874>.
- [RFC8108] Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, "Sending Multiple RTP Streams in a Single RTP Session", RFC 8108, DOI 10.17487/RFC8108, <https://www.rfc-editor.org/info/rfc8108>.
- [RFC8122] Lennox, J. and C. Holmberg, "Connection-Oriented Media Transport over the Transport Layer Security (TLS) Protocol in the Session Description Protocol (SDP)", RFC 8122, DOI 10.17487/RFC8122, <https://www.rfc-editor.org/info/rfc8122>.
- [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
- [RFC8445] Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal", RFC 8445, DOI 10.17487/RFC8445, <https://www.rfc-editor.org/info/rfc8445>.
- [RFC8826] Rescorla, E., "Security Considerations for WebRTC", RFC 8826, DOI 10.17487/RFC8826, <https://www.rfc-editor.org/info/rfc8826>.
- [RFC8827] Rescorla, E., "WebRTC Security Architecture", RFC 8827, DOI 10.17487/RFC8827, <https://www.rfc-editor.org/info/rfc8827>.
- [RFC8830] Alvestrand, H., "WebRTC MediaStream Identification in the Session Description Protocol", RFC 8830, DOI 10.17487/RFC8830, <https://www.rfc-editor.org/info/rfc8830>.
- [RFC8834] Perkins, C., Westerlund, M., and J. Ott, "Media Transport and Use of RTP in WebRTC", RFC 8834, DOI 10.17487/RFC8834, <https://www.rfc-editor.org/info/rfc8834>.
- [RFC8838] Ivov, E., Uberti, J., and P. Saint-Andre, "Trickle ICE: Incremental Provisioning of Candidates for the Interactive Connectivity Establishment (ICE) Protocol", RFC 8838, DOI 10.17487/RFC8838, <https://www.rfc-editor.org/info/rfc8838>.
- [RFC8839] Petit-Huguenin, M., Nandakumar, S., Holmberg, C., Keränen, A., and R. Shpount, "Session Description Protocol (SDP) Offer/Answer Procedures for Interactive Connectivity Establishment (ICE)", RFC 8839, DOI 10.17487/RFC8839, <https://www.rfc-editor.org/info/rfc8839>.
- [RFC8840] Ivov, E., Stach, T., Marocco, E., and C. Holmberg, "A Session Initiation Protocol (SIP) Usage for Incremental Provisioning of Candidates for the Interactive Connectivity Establishment (Trickle ICE)", RFC 8840, DOI 10.17487/RFC8840, <https://www.rfc-editor.org/info/rfc8840>.
- [RFC8841] Holmberg, C., Shpount, R., Loreto, S., and G. Camarillo, "Session Description Protocol (SDP) Offer/Answer Procedures for Stream Control Transmission Protocol (SCTP) over Datagram Transport Layer Security (DTLS) Transport", RFC 8841, DOI 10.17487/RFC8841, <https://www.rfc-editor.org/info/rfc8841>.
- [RFC8842] Holmberg, C. and R. Shpount, "Session Description Protocol (SDP) Offer/Answer Considerations for Datagram Transport Layer Security (DTLS) and Transport Layer Security (TLS)", RFC 8842, DOI 10.17487/RFC8842, <https://www.rfc-editor.org/info/rfc8842>.
- [RFC8843] Holmberg, C., Alvestrand, H., and C. Jennings, "Negotiating Media Multiplexing Using the Session Description Protocol (SDP)", RFC 8843, DOI 10.17487/RFC8843, <https://www.rfc-editor.org/info/rfc8843>.
- [RFC8851] Roach, A.B., Ed., "RTP Payload Format Restrictions", RFC 8851, DOI 10.17487/RFC8851, <https://www.rfc-editor.org/info/rfc8851>.
- [RFC8852] Roach, A.B., Nandakumar, S., and P. Thatcher, "RTP Stream Identifier Source Description (SDES)", RFC 8852, DOI 10.17487/RFC8852, <https://www.rfc-editor.org/info/rfc8852>.
- [RFC8853] Burman, B., Westerlund, M., Nandakumar, S., and M. Zanaty, "Using Simulcast in Session Description Protocol (SDP) and RTP Sessions", RFC 8853, DOI 10.17487/RFC8853, <https://www.rfc-editor.org/info/rfc8853>.
- [RFC8854] Uberti, J., "WebRTC Forward Error Correction Requirements", RFC 8854, DOI 10.17487/RFC8854, <https://www.rfc-editor.org/info/rfc8854>.
- [RFC8858] Holmberg, C., "Indicating Exclusive Support of RTP and RTP Control Protocol (RTCP) Multiplexing Using the Session Description Protocol (SDP)", RFC 8858, DOI 10.17487/RFC8858, <https://www.rfc-editor.org/info/rfc8858>.
- [RFC8859] Nandakumar, S., "A Framework for Session Description Protocol (SDP) Attributes When Multiplexing", RFC 8859, DOI 10.17487/RFC8859, <https://www.rfc-editor.org/info/rfc8859>.
10.2. Informative References
- [RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389, <https://www.rfc-editor.org/info/rfc3389>.
- [RFC3556] Casner, S., "Session Description Protocol (SDP) Bandwidth Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC 3556, DOI 10.17487/RFC3556, <https://www.rfc-editor.org/info/rfc3556>.
- [RFC3960] Camarillo, G. and H. Schulzrinne, "Early Media and Ringing Tone Generation in the Session Initiation Protocol (SIP)", RFC 3960, DOI 10.17487/RFC3960, <https://www.rfc-editor.org/info/rfc3960>.
- [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session Description Protocol (SDP) Security Descriptions for Media Streams", RFC 4568, DOI 10.17487/RFC4568, <https://www.rfc-editor.org/info/rfc4568>.
- [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. Hakenberg, "RTP Retransmission Payload Format", RFC 4588, DOI 10.17487/RFC4588, <https://www.rfc-editor.org/info/rfc4588>.
- [RFC4733] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF Digits, Telephony Tones, and Telephony Signals", RFC 4733, DOI 10.17487/RFC4733, <https://www.rfc-editor.org/info/rfc4733>.
- [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols", RFC 5245, DOI 10.17487/RFC5245, <https://www.rfc-editor.org/info/rfc5245>.
- [RFC5506] Johansson, I. and M. Westerlund, "Support for Reduced-Size Real-Time Transport Control Protocol (RTCP): Opportunities and Consequences", RFC 5506, DOI 10.17487/RFC5506, <https://www.rfc-editor.org/info/rfc5506>.
- [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, DOI 10.17487/RFC5576, <https://www.rfc-editor.org/info/rfc5576>.
- [RFC5763] Fischl, J., Tschofenig, H., and E. Rescorla, "Framework for Establishing a Secure Real-time Transport Protocol (SRTP) Security Context Using Datagram Transport Layer Security (DTLS)", RFC 5763, DOI 10.17487/RFC5763, <https://www.rfc-editor.org/info/rfc5763>.
- [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer Security (DTLS) Extension to Establish Keys for the Secure Real-time Transport Protocol (SRTP)", RFC 5764, DOI 10.17487/RFC5764, <https://www.rfc-editor.org/info/rfc5764>.
- [RFC6120] Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 6120, DOI 10.17487/RFC6120, <https://www.rfc-editor.org/info/rfc6120>.
- [RFC6464] Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time Transport Protocol (RTP) Header Extension for Client-to-Mixer Audio Level Indication", RFC 6464, DOI 10.17487/RFC6464, <https://www.rfc-editor.org/info/rfc6464>.
- [RFC8828] Uberti, J. and G. Shieh, "WebRTC IP Address Handling Requirements", RFC 8828, DOI 10.17487/RFC8828, <https://www.rfc-editor.org/info/rfc8828>.
- [SDP4WebRTC] Nandakumar, S. and C. Jennings, "Annotated Example SDP for WebRTC", Work in Progress, Internet-Draft, draft-ietf-rtcweb-sdp-14, <https://tools.ietf.org/html/draft-ietf-rtcweb-sdp-14>.
- [TS26.114] 3GPP, "3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; IP Multimedia Subsystem (IMS); Multimedia Telephony; Media handling and interaction (Release 16)", 3GPP TS 26.114 V16.3.0, <https://www.3gpp.org/DynaReport/26114.htm>.
- [W3C.webrtc] Jennings, C., Ed., Boström, H., Ed., and J. Bruaroey, Ed., "WebRTC 1.0: Real-time Communication Between Browsers", World Wide Web Consortium PR PR-webrtc-20201215, <https://www.w3.org/TR/2020/PR-webrtc-20201215/>.
Appendix A. SDP ABNF Syntax
For the syntax validation performed in Section 5.8, the following list of ABNF definitions is used:¶
Acknowledgements
Harald Alvestrand, Taylor Brandstetter, Suhas Nandakumar, and Peter Thatcher provided significant text for this document. Bernard Aboba, Adam Bergkvist, Jan-Ivar Bruaroey, Dan Burnett, Ben Campbell, Alissa Cooper, Richard Ejzak, Stefan Håkansson, Ted Hardie, Christer Holmberg, Andrew Hutton, Randell Jesup, Matthew Kaufman, Anant Narayanan, Adam Roach, Robert Sparks, Neil Stratford, Martin Thomson, Sean Turner, and Magnus Westerlund all provided valuable feedback on this document.¶