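Learning the SDP Protocol

I gave up reading partway through.

Reading the Original SDP Document (RFC 4566)

A walkthrough of RFC 4566, the original specification of SDP (Session Description Protocol). Because this is the first RFC I have read, these notes are fairly detailed; for later RFCs I will capture only the key points and look up the rest as needed.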

References:

https://www.rfc-editor.org/info/rfc4566

For a Chinese translation, see:

https://blog.csdn.net/jisuanji111111/article/details/120956930

https://zhuanlan.zhihu.com/p/429477119

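Here I quote only the English original, with my own notes where needed and where no translation exists; for a full Chinese translation, see the links above.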

SDP is intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session initiation.

1. Introduction

Scenario: When initiating multimedia teleconferences, voice-over-IP calls, streaming video, or other sessions,

Requirement: there is a requirement to convey media details, transport addresses, and other session description metadata to the participants.

Brief description: SDP provides a standard representation for such information, irrespective of how that information is transported. SDP is purely a format for session description – it does not incorporate a transport protocol, and it is intended to use different transport protocols as appropriate (in short: a pure description format with no transport of its own, carried by whatever protocol fits). It is not intended to support negotiation of session content or media encodings.

2. Glossary of Terms

Three important terms: conference, session, and session description.

Conference: A multimedia conference is a set of two or more communicating users along with the software they are using to communicate.
Session: A multimedia session is a set of multimedia senders and receivers and the data streams flowing from senders to receivers. A multimedia conference is an example of a multimedia session.
Session Description: A well-defined format for conveying sufficient information to discover and participate in a multimedia session.

Keywords common to all RFCs:

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 [3].

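3. Examples of SDP Usage

Carries the session descriptions used in SIP.
Used by some RTSP servers and clients to communicate the parameters for media delivery.
Can be conveyed by email or over the WWW; SDP itself provides no transport reliability.
Used as a session directory format for announcing multicast sessions.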

4. Requirements and Recommendations

Purpose: Thus far, multicast-based sessions on the Internet have differed from many other forms of conferencing in that anyone receiving the traffic can join the session (unless the session traffic is encrypted). In such an environment, SDP serves two primary purposes. It is a means to communicate the existence of a session, and it is a means to convey sufficient information to enable joining and participating in the session. In a unicast environment, only the latter purpose is likely to be relevant.

An SDP session description includes:

Session name and purpose
Time(s) the session is active
The media comprising the session
Information needed to receive those media (addresses, ports, formats, etc.)

Other information that may be included:

Information about the bandwidth to be used by the session
Contact information for the person responsible for the session

What SDP must convey: In general, SDP must convey sufficient information to enable applications to join a session (with the possible exception of encryption keys) and to announce the resources to be used to any non-participants that may need to know. (This latter feature is primarily useful when SDP is used with a multicast session announcement protocol.)

4.1. Media and Transport Information

The media and transport information that SDP should carry.

An SDP session description includes the following media information:

The type of media (video, audio, etc.)
The transport protocol (RTP/UDP/IP, H.320, etc.)
The format of the media (H.261 video, MPEG video, etc.)

In addition to media format and transport protocol, SDP conveys address and port details. For an IP multicast session, these comprise:

The multicast group address for media
The transport port for media

This address and port are the destination address and destination port of the multicast stream, whether being sent, received, or both.

For unicast IP sessions, the following are conveyed:

The remote address for media
The remote transport port for media

The semantics of this address and port depend on the media and transport protocol defined. By default, this SHOULD be the remote address and remote port to which data is sent. Some media types may redefine this behaviour, but this is NOT RECOMMENDED since it complicates implementations (including middleboxes that must parse the addresses to open Network Address Translation (NAT) or firewall pinholes).

4.2. Timing Information

Timing information. A session may be bounded or unbounded in time.

Sessions may be either bounded or unbounded in time. Whether or not they are bounded, they may be only active at specific times. SDP can convey:

An arbitrary list of start and stop times bounding the session
For each bound, repeat times such as “every Wednesday at 10am for one hour”

Globally consistent time: This timing information is globally consistent, irrespective of local time zone or daylight saving time (see Section 5.9).
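As a small illustration of why these times are zone-independent: in RFC 4566 the "t=" values are NTP-format seconds counted from 1900, so converting them needs only a fixed offset. The sketch below is my own (the name `ntp_to_utc` is not from the RFC) and uses the values from the RFC's example line "t=2873397496 2873404696".

```python
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800


def ntp_to_utc(ntp_seconds: int) -> datetime:
    """Convert an NTP-format seconds value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ntp_seconds - NTP_UNIX_OFFSET, tz=timezone.utc)


# Start and stop times from the example "t=2873397496 2873404696".
start = ntp_to_utc(2873397496)
stop = ntp_to_utc(2873404696)
print(start.isoformat(), stop.isoformat())
print(stop - start)  # length of the session window
```

Because the result is always expressed in UTC, any conversion to local time is left to the application that displays it.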

4.3. Private Sessions

A private session can be created by encrypting the session description; a private announcement can also carry the encryption keys and encryption schemes.

Private sessions are typically conveyed by encrypting the session description during distribution. The details of how encryption is performed are dependent on the mechanism used to convey SDP; mechanisms are currently defined for SDP transported using SAP [14] and SIP [15].

If a session announcement is private, it is possible to use that private announcement to convey encryption keys necessary to decode each of the media in a conference, including enough information to know which encryption scheme is used for each media.

4.4. Obtaining Further Information about a Session

A description should carry enough information to decide whether to participate in a session.

A session description should convey enough information to decide whether or not to participate in a session. SDP may include additional pointers in the form of Uniform Resource Identifiers (URIs) for more information about the session.

4.5. Categorisation

Session descriptions can be filtered by category through the “a=cat:” attribute.

When many session descriptions are being distributed by SAP, or any other advertisement mechanism, it may be desirable to filter session announcements that are of interest from those that are not. SDP supports a categorisation mechanism for sessions that is capable of being automated (the “a=cat:” attribute; see Section 6).

4.6. Internationalisation

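Internationalisation. The ISO 10646 character set in UTF-8 encoding is recommended; for compactness, other character sets such as ISO 8859-1 may be used instead. This applies only to free-text fields.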

The SDP specification recommends the use of the ISO 10646 character sets in the UTF-8 encoding [5] to allow many different languages to be represented. However, to assist in compact representations, SDP also allows other character sets such as ISO 8859-1 to be used when desired. Internationalisation only applies to free-text fields (session name and background information), and not to SDP as a whole.

5. SDP Specification

The specification itself: character rules, format rules, and behaviour rules, together with the reasons behind them.

An SDP session description is entirely textual using the ISO 10646 character set in UTF-8 encoding. SDP field names and attribute names use only the US-ASCII subset of UTF-8, but textual fields and attribute values MAY use the full ISO 10646 character set. Field and attribute values that use the full UTF-8 character set are never directly compared, hence there is no requirement for UTF-8 normalisation.
The textual form, as opposed to a binary encoding such as ASN.1 or XDR, was chosen to enhance portability, to enable a variety of transports to be used, and to allow flexible, text-based toolkits to be used to generate and process session descriptions.
However, since SDP may be used in environments where the maximum permissible size of a session description is limited, the encoding is deliberately compact.
Also, since announcements may be transported via very unreliable means or damaged by an intermediate caching server, the encoding was designed with strict order and formatting rules so that most errors would result in malformed session announcements that could be detected easily and discarded.
This also allows rapid discarding of encrypted session announcements for which a receiver does not have the correct key.

Overview of an SDP session description: the session level, the media level, and optional items.

An SDP session description consists of a number of lines of text of the form:

<type>=<value>

where <type> MUST be exactly one case-significant character and <value> is structured text whose format depends on <type>. In general, <value> is either a number of fields delimited by a single space character or a free format string, and is case-significant unless a specific field defines otherwise. Whitespace MUST NOT be used on either side of the “=” sign.
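The line format above can be sketched as a tiny splitter. This is a minimal illustration under my own assumptions, not a full parser: the function name `parse_line` is hypothetical, and the check is limited to "one ASCII letter, then '=' immediately". The value is returned untouched, because a value such as the single space in "s= " is legitimate.

```python
def parse_line(line: str) -> tuple[str, str]:
    """Split one SDP line into (type, value), enforcing the basic shape."""
    # The type is exactly one character, so "=" must sit at index 1.
    # A space before "=" (as in "v =0") therefore fails this check.
    if len(line) < 2 or line[1] != "=":
        raise ValueError(f"not a <type>=<value> line: {line!r}")
    sdp_type, value = line[0], line[2:]
    # The type is case-significant; no case folding is applied here.
    if not (sdp_type.isascii() and sdp_type.isalpha()):
        raise ValueError(f"type is not a single letter: {line!r}")
    return sdp_type, value


print(parse_line("v=0"))
print(parse_line("m=audio 49170 RTP/AVP 0"))
```

Splitting the value into its space-delimited sub-fields is left to per-type code, since the structure of `<value>` depends on `<type>`.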

An SDP session description consists of a session-level section followed by zero or more media-level sections. The session-level part starts with a “v=” line and continues to the first media-level section. Each media-level section starts with an “m=” line and continues to the next media-level section or end of the whole session description. In general, session-level values are the default for all media unless overridden by an equivalent media-level value.

Some lines in each description are REQUIRED and some are OPTIONAL, but all MUST appear in exactly the order given here (the fixed order greatly enhances error detection and allows for a simple parser). OPTIONAL items are marked with a “*”.

Session description
    v=  (protocol version)
    o=  (originator and session identifier)
    s=  (session name)
    i=* (session information)
    u=* (URI of description)
    e=* (email address)
    p=* (phone number)
    c=* (connection information -- not required if included in all media)
    b=* (zero or more bandwidth information lines)
    One or more time descriptions ("t=" and "r=" lines; see below)
    z=* (time zone adjustments)
    k=* (encryption key)
    a=* (zero or more session attribute lines)
    Zero or more media descriptions

Time description
    t=  (time the session is active)
    r=* (zero or more repeat times)

Media description, if present
    m=  (media name and transport address)
    i=* (media title)
    c=* (connection information -- optional if included at
         session level)
    b=* (zero or more bandwidth information lines)
    k=* (encryption key)
    a=* (zero or more media attribute lines)
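Because the order is fixed, it can be checked with a single regular expression over the sequence of type letters. The following is a rough sketch of that idea; the pattern and the name `has_valid_order` are my own reading of the listing above, and the sketch looks only at the first character of each line, not at the values.

```python
import re

# One character per line type, in the order required by the listing above.
# Session level: v o s [i] [u] e* p* [c] b* (t r*)+ [z] [k] a*
# Media level:   m [i] c* b* [k] a*   (repeated zero or more times)
ORDER = re.compile(r"vosi?u?e*p*c?b*(?:tr*)+z?k?a*(?:mi?c*b*k?a*)*")


def has_valid_order(sdp_text: str) -> bool:
    """Return True if the line types appear in the order SDP requires."""
    # Accept both CRLF and bare LF as record terminators.
    lines = [ln for ln in re.split(r"\r?\n", sdp_text) if ln]
    types = "".join(ln[0] for ln in lines)
    return ORDER.fullmatch(types) is not None


good = "v=0\r\no=- 1 1 IN IP4 192.0.2.1\r\ns=demo\r\nt=0 0\r\nm=audio 49170 RTP/AVP 0\r\n"
bad = "v=0\r\ns=demo\r\no=- 1 1 IN IP4 192.0.2.1\r\nt=0 0\r\n"
print(has_valid_order(good), has_valid_order(bad))
```

A description containing an unknown type letter also fails the match, so the whole description is rejected rather than partially accepted.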

The set of type letters is deliberately small and not intended to be extensible – an SDP parser MUST completely ignore any session description that contains a type letter that it does not understand.
The “a=” attribute is the extension mechanism: The attribute mechanism (“a=” described below) is the primary means for extending SDP and tailoring it to particular applications or media.
Some attributes (the ones listed in Section 6 of this memo) have a defined meaning, but others may be added on an application-, media-, or session-specific basis.
Ignore what is not understood: An SDP parser MUST ignore any attribute it doesn’t understand.
External references: An SDP session description may contain URIs that reference external content in the “u=”, “k=”, and “a=” lines. These URIs may be dereferenced in some cases, making the session description non-self-contained.

Session-level defaults for all media: The connection (“c=”) and attribute (“a=”) information in the session-level section applies to all the media of that session unless overridden by connection information or an attribute of the same name in the media description. For instance, in the example below, each media behaves as if it were given a “recvonly” attribute.

v=0
o=jdoe 2890844526 2890842807 IN IP4 10.47.16.5
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.example.com/seminars/sdp.pdf
e=j.doe@example.com (Jane Doe)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 99
a=rtpmap:99 h263-1998/90000
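The way session-level attributes act as defaults can be sketched as follows. This is an illustrative helper of my own (`effective_attributes` is not an RFC term) run on a shortened copy of the example. It treats "same name" literally, comparing the text before the first ":", so it would not notice that a media-level "sendrecv" conflicts with a session-level "recvonly".

```python
import re

EXAMPLE = "\r\n".join([
    "v=0",
    "o=jdoe 2890844526 2890842807 IN IP4 10.47.16.5",
    "s=SDP Seminar",
    "c=IN IP4 224.2.17.12/127",
    "t=2873397496 2873404696",
    "a=recvonly",
    "m=audio 49170 RTP/AVP 0",
    "m=video 51372 RTP/AVP 99",
    "a=rtpmap:99 h263-1998/90000",
]) + "\r\n"


def attr_name(attribute: str) -> str:
    """Return the attribute name, i.e. the part before the first ':'."""
    return attribute.split(":", 1)[0]


def effective_attributes(sdp_text: str) -> list[tuple[str, list[str]]]:
    """Pair each "m=" value with its attributes after applying session defaults."""
    session_attrs: list[str] = []
    media: list[tuple[str, list[str]]] = []
    for line in re.split(r"\r?\n", sdp_text):
        if line.startswith("m="):
            media.append((line[2:], []))  # a new media-level section begins
        elif line.startswith("a="):
            # Before the first "m=" the attribute is session-level.
            target = media[-1][1] if media else session_attrs
            target.append(line[2:])
    result = []
    for m_value, attrs in media:
        own_names = {attr_name(a) for a in attrs}
        inherited = [a for a in session_attrs if attr_name(a) not in own_names]
        result.append((m_value, inherited + attrs))
    return result


for m_value, attrs in effective_attributes(EXAMPLE):
    print(m_value, attrs)
```

Both the audio and the video entries end up carrying "recvonly", which is the behaviour the RFC text describes for this example.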

Special characters in text fields: Text fields such as the session name and information are octet (8-bit byte) strings that may contain any octet with the exceptions of 0x00 (Nul), 0x0a (ASCII newline), and 0x0d (ASCII carriage return).
Record terminator: The sequence CRLF (0x0d0a) is used to end a record, although parsers SHOULD be tolerant and also accept records terminated with a single newline character.
How text is interpreted: If the “a=charset” attribute is not present, these octet strings MUST be interpreted as containing ISO-10646 characters in UTF-8 encoding (the presence of the “a=charset” attribute may force some fields to be interpreted differently).
Where domain names may appear: A session description can contain domain names in the “o=”, “u=”, “e=”, “c=”, and “a=” lines. Any domain name used in SDP MUST comply with [1], [2].
Encoding of domain names: Internationalised domain names (IDNs) MUST be represented using the ASCII Compatible Encoding (ACE) form defined in [11] and MUST NOT be directly represented in UTF-8 or any other encoding (this requirement is for compatibility with RFC 2327 and other SDP-related standards, which predate the development of internationalised domain names).
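To make the ACE requirement concrete, here is a short sketch that converts a domain name before it is written into an SDP line. It relies on Python's built-in "idna" codec, which implements the IDNA rules of RFC 3490; I am assuming that is the document reference [11] points to. The helper name `to_ace` and the sample domain are my own.

```python
def to_ace(domain: str) -> str:
    """Return the ASCII Compatible Encoding (ACE) form of a domain name."""
    # The "idna" codec converts each label separately; labels that are
    # already plain ASCII pass through unchanged.
    return domain.encode("idna").decode("ascii")


print(to_ace("b\u00fccher.example"))  # xn--bcher-kva.example
print(to_ace("www.example.com"))      # www.example.com
```

Only the ACE form should then appear in lines such as "o=" or "c=".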

5.1. Protocol Version (“v=”)

Only version 0 is defined so far.

v=0

The “v=” field gives the version of the Session Description Protocol. This memo defines version 0. There is no minor version number.

5.2. Origin (“o=”)

Session originator, session identifier, and version number.

o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>

The “o=” field gives the originator of the session (her username and the address of the user’s host) plus a session identifier and version number:

<username> (no spaces allowed) is the user’s login on the originating host, or it is “-” if the originating host does not support the concept of user IDs. The <username> MUST NOT contain spaces.
<sess-id> (generated by the creating tool) is a numeric string such that the tuple of <username>, <sess-id>, <nettype>, <addrtype>, and <unicast-address> forms a globally unique identifier for the session. The method of <sess-id> allocation is up to the creating tool, but it has been suggested that a Network Time Protocol (NTP) format timestamp be used to ensure uniqueness [13].
<sess-version> (version information, also generated by the creating tool) is a version number for this session description. Its usage is up to the creating tool, so long as <sess-version> is increased when a modification is made to the session data. Again, it is RECOMMENDED that an NTP format timestamp is used.
<nettype> (network type; in practice this is almost always IN) is a text string giving the type of network. Initially “IN” is defined to have the meaning “Internet”, but other values MAY be registered in the future (see Section 8).
<addrtype> (address type) is a text string giving the type of the address that follows. Initially “IP4” and “IP6” are defined, but other values MAY be registered in the future (see Section 8).
<unicast-address> (the address of the session creator; prefer a domain name and avoid local addresses) is the address of the machine from which the session was created. For an address type of IP4, this is either the fully qualified domain name of the machine or the dotted-decimal representation of the IP version 4 address of the machine. For an address type of IP6, this is either the fully qualified domain name of the machine or the compressed textual representation of the IP version 6 address of the machine. For both IP4 and IP6, the fully qualified domain name is the form that SHOULD be given unless this is unavailable, in which case the globally unique address MAY be substituted. A local IP address MUST NOT be used in any context where the SDP description might leave the scope in which the address is meaningful (for example, a local address MUST NOT be included in an application-level referral that might leave the scope).

Summary: In general, the “o=” field serves as a globally unique identifier for this version of this session description, and the subfields excepting the version taken together identify the session irrespective of any modifications.
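The two roles of the “o=” field (identifying one version, and identifying the session across versions) can be sketched with a small record type. The names `Origin`, `session_key`, and `parse_origin` are my own, and the validation is deliberately minimal.

```python
from typing import NamedTuple


class Origin(NamedTuple):
    username: str
    sess_id: str
    sess_version: str
    nettype: str
    addrtype: str
    unicast_address: str

    def session_key(self) -> tuple[str, ...]:
        """Identify the session across versions: every field except sess_version."""
        return (self.username, self.sess_id, self.nettype,
                self.addrtype, self.unicast_address)


def parse_origin(value: str) -> Origin:
    """Parse the value of an "o=" line (the text after "o=")."""
    fields = value.split(" ")
    if len(fields) != 6:
        raise ValueError(f"expected 6 space-separated fields, got {len(fields)}")
    origin = Origin(*fields)
    for number in (origin.sess_id, origin.sess_version):
        if not (number.isascii() and number.isdigit()):
            raise ValueError("sess-id and sess-version must be numeric strings")
    return origin


old = parse_origin("jdoe 2890844526 2890842807 IN IP4 10.47.16.5")
new = parse_origin("jdoe 2890844526 2890842808 IN IP4 10.47.16.5")
print(old == new)                              # full tuple differs by version
print(old.session_key() == new.session_key())  # same underlying session
```

Keeping `sess_id` and `sess_version` as strings avoids any assumption about integer width, since the values may be full NTP timestamps.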

For privacy, the username and unicast address may be filled in arbitrarily: For privacy reasons, it is sometimes desirable to obfuscate the username and IP address of the session originator. If this is a concern, an arbitrary <username> and private <unicast-address> MAY be chosen to populate the “o=” field, provided that these are selected in a manner that does not affect the global uniqueness of the field.

5.3. Session Name (“s=”)

The textual session name.

s=<session name>

The “s=” field is the textual session name. There MUST be one and only one “s=” field per session description. The “s=” field MUST NOT be empty and SHOULD contain ISO 10646 characters (but see also the “a=charset” attribute). If a session has no meaningful name, the value “s= ” SHOULD be used (i.e., a single space as the session name).

5.4. Session Information (“i=”)

Descriptive information about the session. At the media level, it serves as a label for individual media streams.

i=<session description>

As session-level descriptive information: The “i=” field provides textual information about the session. There MUST be at most one session-level “i=” field per session description, and at most one “i=” field per media. If the “a=charset” attribute is present, it specifies the character set used in the “i=” field. If the “a=charset” attribute is not present, the “i=” field MUST contain ISO 10646 characters in UTF-8 encoding.

Distinguishing media of the same type within one session: A single “i=” field MAY also be used for each media definition. In media definitions, “i=” fields are primarily intended for labelling media streams. As such, they are most likely to be useful when a single session has more than one distinct media stream of the same media type. An example would be two different whiteboards, one for slides and one for feedback and questions.

The “i=” field is intended to provide a free-form human-readable description of the session or the purpose of a media stream. It is not suitable for parsing by automata.

5.5. URI (“u=”)

A URI pointing to further information about the session.

u=<uri>

A URI is a Uniform Resource Identifier as used by WWW clients [7]. The URI should be a pointer to additional information about the session. This field is OPTIONAL, but if it is present it MUST be specified before the first media field. No more than one URI field is allowed per session description.

5.6. Email Address and Phone Number (“e=” and “p=”)

Contact details for the person responsible for the conference.

e=<email-address>
p=<phone-number>

The “e=” and “p=” lines specify contact information for the person responsible for the conference. This is not necessarily the same person that created the conference announcement.

Inclusion of an email address or phone number is OPTIONAL. Note that the previous version of SDP specified that either an email field or a phone field MUST be specified, but this was widely ignored. The change brings the specification into line with common usage.

If an email address or phone number is present, it MUST be specified before the first media field. More than one email or phone field can be given for a session description.

Phone numbers SHOULD be given in the form of an international public telecommunication number (see ITU-T Recommendation E.164) preceded by a “+”. Spaces and hyphens may be used to split up a phone field to aid readability if desired. For example:

p=+1 617 555-6011

Both email addresses and phone numbers can have an OPTIONAL free text string associated with them, normally giving the name of the person who may be contacted. This MUST be enclosed in parentheses if it is present. For example:

e=j.doe@example.com (Jane Doe)

The alternative RFC 2822 [29] name quoting convention is also allowed for both email addresses and phone numbers. For example:

e=Jane Doe <j.doe@example.com>
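The two naming conventions described above (free text in parentheses after the address, or the RFC 2822 style with the address in angle brackets) can be told apart with two patterns. This is a loose sketch rather than a validating parser: `parse_contact` and both regular expressions are my own, and neither the address nor the phone number is checked for correctness.

```python
import re
from typing import Optional

# "address (Free Text)" form, e.g. "+1 617 555-6011 (Jane Doe)".
PAREN_FORM = re.compile(r"^(?P<address>[^()<>]+?)\s*\((?P<name>[^()]*)\)$")
# RFC 2822 style "Free Text <address>" form.
ANGLE_FORM = re.compile(r"^(?P<name>[^<>]*?)\s*<(?P<address>[^<>]+)>$")


def parse_contact(value: str) -> tuple[str, Optional[str]]:
    """Split an "e=" or "p=" value into (address, optional display name)."""
    for pattern in (PAREN_FORM, ANGLE_FORM):
        match = pattern.match(value)
        if match:
            return match.group("address"), match.group("name") or None
    # No free-text part: the whole value is the address or number.
    return value, None


print(parse_contact("+1 617 555-6011"))
print(parse_contact("j.doe@example.com (Jane Doe)"))
print(parse_contact("Jane Doe <j.doe@example.com>"))
```

Both named forms yield the same (address, name) pair, so later code does not need to know which convention the sender used.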

The free text string SHOULD be in the ISO-10646 character set with UTF-8 encoding, or alternatively in ISO-8859-1 or other encodings if the appropriate session-level “a=charset” attribute is set.

5.7. Connection Data (“c=”)

Connection data.

c=<nettype> <addrtype> <connection-address>

The “c=” field contains connection data.

A session description MUST contain either at least one “c=” field in each media description or a single “c=” field at the session level. It MAY contain a single session-level “c=” field and additional “c=” field(s) per media description, in which case the per-media values override the session-level settings for the respective media.

The first sub-field ("<nettype>") is the network type, which is a text string giving the type of network. Initially, “IN” is defined to have the meaning “Internet”, but other values MAY be registered in the future (see Section 8).

The second sub-field ("<addrtype>") is the address type. This allows SDP to be used for sessions that are not IP based. This memo only defines IP4 and IP6, but other values MAY be registered in the future (see Section 8).

The third sub-field ("<connection-address>") is the connection address. OPTIONAL sub-fields MAY be added after the connection address depending on the value of the <addrtype> field.

When the <addrtype> is IP4 and IP6, the connection address is defined as follows:

Multicast case: If the session is multicast, the connection address will be an IP multicast group address. If the session is not multicast, then the connection address contains the unicast IP address of the expected data source or data relay or data sink as determined by additional attribute fields. It is not expected that unicast addresses will be given in a session description that is communicated by a multicast announcement, though this is not prohibited.
Sessions using an IPv4 multicast connection address MUST also have a time to live (TTL) value present in addition to the multicast address. The TTL and the address together define the scope with which multicast packets sent in this conference will be sent. TTL values MUST be in the range 0-255. Although the TTL MUST be specified, its use to scope multicast traffic is deprecated (that is, limiting multicast traffic by TTL is discouraged); applications SHOULD use an administratively scoped address instead.

The TTL for the session is appended to the address using a slash as a separator. An example is:

c=IN IP4 224.2.36.42/127

IPv6 multicast does not use TTL scoping, and hence the TTL value MUST NOT be present for IPv6 multicast. It is expected that IPv6 scoped addresses will be used to limit the scope of conferences.
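The TTL rules for “c=” can be sketched with the standard `ipaddress` module. The type `Connection` and the function `parse_connection` are hypothetical names of mine. The sketch reads a TTL only for IPv4 multicast addresses; for IP6 (and for unicast IP4) any "/" suffix is left uninterpreted, since it is not a TTL, and a connection address that is not a literal IP address is assumed to be a host name.

```python
import ipaddress
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    nettype: str
    addrtype: str
    address: str
    ttl: Optional[int]


def parse_connection(value: str) -> Connection:
    """Parse the value of a "c=" line, applying the IPv4 multicast TTL rule."""
    nettype, addrtype, conn = value.split(" ")
    address, _, suffix = conn.partition("/")
    ttl = None
    if addrtype == "IP4":
        try:
            is_multicast = ipaddress.IPv4Address(address).is_multicast
        except ValueError:
            is_multicast = False  # not a literal address, e.g. a host name
        if is_multicast:
            if not suffix:
                raise ValueError("IPv4 multicast address requires a TTL")
            ttl = int(suffix.split("/")[0])
            if not 0 <= ttl <= 255:
                raise ValueError("TTL must be in the range 0-255")
    return Connection(nettype, addrtype, address, ttl)


print(parse_connection("IN IP4 224.2.36.42/127"))
print(parse_connection("IN IP6 FF15::101"))
```

Rejecting an IPv4 multicast address without a TTL at parse time follows the MUST in the text, even though using TTL for scoping is itself deprecated.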