THE ORIGIN AND PURPOSE OF SIP

The concept of a ‘session’ was first introduced in RFC 2327 (the Session Description Protocol) as a set of data streams carrying multiple types of media between senders and receivers. A session can be a phone call, a video conference, a user taking remote control of a PC, or two users sharing data, chatting, or exchanging instant messages.

The Session Initiation Protocol (SIP) was originally defined in RFC 2543 by the MMUSIC (Multiparty Multimedia Session Control) working group of the IETF. The MMUSIC working group focused on loosely coupled conferences as they existed on the MBONE (see the companion book, Beyond VoIP Protocols, Chapter 6, for additional details on the MBONE) and was working on a complete multimedia framework based on the following protocols:

• The Session Description Protocol (SDP, RFC 2327) and the Session Announcement Protocol (SAP, RFC 2974).

• The Real Time Streaming Protocol (RTSP, RFC 2326) to control real-time, or more precisely isochronous,¹ data servers.

• SIP.

These protocols complement existing IETF protocols, such as RTP (RFC 1889) from the AVT working group (Audio/Video Transport), used for the transfer of isochronous data, or RSVP from the INTSERV (INTegrated SERVices) working group for bandwidth allocation.

¹ The data elements of an isochronous data stream, for instance, voice samples, must be played back with the same relative time intervals as when they were recorded.

IP Telephony: Deploying VoIP Protocols and IMS Infrastructure, Second Edition O. Hersent

2011 John Wiley & Sons, Ltd

SIP now has its own working group within the IETF, which maintains close coordination with the MMUSIC group, as the latter is still working on improving SDP, which is used extensively in SIP.

One of the initial goals of SIP was to remain simple, and to this purpose, ‘classic’ telecom protocol design principles, such as the isolation of protocol layers or the complete separation of functional blocks (e.g., message syntax, message encoding and serialization, retransmission), were initially left behind as unnecessary heaviness. The initial SIP RFC aimed at defining in a single 150-page document all the technical details required for session management, covering message reliability, transport, security, and a set of generic primitives for the following functions:

• User location: determination of the technical parameters (IP address, etc.) required to reach an end system to be used for communication and association of end users with end systems.

• User availability: determination of the reachability of an end user and the willingness of the called party to communicate.

• Endpoint capabilities: determination of the media types, media parameters, and end system functions that can be used.

• Session setup: ‘ringing’ a remote device, establishment of media session parameters at both the called and calling parties.

• Session management: including transfer and termination of sessions, modifying session parameters, and invoking services.

The scope of SIP has been restricted to loose multiparty conferences, i.e., functions such as chair control are out of the scope of the current SIP specification. These conference control functions are left to extensions that can be carried within SIP messages.

It took just about a year for SIP to become surprisingly popular for a telecom protocol, but this can be understood from the context. Just like its contemporaries WAP or UMTS, the development of SIP occurred at the peak of the Internet bubble, and many start-up companies spent an inordinate amount of marketing resources to promote SIP to omnipotent status. Just as the ‘new economy’ was being praised as a simple new paradigm vastly superior to the ‘old economy’, burdened by obsolete conventions and processes, the word began to spread that SIP was a new simple way of designing telecom systems and that the old public network was unnecessarily complex and inefficient. Even the H.323 protocol, only a couple of years older than SIP, was caught in this wave, and began to be criticized for its heaviness and traditional telecom heritage.²

² Indeed, H.323 is based on the Q.931 protocol used in current telecom networks, and uses the most recent software modelling tool, the Specification and Description Language (SDL), capable of automatic test case generation. H.323 defines and separates many functional software modules, and uses an abstract syntax (ASN.1) to describe its messages, which allows parsing and serialization code to be generated automatically.

After the explosion of the Internet bubble, the marketing clouds slowly began to dissipate, and after a few years of experience, the real strengths and weaknesses of SIP are now easier to assess. One strength of the protocol is that the authors constantly tried to abstract it from any specific use. For instance, most of the time, the SIP primitives were used to carry ‘opaque’ objects required for a specific application or media and not understood by the SIP protocol stack.³ This did stimulate the imagination of developers, and led to interesting ideas, for instance the use of SIP for IM (see Section 3.5 of this chapter).

The simplicity of Figure 3.1 also explains much of the initial enthusiasm for SIP.⁴ From this simple example we can see that SIP is very efficient: the callee-to-caller media channel can be set up in exactly one round trip and the caller-to-callee media channel can be set up in one and a half round trips. This was much better than the many round trips that were required by the bootstrap nature of H.323v1. A similar call flow, fast connect, was only introduced in H.323v2 (see Section 2.3 in Chapter 2).
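As an illustration of how little information the exchange in Figure 3.1 needs to carry, the following minimal sketch reads the two simplified session descriptions from the figure and reports where each terminal expects to receive media. The function name parse_simplified_sdp and the payload type table are illustrative assumptions, not part of any SIP library; only the c= and m= lines shown in the figure are handled, and a real SDP body contains more fields.

```python
# Subset of the static RTP/AVP payload type assignments (assumption: only the
# types relevant to Figure 3.1, plus A-law, are listed here).
STATIC_PAYLOAD_TYPES = {0: "PCMU (mu-law PCM)", 3: "GSM", 8: "PCMA (A-law PCM)"}


def parse_simplified_sdp(body):
    """Hypothetical helper: extract receive address, port and codecs.

    Handles only the c= and m= lines used in the simplified figure.
    """
    address, port, payloads = None, None, []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=<network type> <address type> <connection address>
            address = line[2:].split()[2]
        elif line.startswith("m="):
            # m=<media> <port> <transport> <payload type> ...
            fields = line[2:].split()
            port = int(fields[1])
            payloads = [int(f) for f in fields[3:]]
    codecs = [STATIC_PAYLOAD_TYPES.get(p, "unknown") for p in payloads]
    return {"address": address, "port": port, "codecs": codecs}


# Session descriptions carried by the INVITE and the 200 OK in Figure 3.1.
MARY_INVITE_SDP = "c=IN IP4 192.190.132.20\nm=audio 49170 RTP/AVP 0"
JOHN_200_OK_SDP = "c=IN IP4 192.190.132.31\nm=audio 12345 RTP/AVP 3"

mary = parse_simplified_sdp(MARY_INVITE_SDP)
john = parse_simplified_sdp(JOHN_200_OK_SDP)

for name, info in (("Mary", mary), ("John", john)):
    print(f"{name} receives {info['codecs']} at {info['address']}:{info['port']}")
```

Each side only announces where and in which format it is willing to receive media, which is why John's terminal can start sending as soon as it has read the INVITE.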

Figure 3.1 Simple phone call scenario, as per the original SIP RFC. Mary (192.190.132.20) calls John (192.190.132.31):

• Mary to John: INVITE john@192.190.132.31, carrying c=IN IP4 192.190.132.20 and m=audio 49170 RTP/AVP 0. In this (over)simplified INVITE request, Mary’s terminal says that it can receive µ-law PCM data (RTP/AVP 0) at 192.190.132.20 on port 49170. The media can be sent immediately after receiving the INVITE request (for instance ringback tones). John’s terminal rings.

• John to Mary: 200 OK, carrying c=IN IP4 192.190.132.31 and m=audio 12345 RTP/AVP 3. The response indicates that John’s terminal can receive GSM data on port 12345.

• Mary to John: ACK.

• Media: µ-law audio flows from John to Mary (port 49170), and GSM audio flows from Mary to John (port 12345).

³ This ability to transport opaque parameters is also present in most other protocols, notably H.323 using the ‘Non-Standard Parameters’ that can be freely defined within the framework of the standard. Note also that in SIP the size of opaque parameters is restricted by the fact that no segmentation mechanism has been defined for SIP over UDP.

⁴ This figure does not use the offer-answer model introduced by RFC 3261; see Section 3.2.2.7.2.

Unfortunately, the weaknesses of the protocol were also many, and the SIP community had to work hard to solve or improve them. Today, only the IMS profile of the protocol has really solved most of these issues, which still affect some ‘plain SIP’ implementations.

• Because SIP ‘can potentially’ be expanded, it is often believed and touted that SIP ‘does’ everything. This is the well-known ‘it’s just software’ syndrome. Year after year, proprietary extensions of SIP have accumulated (sometimes described in draft documents, sometimes not documented at all), and the lack of a well-defined standardization process has often prevented implementations from converging. The reality, despite claims to the contrary in ‘sponsored’ interoperability events, is that only the most common call flows work across vendors, and they are often too trivial to fully address the complexity of real-world applications. Too often, SIP is still only a reassuring name hiding many proprietary extensions. As a result, operational SIP networks today are still built mostly with infrastructure equipment provided or integrated by a single vendor. However, the involvement of the ETSI 3GPP and ETSI TISPAN working groups greatly improved the situation: these groups introduced significant modifications to the protocol as they defined a SIP profile for use in IMS networks, and the standardization process for this ‘flavour’ of SIP is much more rigorous. As a result, the SIP IMS profile has become a much more robust and interoperable protocol.

• The PSTN proved to be a lot more complex than originally anticipated, and therefore SIP lacked many of the features required for proper interworking with the PSTN. H.323v1 had also missed quite a few details, but its Q.931 heritage made it easier to fix the issues quickly in a standard way across vendors. As a result, most VoIP networks interworking with the PSTN initially used H.323, and not SIP.⁵ It took almost 10 years to reduce the number of proprietary SIP extensions required for proper interworking, and finally there is, with the TISPAN profile of SIP, a standard and robust SIP profile for PSTN interworking.

• The increased complexity of the protocol required by the PSTN interworking and other fixes in the initial RFC has become hard to manage with the original ‘informal all-in-one design’ approach. The ‘old’ way of layering protocols and defining clean functional modules was aimed at managing complexity and ensuring consistent quality as the software evolves. The latest SIP specifications clearly head back to this modular approach, but the original design and the lack of formal methodology make this very difficult, and the latest RFCs are still burdened with exceptions and shortcuts between software layers that make the protocol difficult to implement and test. SIP is certainly not ‘simple’ any more.

⁵ In November 2002, the VASA consortium (BellSouth, Chunghwa Telecom, Equant, France Telecom, SBC, Sprint PCS, Telecom Italia Lab, VeriSign, Verizon, and WorldCom) published an independent study of ‘SIP in Carriers Networks’ which emphasized that ‘some network operators have experienced significant difficulties in interworking different vendors’ products. In contrast with the initial objectives of SIP, operators are driven towards single vendor solutions’. The study concluded: ‘For existing networks, the arguments against immediate migration from TDM or H.323 to SIP outweigh the potential benefits’.

This section will describe the most common PSTN interworking scenarios, which work without extensions of SIP, and will list the major cases where extensions are still required. When available, the documented extensions of major SIP vendors will be discussed.

One of the applications that has emerged out of the multiple theoretical possibilities of the protocol is Presence and Instant Messaging (IM). The adoption of SIP for IM by Microsoft made it a serious option, as an alternative to the only other open standard for IM: Jabber.

The SIP applications for Presence and IM are described in Section 3.5 of this chapter.

3.1.1 From RFC 2543 to RFC 3261

SIP remained a draft document for a long time before it was finally published as an RFC in March 1999 (RFC 2543). The first published version of the protocol was SIP 2.0.

Unfortunately, this first version of the RFC was trying to embrace too much, contained many errors, and was too vague and ambiguous to be a real specification document. It was more a sort of technical brainstorming document and was taken as such by the many start-up companies that began to implement SIP products. All the first trial SIP networks used their ‘flavour’ of SIP, with their own corrections and expansions to the original SIP specification, and used only the simplest call flows defined by the RFC.

As the first useful feedback was gathered from these trials, the RFC was updated with nine ‘bis’ versions, and finally all changes were merged in June 2002 in a new RFC, RFC 3261. Important aspects of the initial specification were split into separate RFCs.

Although RFC 3261 does not update the SIP version number, which remains SIP 2.0, it not only corrects errors and clarifies ambiguities but also makes major changes to RFC 2543. The protocol is now more robust and more clearly documented, although the RFC is still a bit verbose and vague, with expressions like ‘modest level of backwards compatibility’ or ‘almost identical’ that can be misleading. RFC 3261 is in reality a major new version of SIP, and is not backward compatible with RFC 2543, although most simple call flows will work across the two RFC versions.

In the process, the SIP protocol lost the apparent simplicity of its early days, and the size of the main RFC nearly doubled to 270 pages. The new RFC is an umbrella document that points to other RFCs for specific details or applications; the complete documentation (see Figure 3.2) includes hundreds of additional pages. Among the most important documents are the following:

• RFC 3262: Reliability of Provisional Responses in Session Initiation Protocol (SIP). This RFC is required in all cases where SIP needs to interwork with a telephone network.

• RFC 3263: Session Initiation Protocol (SIP): Locating SIP Servers. The location of SIP servers is really an independent module in a SIP implementation, and is now documented separately from the main SIP RFC.

• RFC 3264: An Offer/Answer Model with Session Description Protocol (SDP). This was one of the most necessary clarifications of the original SIP RFC, where the exact use of the SDP syntax was ambiguous and led most vendors to implement

A Darwinian view of voice transport

The ‘hello world case’: simple voice call from