What is the problem with SIP retransmits?
-----------------------------------------

Sometimes you get messages in the console like these:

- "retrans_pkt: Hanging up call XX77yy - no reply to our critical packet."
- "retrans_pkt: Cancelling retransmit of OPTIONs"

The SIP protocol is based on requests and replies. Both sides send
requests and wait for replies. Some of these requests are important.
In a TCP/IP network many things can happen to IP packets. Firewalls,
NAT devices, Session Border Controllers and SIP proxies may sit in the
signalling path, and they will affect the call.

SIP Call setup - INVITE-200 OK - ACK
------------------------------------
To set up a SIP call, there's an INVITE transaction. The SIP software that
initiates the call sends an INVITE, then waits for a reply. When a
final reply arrives, the caller sends an ACK. This three-way handshake
is in place because a phone can ring for a very long time and
the protocol needs to make sure that all devices are still online
when call setup is done and media starts to flow.

- The first reply we're waiting for is often a "100 Trying".
  This message means that some type of SIP server has received our
  request and will make sure that we get a reply. It could be
  the other endpoint, but it could also be a SIP proxy or SBC
  that handles the request on our behalf.

- After that, you often see a response in the 18x class, like
  "180 Ringing" or "183 Session Progress". This typically means that our
  request has reached at least one endpoint and something
  is alerting the other end that there's a call coming in.

- Finally, the other side answers and we get a positive reply,
  "200 OK". In that message, we get an address that goes directly
  to the device that answered. Remember, there could be multiple
  phones ringing. The address is specified by the Contact: header.

- To confirm that we can reach the phone that answered our call,
  we now send an ACK to the Contact: address. If this ACK doesn't
  reach the phone, the call fails. If we can't send an ACK, we
  can't send anything else, not even a proper hangup. Call
  signalling will simply fail for the rest of the call and there's
  no point in keeping it alive.

- If we get an error response to our INVITE, like "Busy" or
  "Rejected", we send the ACK to the same address as we sent the
  INVITE, to confirm that we got the response.
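
The two ACK rules above can be condensed into a small sketch. This is
an illustration in Python, not Asterisk's code. The function name
ack_target and its arguments are made up for this example, and it
ignores Record-Route handling, which in a real call also influences
the path the ACK takes.

```python
def ack_target(status_code, request_uri, contact_uri=None):
    """Return the address an ACK should be sent to (illustrative only)."""
    if status_code < 200:
        # Provisional replies such as 100 or 180 are not ACKed.
        return None
    if status_code < 300:
        # Positive answer: ACK goes to the address in the Contact: header.
        if contact_uri is None:
            raise ValueError("2xx reply without a Contact: address")
        return contact_uri
    # Error reply: ACK goes to the same address as the INVITE.
    return request_uri
```

For a "200 OK" this returns the Contact: address, for a "486 Busy Here"
it returns the address the INVITE was sent to.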

In order to make sure that the whole call setup sequence works and that
we have a call, a SIP client retransmits messages if there's too much
delay between a request and the expected response. We retransmit a number
of times while waiting for the first response. We retransmit the answer to
an incoming INVITE while waiting for an ACK. If we get multiple answers,
we send an ACK to each of them.

If we don't get the ACK or don't get an answer to our INVITE,
even after retransmissions, we hang up the call and log the first
error message shown above.
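
To give a feel for the timing, here is a sketch of the retransmission
schedule that RFC 3261 describes for an INVITE sent over UDP: the first
retransmit comes after T1 (500 ms by default), the wait doubles each
time, and the sender gives up after 64*T1. Asterisk's own timers are
configurable and may differ in detail, so treat these numbers as the
RFC defaults, not as a description of your server. The function name
is invented for this example.

```python
def invite_retransmit_times(t1=0.5, give_up_factor=64):
    """Seconds after the first send at which an INVITE is retransmitted.

    Models the RFC 3261 INVITE client transaction over UDP: wait T1,
    then double the wait each time, until 64*T1 has passed with no reply.
    """
    deadline = give_up_factor * t1
    times = []
    interval = t1
    elapsed = interval
    while elapsed < deadline:
        times.append(elapsed)
        interval *= 2          # back off: each wait is twice the previous one
        elapsed += interval
    return times, deadline
```

With the defaults this gives retransmits at 0.5, 1.5, 3.5, 7.5, 15.5
and 31.5 seconds, and a give-up time of 32 seconds.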

Other SIP requests are based only on request and reply. There's
no ACK, no three-way handshake. In Asterisk we mark some of
these as CRITICAL - they need to go through for the call to
work as expected. Others are non-critical: we don't really care
what happens with them, and the call will go on happily regardless.

The qualification process - OPTIONS
-----------------------------------
If you turn on qualify= in sip.conf for a device, Asterisk will
send an OPTIONS request every minute to the device and check
if it replies. Each OPTIONS request is retransmitted a number
of times (to handle packet loss) and if we get no reply, the
device is considered unreachable. From that moment, we will
send a new OPTIONS request (with retransmits) every tenth
second.
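
The polling behaviour can be pictured as a tiny state machine. The
sketch below is Python, not Asterisk's implementation. The names are
made up, and the intervals are assumptions taken from the description
above: one minute while the device answers, ten seconds once it has
stopped answering.

```python
REACHABLE_INTERVAL = 60    # assumed seconds between OPTIONS while replies arrive
UNREACHABLE_INTERVAL = 10  # assumed seconds between OPTIONS after a failure

def simulate_qualify(poll_results):
    """Map each poll outcome to (device state, delay until the next OPTIONS).

    poll_results is a list of booleans: True if the OPTIONS request got
    a reply (possibly after retransmits), False if it got none at all.
    """
    schedule = []
    for got_reply in poll_results:
        if got_reply:
            schedule.append(("reachable", REACHABLE_INTERVAL))
        else:
            schedule.append(("unreachable", UNREACHABLE_INTERVAL))
    return schedule
```

A device that misses two polls in a row is polled every ten seconds
until it answers, then drops back to the one-minute interval.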

For some reason signalling doesn't work as expected between
your Asterisk server and the other device. There could be many reasons:

- A NAT device in the signalling path
  A misconfigured NAT device is in the signalling path
  and stops SIP messages.
- A firewall that blocks messages or reroutes them wrongly
  in an attempt to assist in a too clever way.
- A SIP middlebox (SBC) that rewrites Contact: headers
  so that we can't reach the other side with our reply.
- A badly configured SIP proxy that forgets to add
  Record-Route headers to make sure that signalling works.
- Packet loss. IP and UDP are unreliable transports. If
  you lose too many packets, the retransmits don't help
  and communication is impossible. If this happens with
  signalling, media would be unusable anyway.

Turn on SIP debug, try to understand the signalling that happens
and see if you're missing the reply to the INVITE or if the
ACK gets lost. When you know what happens, you've taken the
first step to track down the problem. See the list above and
investigate your network.
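
Before digging into a full SIP trace, it can help to see which calls
and which requests are affected. The Python sketch below counts the
two console messages quoted at the top of this document. The patterns
follow those quoted messages only. Real log lines carry timestamps and
other prefixes and may be worded differently in other Asterisk
versions, so adjust the patterns to what your log contains. The
function name is made up for this example.

```python
import re
from collections import Counter

# Patterns modelled on the two messages quoted in this document.
HANGUP_RE = re.compile(
    r"retrans_pkt: Hanging up call (\S+) - no reply to our critical packet")
CANCEL_RE = re.compile(r"retrans_pkt: Cancelling retransmit of (\w+)")

def summarize_retransmit_warnings(lines):
    """Return (call IDs that were hung up, count of cancelled retransmits per request)."""
    failed_calls = []
    cancelled = Counter()
    for line in lines:
        match = HANGUP_RE.search(line)
        if match:
            failed_calls.append(match.group(1))
            continue
        match = CANCEL_RE.search(line)
        if match:
            cancelled[match.group(1)] += 1
    return failed_calls, cancelled
```

Feed it the lines of your console log, for example with
summarize_retransmit_warnings(open(path)) where path points at a log
file of your choice, and compare the failing call IDs with the SIP
debug output.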

For NAT and firewall problems, there are many documents
to help you. Start by reading sip.conf.sample, which is
part of your Asterisk distribution.

The SIP signalling standard, including retransmissions
and timers for these, is well documented in IETF
RFC 3261.

Good luck sorting out your SIP issues!

-- oej (at) edvina.net, Sweden, 2008-07-22
-- http://www.voip-forum.com