Broadsoft and Cisco SIP 503 “Retry-After” SIP Header Support
SIP 503 Service Unavailable is commonly seen in a VoIP network when a SIP device (such as a SIP server) knows it is unable to process a call. Typically when this happens, the endpoint that originated the Invite will try the next available host it received in the SIP Contact header.
In this particular example, let's assume a call comes in from the PSTN and a SIP Invite is originated from the Cisco AS5800 media gateway to the BroadWorks Network Server (NS). The NS looks at the called number and identifies it as a subscriber that lives on the BroadWorks Application Server (AS) cluster. The NS returns a 302 Moved Temporarily message to the Cisco AS5800 and includes the SIP URIs for both the primary AS and the secondary AS in the Contact header. If the BroadWorks Extreme Overload Controls are configured on the Application Servers and the primary AS is peaking at 90% CPU utilization, the server identifies itself as being in the “red zone”. This causes the primary AS to respond to any Invite it receives with a 503 Service Unavailable message.
The following diagram shows the call flow for a standard SIP 503 Service Unavailable response (without Retry-After header support). This call flow repeats for every new call coming from the AS5800 towards BroadWorks.
Although this is not necessarily the best way to handle an outage, 503 Service Unavailable is the method most widely supported among SIP vendors, and it typically works as designed (it prevents a total outage). The downside of this model is that the 503 message is only relevant within the SIP dialog of that one particular call. In our case, with the primary AS gracefully rejecting calls using 503 Service Unavailable, each time a new call comes in from the PSTN the AS5800 sends a new Invite, and it always tries the primary AS first. This means that as long as the primary AS is in the red zone, it must send a 503 Service Unavailable to the AS5800 for every new call before the AS5800 tries the secondary AS. This is not the most desirable behavior. At some point the primary AS may be so overloaded that it stops sending any SIP messages, which causes a whole new set of service-impacting issues.
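The per-call behavior described above can be illustrated with a minimal Python sketch. It is not how the AS5800 is implemented; the host names and the `fake_send_invite` helper are hypothetical stand-ins for the real network, used only to show that every new call costs one 503 round trip to the primary AS.

```python
def place_call(contacts, send_invite):
    """Try each Contact host in order; return (accepting_host, attempts)."""
    attempts = 0
    for host in contacts:
        attempts += 1
        status = send_invite(host)
        if status != 503:
            return host, attempts
    return None, attempts


# Hypothetical stand-in for the network: the primary AS is in the red zone
# and rejects every Invite with 503; the secondary AS accepts.
def fake_send_invite(host):
    return 503 if host == "as1.example.com" else 200


# Contact list as it would arrive in the 302 from the NS (names are made up).
contacts = ["as1.example.com", "as2.example.com"]

# Three independent calls: nothing is remembered between them,
# so each one hits the primary AS first and needs two attempts.
results = [place_call(contacts, fake_send_invite) for _ in range(3)]
for host, attempts in results:
    print(host, attempts)
```

Because no state survives between calls, the overloaded primary AS still has to parse and reject one Invite per call.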
Broadsoft went a step further and provided support for the SIP 503 Retry-After header. What this means is when the primary AS returns a 503 message to the Cisco AS5800 it will tell it to stop sending traffic to the server for a period of time. If the Retry-After header contains a time duration of 60 seconds, the AS5800 would not send any requests to the primary AS for 60 seconds, effectively putting that BroadWorks node in an out-of-service state. This is a much more graceful approach to congestion management. One caveat is that not all SIP devices support the Retry-After header.
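As a rough illustration of what honoring Retry-After means on the sending side, here is a small Python sketch. The `RetryAfterTracker` class and the host names are hypothetical and are not taken from Cisco or BroadWorks; the clock is injected so the example runs without actually waiting.

```python
import time


class RetryAfterTracker:
    """Hypothetical bookkeeping for hosts that answered 503 with Retry-After."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._blocked_until = {}  # host -> time at which it may be retried

    def on_503(self, host, retry_after_seconds=None):
        # Without a Retry-After value the 503 only affects the current call.
        if retry_after_seconds is not None:
            self._blocked_until[host] = self._clock() + retry_after_seconds

    def usable(self, contacts):
        """Return the contacts that are not currently blocked, in order."""
        now = self._clock()
        return [h for h in contacts
                if self._blocked_until.get(h, float("-inf")) <= now]


# Fake clock so the 60-second window can be simulated instantly.
now = [0.0]
tracker = RetryAfterTracker(clock=lambda: now[0])
contacts = ["as1.example.com", "as2.example.com"]

tracker.on_503("as1.example.com", retry_after_seconds=60)
during = tracker.usable(contacts)   # primary is skipped inside the window
now[0] = 60.0
after = tracker.usable(contacts)    # primary is eligible again
print(during, after)
```

Compared with the plain 503 flow, new calls placed during the window go straight to the secondary AS and never touch the overloaded node.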
The Application Server must be configured with the addresses of the SIP devices that support the Retry-After header. Only these devices will receive this header in a 503 message. Any other device to which BroadWorks returns a 503 will receive the normal 503 Service Unavailable message without the Retry-After header.
AS_CLI/System/OverloadControls/ManagedNeighbors/Capabilities> get
  Neighbor Address  Retry After Receiver  Retry After Sender  Description
  =========================================================================
  67.210.22.19      true                  true                Cisco_AS5800
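The effect of this table can be sketched in a few lines of Python. This is only an illustration of the decision, not BroadWorks code: `RETRY_AFTER_RECEIVERS` and `build_503` are hypothetical names, and the response is reduced to the status line and the one header of interest (a real 503 also carries Via, From, To, Call-ID and CSeq).

```python
# Neighbors flagged as "Retry After Receiver" in the capabilities table.
RETRY_AFTER_RECEIVERS = {"67.210.22.19"}


def build_503(neighbor_address, retry_after_seconds):
    """Return a skeletal 503, adding Retry-After only for capable neighbors."""
    lines = ["SIP/2.0 503 Service Unavailable"]
    if neighbor_address in RETRY_AFTER_RECEIVERS:
        lines.append(f"Retry-After: {retry_after_seconds}")
    # Mandatory SIP headers are omitted here for brevity.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_503("67.210.22.19", 60))   # configured neighbor: header included
print(build_503("192.0.2.10", 60))     # unlisted neighbor: plain 503
```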
The purpose of the min and max settings is to let the AS randomly select a timer value between the two when populating the duration in the Retry-After header. This prevents the same timer value from being sent to multiple SIP devices, which could otherwise all resume sending at the same moment and flood the AS.
AS_CLI/System/OverloadControls> get
minRetryAfterInSeconds = 60
maxRetryAfterInSeconds = 120
neighborMaxServiceUnavailablePeriodInSeconds = 120
AS_CLI/System/OverloadControls> set supportCongestionManagement true
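The randomized timer can be pictured with the following Python sketch. The function name `pick_retry_after` is made up for illustration, and the uniform draw with inclusive bounds is an assumption; the way BroadWorks actually selects the value is not documented here.

```python
import random


def pick_retry_after(min_seconds=60, max_seconds=120, rng=random):
    """Pick a Retry-After duration between the configured min and max.

    Assumes a uniform draw with both bounds included.
    """
    if min_seconds > max_seconds:
        raise ValueError("min_seconds must not exceed max_seconds")
    return rng.randint(min_seconds, max_seconds)


# Five neighbors rejected during the same overload event each get their
# own value, so they do not all resume sending at the same second.
rng = random.Random(7)  # seeded only to make the demo repeatable
values = [pick_retry_after(rng=rng) for _ in range(5)]
print(values)
```

With the settings shown above (60 and 120), every neighbor backs off for at least a minute, and the retries are spread over the following minute rather than arriving together.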
