SIP 503 Service Unavailable is commonly seen in a VoIP network when a SIP device (such as a SIP server) is temporarily unable to process a call and explicitly rejects it. Typically, when this happens, the endpoint that originated the Invite tries the next host listed in the SIP Contact header it received. In this example, let's assume a call comes in from the PSTN and the Cisco AS5800 media gateway sends a SIP Invite to the BroadWorks Network Server (NS). The NS looks at the called number and determines that it belongs to a subscriber hosted on the BroadWorks Application Server (AS) cluster. The NS returns a 302 Moved Temporarily response to the AS5800 and includes the SIP URIs of both the primary AS and the secondary AS in the Contact header. If BroadWorks Extreme Overload Controls are configured on the Application Servers and the primary AS is peaking at 90% CPU utilization, the server considers itself to be in the “red zone”. This causes the primary AS to respond to any Invites it receives with a 503 Service Unavailable message.
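The gateway's failover logic can be sketched as a small simulation. This is a minimal sketch, not Cisco or BroadWorks code: the Contact URIs (`as1.example.net`, `as2.example.net`) and the `send_invite` callable are hypothetical stand-ins for the real SIP transaction, and any non-503 failure simply ends the attempt here.

```python
from typing import Callable, Optional, Sequence

# Hypothetical Contact URIs returned by the NS in its 302 Moved Temporarily.
CONTACTS = ["sip:5551234@as1.example.net", "sip:5551234@as2.example.net"]


def route_call(contacts: Sequence[str],
               send_invite: Callable[[str], int]) -> Optional[str]:
    """Try each Contact URI in order, moving on to the next one on a 503.

    send_invite is a hypothetical stand-in that returns the final SIP
    status code for an INVITE sent to the given URI.
    """
    for uri in contacts:
        status = send_invite(uri)
        if status == 503:
            # Service Unavailable: this host is overloaded, try the next Contact.
            continue
        if 200 <= status < 300:
            return uri  # the call was answered by this host
        return None  # in this sketch, any other failure ends the attempt
    return None  # every Contact answered 503


def overloaded_primary(uri: str) -> int:
    # The primary AS is in the "red zone" and rejects every INVITE.
    return 503 if "as1" in uri else 200


print(route_call(CONTACTS, overloaded_primary))  # sip:5551234@as2.example.net
```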
The following diagram shows the call flow for a standard SIP 503 Service Unavailable response (without Retry-After header support). This call flow repeats for every new call coming from the AS5800 toward BroadWorks.
Although this is not necessarily the best way to handle an overload, 503 Service Unavailable is the method most widely supported by SIP vendors, and it typically works as designed (it prevents a total outage). The downside of this model is that the 503 applies only to that one particular call. In our case, with the primary AS gracefully rejecting calls with 503 Service Unavailable, each new call from the PSTN produces a new Invite from the AS5800, and the AS5800 always sends it to the primary AS first. As long as the primary AS stays in the red zone, every call therefore costs an extra Invite/503 exchange with the primary AS before the AS5800 tries the secondary AS. This is not the most desirable behavior: the overloaded server keeps spending resources rejecting calls, and at some point the primary AS may be so overloaded that it stops sending any SIP messages at all, which causes a whole new set of service-impacting issues.
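To make that cost concrete, here is a toy count, assuming a gateway that keeps no per-server state between calls (the function name is hypothetical). Every call spends one Invite on the overloaded primary before reaching the secondary.

```python
from typing import Dict


def invites_without_retry_after(calls: int,
                                primary_overloaded: bool = True) -> Dict[str, int]:
    """Count INVITEs per server when the gateway remembers nothing between calls."""
    sent = {"primary": 0, "secondary": 0}
    for _ in range(calls):
        # Every new call starts again at the first Contact (the primary AS).
        sent["primary"] += 1
        if primary_overloaded:
            # The primary answers 503, so the gateway falls back to the secondary.
            sent["secondary"] += 1
    return sent


print(invites_without_retry_after(100))  # {'primary': 100, 'secondary': 100}
```

With 100 calls, the primary AS still has to process and reject 100 Invites while it is already in the red zone.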
BroadSoft went a step further and added support for the SIP Retry-After header in 503 responses. When the primary AS returns a 503 to the Cisco AS5800, the Retry-After header tells the gateway to stop sending traffic to that server for a period of time. If the Retry-After header contains a duration of 60 seconds, the AS5800 does not send any requests to the primary AS for 60 seconds, effectively putting that BroadWorks node in an out-of-service state. This is a much more graceful approach to congestion management. One caveat is that not all SIP devices support the Retry-After header.
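The gateway side of this behavior can be sketched as a table of "blocked until" timestamps. This is a hypothetical model (the `RetryAfterTable` class, `pick_host` function and host names are illustrative, not the AS5800 implementation), using a simulated clock in seconds.

```python
from typing import Dict, List, Optional


class RetryAfterTable:
    """Remember which servers asked, via Retry-After, to receive no traffic."""

    def __init__(self) -> None:
        self._blocked_until: Dict[str, float] = {}

    def on_503(self, host: str, retry_after: Optional[int], now: float) -> None:
        # A plain 503 (no Retry-After) only affects the current call.
        if retry_after is not None:
            self._blocked_until[host] = now + retry_after

    def is_available(self, host: str, now: float) -> bool:
        return now >= self._blocked_until.get(host, 0.0)


def pick_host(hosts: List[str], table: RetryAfterTable, now: float) -> Optional[str]:
    """Return the first Contact host that is not currently backed off."""
    for host in hosts:
        if table.is_available(host, now):
            return host
    return None


table = RetryAfterTable()
# t=0: the primary AS rejects a call with "Retry-After: 60".
table.on_503("primary-as", 60, now=0.0)
print(pick_host(["primary-as", "secondary-as"], table, now=30.0))  # secondary-as
print(pick_host(["primary-as", "secondary-as"], table, now=60.0))  # primary-as
```

Unlike the plain 503 case, calls arriving during the 60-second window go straight to the secondary AS without touching the primary.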
The Application Server must be configured with the addresses of the SIP devices that support the Retry-After header, and only these devices receive the header in a 503 response. Any other device to which BroadWorks returns a 503 receives a standard 503 Service Unavailable message without the Retry-After header.
Neighbor Address | Retry After Receiver | Retry After Sender | Description
184.108.40.206   | true                 | true               | Cisco_AS5800
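The effect of the neighbor table can be sketched as follows. This is a hypothetical model of the decision, not BroadWorks code: the dictionary, the `build_503` function and the unlisted address 192.0.2.10 are illustrative assumptions, and the other mandatory SIP headers are omitted.

```python
from typing import Dict

# Hypothetical in-memory copy of the neighbor entry shown above:
# neighbor address -> "Retry After Receiver" flag.
RETRY_AFTER_RECEIVERS: Dict[str, bool] = {
    "184.108.40.206": True,  # Cisco_AS5800
}


def build_503(neighbor: str, retry_after: int) -> str:
    """Return the status line plus an optional Retry-After header.

    Only the lines relevant here are built; a real 503 also carries
    Via, From, To, Call-ID, CSeq and Content-Length.
    """
    lines = ["SIP/2.0 503 Service Unavailable"]
    # Neighbors that are not listed (or not flagged) get a plain 503.
    if RETRY_AFTER_RECEIVERS.get(neighbor, False):
        lines.append(f"Retry-After: {retry_after}")
    return "\r\n".join(lines) + "\r\n"


print(build_503("184.108.40.206", 60))
print(build_503("192.0.2.10", 60))
```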
The min and max values let the AS randomly select a timer value between the two when populating the duration in the Retry-After header. This prevents the same timer value from being sent to multiple SIP devices, which could otherwise all flood the AS at the same moment when their timers expire.
minRetryAfterInSeconds = 60
maxRetryAfterInSeconds = 120
neighborMaxServiceUnavailablePeriodInSeconds = 120
AS_CLI/System/OverloadControls> set supportCongestionManagement true
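The randomized timer can be sketched in a few lines, assuming a uniform choice with inclusive bounds (the exact distribution BroadWorks uses is not stated here). The constants mirror minRetryAfterInSeconds and maxRetryAfterInSeconds; the function name is hypothetical.

```python
import random

MIN_RETRY_AFTER = 60   # minRetryAfterInSeconds
MAX_RETRY_AFTER = 120  # maxRetryAfterInSeconds


def choose_retry_after(rng: random.Random) -> int:
    # randint is inclusive on both ends, i.e. "between min and max".
    return rng.randint(MIN_RETRY_AFTER, MAX_RETRY_AFTER)


rng = random.Random(7)
# Five gateways rejected at the same instant come back at different times.
print([choose_retry_after(rng) for _ in range(5)])
```

Because each rejected gateway draws its own value, their back-off windows expire at staggered times instead of all at once.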
All Hosting and Routing NEs under NS_CLI/System/OverloadControls/ManageNeighbors/Capabilities must be manually set to true or false for Retry-After header support. This applies to 503 messages originated by the Network Server when the NS is unable to process SIP Invites.
Neighbor Net Address | Retry-After
The timers must also be set on the NS. The same rule applies as on the AS: a random duration between the two values is placed in the Retry-After header to avoid a flood of SIP messages arriving all at once when the timers expire.
minRetryAfterInSeconds = 30
maxRetryAfterInSeconds = 90
The final step is to verify Congestion Management is enabled on both the AS and NS servers.
AS_CLI/System/OverloadControls> set supportCongestionManagement true
NS_CLI/System/OverloadControls> set supportCongestionManagement true
There are some additional parameters that can be modified in the OverloadControls section such as memoryInUse, allowEmergencyCallsInOverload, trafficSamplingPeriodInSeconds, and more. It is critical to review and understand what all of these values mean in order to have the most effective overload configuration for the VoIP network.
The final call flow, with the Retry-After parameters statically configured for the Cisco AS5800 in the BroadWorks CLI (BWCLI), would look like this.
SIP Invites are sent only to the secondary AS for a random duration between 60 and 120 seconds, as chosen by the primary AS (60 seconds in this example).
The following are considered valid formats for the Retry-After header.
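For reference, RFC 3261 (section 20.33) defines the Retry-After value as delta-seconds, followed by an optional parenthesized comment and optional parameters such as duration, for example `Retry-After: 18000;duration=3600` or `Retry-After: 120 (I'm in a meeting)`. The parser below is a simplified sketch of that grammar (the function name is hypothetical): it takes the header value without the `Retry-After:` name, and it does not handle nested comments or quoted parameter values.

```python
import re
from typing import Dict, Optional, Tuple

# delta-seconds, an optional "(comment)", then zero or more ";name[=value]".
_RETRY_AFTER = re.compile(
    r"^\s*(\d+)\s*(\([^)]*\))?\s*((?:;\s*[^;=\s]+(?:\s*=\s*[^;\s]+)?\s*)*)$"
)


def parse_retry_after(value: str) -> Tuple[int, Optional[str], Dict[str, Optional[str]]]:
    """Split a Retry-After value into (seconds, comment, params)."""
    m = _RETRY_AFTER.match(value)
    if m is None:
        raise ValueError(f"not a valid Retry-After value: {value!r}")
    seconds = int(m.group(1))
    comment = m.group(2)[1:-1] if m.group(2) else None  # strip the parentheses
    params: Dict[str, Optional[str]] = {}
    for part in m.group(3).split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, val = part.partition("=")
        # A parameter without "=value" is stored with the value None.
        params[name.strip().lower()] = val.strip() or None
    return seconds, comment, params


print(parse_retry_after("18000;duration=3600"))  # (18000, None, {'duration': '3600'})
print(parse_retry_after("120 (I'm in a meeting)"))  # (120, "I'm in a meeting", {})
```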
The Cisco AS5800 can also respond with a Retry-After header. This comes in handy if the TDM circuits to the PSTN are down and the AS5800 responds to Invites with 503 Service Unavailable. At least one other egress point must exist in the network for the call to complete. When a BroadWorks subscriber dials digits destined for the PSTN, the AS queries the NS, and the NS responds to the AS with a 302 Moved Temporarily that lists more than one address (media gateway) in the Contact header. The AS sends an Invite to the first address in the Contact header (the AS5800); if it receives a 503, it tries the next address. The NS may be configured to supply up to five addresses in the Contact header.
router# config t
router(config)# voice class sip-profiles 10
router(config-class)# response 503 sip-header Retry-After add "Retry-After: 60"
router(config-class)# exit
router(config)# voice service voip
router(conf-voi-serv)# sip
router(conf-serv-sip)# sip-profiles 10
router(conf-serv-sip)# end
This adds the Retry-After header to every SIP 503 response originated by the router. If not every 503 response should include a Retry-After header, remove sip-profiles 10 from voice service voip and instead apply the profile to one or more SIP dial-peers with the voice-class sip profiles 10 command.
If an Acme Packet Session Border Controller (SBC) resides in the network, it honors the Retry-After header received from any Session Agent. For example, if the BroadWorks Network Servers (NS1 and NS2) are configured as Session Agents and one of them responds to an Invite with a Retry-After value of 3600 seconds, the SBC takes that Session Agent “out of service” for one hour, so no SIP traffic is sent to that server. After the hour has passed, the SBC begins sending traffic to the server again.