by George Goglidze, CCIE #19926

Microsoft Teams Useless SIP Options or the curious case of delayed SBC failover

I had a support request the other day asking why failover to the second SBC was taking over 20 seconds. This had me scratching my head for a bit and digging for more information on how failover works in Microsoft Teams.

So I went to the Microsoft site to do some research on MS Teams Direct Routing failover timers. Here is what I found.

Reference: https://docs.microsoft.com/en-us/microsoftteams/direct-routing-trunk-failover-on-outbound-call

Failover can occur at the application level when one of the “Failover response codes” is received from the SBC. By default, they are:

  • 408 – Request Time-out
  • 503 – Service Unavailable
  • 504 – Server Time-out

If any of these is received, the failover is instantaneous. But in this case, of course, it was not. So what else could cause the failover?
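To make the rule concrete, here is a minimal sketch of the decision as I understand it from the Microsoft article. The function name `should_fail_over` is my own invention for illustration, not anything Microsoft exposes, and the real logic on their side may well be more involved.

```python
# Default "Failover response codes" as listed in the Microsoft article above.
FAILOVER_RESPONSE_CODES = {408, 503, 504}


def should_fail_over(status_code: int, failover_codes=FAILOVER_RESPONSE_CODES) -> bool:
    """Return True if this SIP response code should make the caller try the next SBC.

    Hypothetical helper: it only models the documented default list,
    not Microsoft's actual implementation.
    """
    return status_code in failover_codes


if __name__ == "__main__":
    # 486 Busy Here is included as an example of a code that is NOT in the list.
    for code in (408, 486, 503, 504):
        print(code, should_fail_over(code))
```

The point is that this path only helps when the SBC actually answers. If nothing comes back at all, there is no response code to match against.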

The second configuration item that can cause failover is the Direct Routing SBC parameter “Failover time (seconds)”. By default, this is set to 10 seconds. Unfortunately, Microsoft has no documentation explaining what exactly the failover time measures.

Does this timer run until the TLS/TCP connection is established, or until the Trying message is received, which would generally cover SIP messages being delayed in the network? None of this is documented, and the failover was taking over 20 seconds, well beyond the 10 seconds configured, so I thought I’d keep looking.

The next place to look was SIP OPTIONS. Generally, SIP OPTIONS is enabled and should detect if the SBC is down. SIP OPTIONS is basically a ping that verifies connectivity to the SBC. I thought, OK, if the communication is down and we are not getting a 200 OK to SIP OPTIONS, then the failover should be instantaneous. That is the logical outcome any sane engineer would expect, right? Well, yes, if you have common sense 🙂
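For anyone who has not looked at one on the wire, here is a rough sketch of what such a "ping" looks like as text, following the general SIP request layout from RFC 3261. The helper names `build_options` and `is_alive`, the `ping` user part and the `proxy.example.com` host are all made up for illustration, and the headers Microsoft Teams really sends will differ.

```python
import uuid


def build_options(sbc_fqdn: str, local_fqdn: str, port: int = 5061) -> str:
    """Build a minimal SIP OPTIONS request as text (illustrative only)."""
    # RFC 3261 requires the Via branch to start with the magic cookie "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{sbc_fqdn}:{port};transport=tls SIP/2.0",
        f"Via: SIP/2.0/TLS {local_fqdn}:{port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:ping@{local_fqdn}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{sbc_fqdn}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_fqdn}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    # Headers end with an empty line; there is no body.
    return "\r\n".join(lines) + "\r\n\r\n"


def is_alive(response: str) -> bool:
    """Treat a 200 status line as a successful reply to the ping."""
    status_line = response.split("\r\n", 1)[0]
    parts = status_line.split(" ", 2)
    return len(parts) >= 2 and parts[0] == "SIP/2.0" and parts[1] == "200"


if __name__ == "__main__":
    print(build_options("sbc1.ccie.club", "proxy.example.com"))
    print(is_alive("SIP/2.0 200 OK\r\nContent-Length: 0\r\n\r\n"))
```

This only builds and parses the text. Actually sending it would need a TLS connection to the SBC, which I have left out.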

But I thought I’d verify it to be sure. So I half broke my MS Teams/Cisco UCM lab. I did the following:

I set a wrong IP address in DNS for my SBC, so sbc1.ccie.club now resolves to the wrong IP and Microsoft cannot communicate with it. My SBC, however, can still communicate with Microsoft. So we have a weird situation where calls work one way (SBC to MS Teams) but not the other way around.

This has resulted in the following status of the SBC on the Teams admin centre:

The TLS connection status is reported as active, but there is an issue with the SIP OPTIONS status.

Here is what the warning is saying:

This basically means that Microsoft Teams has not received any reply to the SIP OPTIONS it sent to the SBC. Logically, this should mean that Microsoft Teams stops sending INVITEs to this SBC, right? Apparently not.
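Just to spell out the behaviour I was expecting, here is a small sketch of health-aware route selection. To be clear, this is not how Microsoft Teams works (that is the whole point of this post). The `Sbc` class, the `options_ok` flag and the second host `sbc2.ccie.club` are hypothetical names used only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sbc:
    fqdn: str
    options_ok: bool  # hypothetical flag: did the last SIP OPTIONS get a 200 OK?


def candidate_sbcs(route: List[Sbc]) -> List[Sbc]:
    """Keep the configured order, but skip SBCs whose OPTIONS check is failing."""
    healthy = [sbc for sbc in route if sbc.options_ok]
    # If nothing looks healthy, fall back to the full list rather than dropping the call.
    return healthy or list(route)


if __name__ == "__main__":
    route = [
        Sbc("sbc1.ccie.club", options_ok=False),  # the SBC I broke in DNS
        Sbc("sbc2.ccie.club", options_ok=True),   # hypothetical second SBC
    ]
    print([sbc.fqdn for sbc in candidate_sbcs(route)])  # prints ['sbc2.ccie.club']
```

With logic like this, the first INVITE would go straight to the second SBC and there would be nothing to wait for.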

I was lucky enough to get the SIP traces from Microsoft, and here is what I got.

Microsoft Teams still attempted to send 4 INVITEs to the SBC, which, as the trace clearly shows, took over 16 seconds in total. So this is now getting close to the 20-second failover delay that the customer was seeing.

Why would Microsoft Teams still use the route if SIP OPTIONS is clearly not working? I do not know; maybe we will get an answer one day. For now, I have not found any way to bring this delay down. Let’s hope Microsoft resolves this soon.

Wish everyone a good day.
