SMS MT Latency issue to GB
Incident Report for Vonage API
Postmortem

Incident Statement - UK SMS Delay

During the period below, some SMS messages delivered to the United Kingdom experienced increased latency.

Date/Time Impacted

2022/06/17 10:12:58 UTC to 2022/06/20 16:08:00 UTC

Services Impacted

SMS MT to United Kingdom

Summary

On 2022/06/17, a routine update was made to the Carrier Lookup Service (CLS) used for United Kingdom routing. The CLS identifies the operator of a phone number. Everything tested fine after the change. Subsequently, an issue developed with the CLS supplier that caused lookups for a subset of UK mobile numbers to time out after 20 seconds without a response. As a result, MT messages to these numbers were delivered with an extra delay of ~20-23 seconds.

The issue was detected on 2022/06/20 at 13:19:00 UTC, when a customer notified us of delayed SMS delivery. We also discovered that our alerting mechanisms had not detected the increased latency and flagged it to our technical teams, as they were designed to do. The issue was raised as an internal incident and investigated further.

At 16:08:00 UTC on 2022/06/20, a fix was put in place that resolved the issue and mitigated all customer impact.

Root Cause

There was unexpected behaviour in our CLS supplier. As a result, a number of lookup requests timed out before failing over to our internal numbering plan, and the affected messages were held for 20 seconds before being sent out as expected.
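The timeout-then-failover behaviour described above can be illustrated with a minimal Python sketch. It assumes the supplier call is wrapped in a thread-pool future with a timeout and that the fallback is a prefix-based numbering-plan lookup; `lookup_operator`, `query_supplier`, `NUMBERING_PLAN` and the prefixes in it are hypothetical names and data for illustration, not Vonage's actual implementation.

```python
import concurrent.futures
import time

# Hypothetical internal numbering plan: number prefix -> operator.
# Prefixes and operator names are illustrative, not real routing data.
NUMBERING_PLAN = {"44770": "operator-a", "44780": "operator-b"}

# Shared worker pool for supplier calls (size chosen arbitrarily for the sketch).
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def lookup_numbering_plan(msisdn):
    """Longest-prefix match against the static numbering plan."""
    for length in range(len(msisdn), 0, -1):
        operator = NUMBERING_PLAN.get(msisdn[:length])
        if operator is not None:
            return operator
    return None


def lookup_operator(msisdn, query_supplier, timeout_s):
    """Ask the CLS supplier first; fall back to the numbering plan on timeout or error.

    Returns (operator, source, elapsed_s). When the supplier never answers,
    elapsed_s is roughly timeout_s: that wait is the extra delay the message sees.
    """
    start = time.monotonic()
    future = _EXECUTOR.submit(query_supplier, msisdn)
    try:
        operator = future.result(timeout=timeout_s)
        source = "supplier"
    except concurrent.futures.TimeoutError:
        # No answer within timeout_s: stop waiting and use the internal plan.
        # (The worker thread may keep running until the supplier call returns.)
        operator = lookup_numbering_plan(msisdn)
        source = "numbering_plan"
    except Exception:
        # The supplier call itself failed: fall back immediately.
        operator = lookup_numbering_plan(msisdn)
        source = "numbering_plan"
    return operator, source, time.monotonic() - start
```

In this sketch, with `timeout_s=20` every lookup the supplier never answers adds about 20 seconds before the message can be routed, which corresponds to the delay described above. A 5 second failover, as in the Next Steps below, would cap that added wait at about 5 seconds.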

Further investigation showed that we did not have appropriate alerting in place to detect this level of increased latency.

Restoration Actions

At 16:08:00 UTC, we removed the CLS provider that was causing the increased latency and moved traffic back to a different provider to restore service.

Next Steps

  1. Change our alerting and monitoring for the CLS so that any lookup latency above 1 second at the 95th percentile triggers an alert.
  2. Review all of our CLS providers for latency anomalies.
  3. Review our testing process for CLS supplier changes.
  4. Reduce the failover timeout for CLS providers to 5 seconds.
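The alerting rule in item 1 can be sketched as follows. This is a minimal illustration that assumes lookup latencies are collected in seconds over some evaluation window and that the percentile is computed with the nearest-rank method; the window, the function names and the percentile method are assumptions, not a description of Vonage's monitoring stack.

```python
import math

# Alert threshold from Next Steps item 1: 1 second at the 95th percentile.
P95_THRESHOLD_S = 1.0


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def should_alert(latencies_s, threshold_s=P95_THRESHOLD_S):
    """Return True when the p95 lookup latency in the window exceeds the threshold."""
    return percentile_nearest_rank(latencies_s, 95) > threshold_s
```

With 100 lookups in a window, this nearest-rank p95 exceeds 1 second only when at least 6 of them are slower than 1 second, so a slow subset making up 5% or less of the lookups would not trigger this rule on its own.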
Posted Jun 24, 2022 - 09:20 UTC

Resolved
The SMS MT latency problem has been resolved.

The services were impacted from 2022-06-17 10:12 UTC until 2022-06-20 17:15 UTC.

All services have been restored.
Please contact support@api.vonage.com if you continue to experience problems.
A post-mortem will be published in the next few days.
Posted Jun 21, 2022 - 16:58 UTC
Monitoring
We have implemented a fix for the SMS MT latency issue in GB.

We will continue to monitor the service during the next few hours and post updates should anything change.
Posted Jun 20, 2022 - 16:57 UTC
Investigating
We are currently investigating an issue causing delayed SMS delivery to the United Kingdom (GB).

You may have seen increased latency when sending SMS to GB.

The following services were impacted from 2022-06-17 until 2022-06-20 17:15 UTC.

SMS MT to GB

We will update this status as soon as we have more information on this issue.
Posted Jun 20, 2022 - 16:28 UTC
This incident affected: SMS API (Europe traffic).