Incident Summary: WhatsApp Integration Outage (August 30 - September 4, 2024)
Incident Timeline:
What Happened:
On August 30, 2024, at 16:56 UTC, our systems stopped receiving notifications for new WhatsApp messages, meaning inbound WhatsApp messages for integrated phone numbers no longer flowed into Gorgias. At the same time, customers began encountering an error when replying to WhatsApp tickets: "(#200) You do not have permission to access this field".
The first customer reports arrived around 17:30 UTC on August 30, and our engineering team began investigating after an alert for the missing notifications fired at 19:14 UTC. Initial checks confirmed that the issue affected all customers, resulting in a complete outage of our WhatsApp integration. We immediately contacted Meta’s support team to understand the cause while continuing our internal investigation.
Investigation and Challenges:
The error message, which suggested that we lacked permission to create messages, indicated that something might have gone wrong with our app's permissions in Meta’s systems. We checked all available resources, including Meta’s changelogs, status pages, and developer community posts, but found no changes or incidents that could explain the issue. We also posted on Meta’s developer forums to see whether other developers were experiencing similar problems.
Investigating the problem through Meta’s developer and business platforms yielded nothing: there were no warnings, notifications, or indicators that anything was wrong with our app or business account.
At 22:08 UTC on August 30, after exhausting initial troubleshooting options, we paused the investigation for the day, waiting for a response from Meta’s support team while continuing to monitor the situation. Over the weekend, we maintained communication with Meta and provided any information requested in an attempt to escalate the issue, but the responses we received did not help us resolve the problem.
Note: during this time, we continued to post regular updates on our status page. However, due to an issue with our internal tooling, the incident was accidentally duplicated, and when we deleted the duplicate on September 2, all updates posted during that period were removed as well. This is why the status page history has a gap for that period.
Discovering the Cause:
By September 2, having received no relevant updates from Meta, we revisited our investigation and discovered a potential cause: our app did not have Advanced Access to the whatsapp_business_messaging permission, a level of access that Meta’s documentation states is required for Cloud Solution Partners. This was the first indication that this permission might be related to the issue, although the documentation did not clearly explain the consequences of lacking it.
Our app had only ever had Standard Access to this permission, and it had worked without issue until August 30. Despite the uncertainty about whether this was the cause, we requested Advanced Access on September 2 at 11:04 UTC.
Unfortunately, our request was rejected a few hours later. Since there was no confirmation that this permission was the root cause of the issue, we temporarily set aside the request while continuing our dialogue with Meta.
Escalation and Resolution:
On September 3, at 19:13 UTC, we submitted another request for Advanced Access to the whatsapp_business_messaging permission, but it was rejected as well. On September 4, at 08:37 UTC, we submitted a third request with the additional information Meta required. This request was approved at 10:42 UTC on September 4; immediately afterward, inbound notifications resumed and the errors our customers were experiencing stopped.
Ongoing Efforts:
Following the resolution, we continued to communicate with Meta to understand why the absence of this advanced permission suddenly caused a complete outage. We also asked Meta whether the messages missed during the outage could be recovered by replaying the related notifications, or through any other means. Meta initially responded that this was not possible.
We followed up with another request, asking for further details as to why Meta’s systems seem to lack the ability to replay these notifications, but we have not yet received a response.
As of September 12, 2024, we are waiting for further clarification from Meta.
Impact on Customers:
Actions Taken:
Next Steps: We are continuing to engage with Meta to understand why this permission unexpectedly became required and why Meta’s systems gave no clear warning about its importance. We are also investigating ways to prevent similar incidents by improving our monitoring and establishing faster escalation paths. Additionally, we will continue looking for ways to recover the missed WhatsApp messages.
We apologize for the inconvenience this incident has caused and appreciate your patience as we worked to resolve the issue. We are committed to preventing similar disruptions moving forward.