Outage on WhatsApp inbound and outbound messages
Incident Report for Gorgias
Postmortem

Incident Summary: WhatsApp Integration Outage (August 30 - September 4, 2024)

Incident Timeline:

  • Start: August 30, 2024, 16:56 UTC
  • Resolution: September 4, 2024, 10:42 UTC

What Happened:

On August 30, 2024, at 16:56 UTC, our systems stopped receiving notifications for new WhatsApp messages, meaning inbound WhatsApp messages for integrated phone numbers would no longer flow into Gorgias. At the same time, our customers began encountering errors when trying to reply in WhatsApp tickets, receiving a message that read: "(#200) You do not have permission to access this field".

The first customer reports started arriving around August 30, 17:30 UTC, and our engineering team began investigating the issue after an alert was triggered at 19:14 UTC because of the missing notifications. Initial checks confirmed that the issue was affecting all customers, leading to a complete outage of our WhatsApp integration. We immediately reached out to Meta’s support team to understand what was causing the problem while continuing our internal investigation.

Investigation and Challenges:

The error message suggesting that we lacked permission to create messages indicated that something might have gone wrong with our app's permissions in Meta’s systems. We checked all available resources, including Meta’s changelogs, status pages, and developer community posts, but found no changes or incidents that could explain the issue. We even posted on Meta’s developer forums to see if other developers were experiencing similar issues.
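As an illustration of how an error such as "(#200) You do not have permission to access this field" could be recognised as a permission problem automatically, here is a minimal sketch. It assumes the error arrives in the Graph API's usual JSON envelope, and that codes 10 and 200 to 299 denote permission errors and 190 an access-token error, as Meta's error-handling documentation describes. The function name and the sample payload are hypothetical; only the message text comes from this report.

```python
import json


def classify_graph_error(body: str) -> str:
    """Return a coarse category for a Graph API style error body.

    Assumption: the body looks like {"error": {"message": ..., "code": ...}}.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unparseable"

    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "unknown"

    code = error.get("code")
    if code == 190:
        # Assumed convention: 190 signals an expired or invalid access token.
        return "access_token"
    if isinstance(code, int) and (code == 10 or 200 <= code <= 299):
        # Assumed convention: 10 and 200-299 signal missing permissions.
        return "permission"
    return "other"


# Hypothetical payload built around the message quoted in this report.
sample = json.dumps({
    "error": {
        "message": "(#200) You do not have permission to access this field",
        "code": 200,
    }
})
print(classify_graph_error(sample))  # permission
```

Grouping errors this way would let a spike in the "permission" category be alerted on separately from ordinary send failures.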

Our attempts to investigate the problem via Meta’s developer and business platforms yielded no results—there were no warnings, notifications, or indicators suggesting something was wrong with our app or business account.

At 22:08 UTC on August 30, after exhausting initial troubleshooting options, we paused the investigation for the day, waiting for a response from Meta’s support team while continuing to monitor the situation. Over the weekend, we maintained communication with Meta and provided any information requested in an attempt to escalate the issue, but the responses we received did not help us resolve the problem.

Note: during this time, we continued to provide regular updates on our status page. However, due to an issue with our internal tooling, the incident was accidentally duplicated. When we deleted the duplicate on September 2, all messages posted during that period were removed as well. This is why there is a “hole” in the history of updates during that time.

Discovering the Cause:

By September 2, having received no relevant updates from Meta, we revisited our investigation and discovered a potential cause: our app was missing the advanced whatsapp_business_messaging permission, which Meta’s documentation states is required for Cloud Solution Partners. This was the first indication that this permission might be related to the issue, although the consequences of missing it were not clearly explained.

We had never had this advanced permission in the past, only standard access, and our app had worked without issue up until August 30. Despite the uncertainty about whether this was the cause, we submitted a request for the advanced permission on September 2 at 11:04 UTC.

Unfortunately, our request was rejected a few hours later. Since there was no confirmation that this permission was the root cause of the issue, we temporarily set aside the request while continuing our dialogue with Meta.

Escalation and Resolution:

On September 3, at 19:13 UTC, we submitted another request for the advanced whatsapp_business_messaging permission, but this was rejected as well. On September 4, at 08:37 UTC, we submitted a third request with the additional information Meta required. This request was approved at 10:42 UTC on September 4; immediately afterwards, inbound notifications resumed and the errors our customers were experiencing stopped.

Ongoing Efforts:

Following the resolution, we continued to communicate with Meta to understand why the absence of this advanced permission suddenly caused a complete outage. We also asked Meta whether it would be possible to recover the messages that were missed during the outage by replaying the related notifications, or whether there were any other options for recovery. Meta initially responded that this was not possible.

We followed up with another request, asking for further details as to why Meta’s systems seem to lack the ability to replay these notifications, but we have not yet received a response.

As of September 12, 2024, we are waiting for further clarification from Meta.

Impact on Customers:

  • From August 30, 16:56 UTC to September 4, 10:42 UTC, customers were unable to receive or send WhatsApp messages through Gorgias.
  • This impacted all customers using our WhatsApp integration, and new messages were not processed by our system during this period.
  • To our current knowledge, it is not possible to recover the messages that were missed during the incident.
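The durations implied by the timestamps in this report can be checked with a short calculation. The sketch below uses only times stated in this document; the variable names are illustrative, and PDT is modelled as a fixed UTC-7 offset because the status updates on this page are stamped in PDT while the postmortem uses UTC.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
PDT = timezone(timedelta(hours=-7))  # fixed offset, valid for these dates

start = datetime(2024, 8, 30, 16, 56, tzinfo=UTC)    # notifications stopped
alert = datetime(2024, 8, 30, 19, 14, tzinfo=UTC)    # internal alert fired
resolved = datetime(2024, 9, 4, 10, 42, tzinfo=UTC)  # permission approved

time_to_alert = alert - start
outage = resolved - start
print(time_to_alert)  # 2:18:00
print(outage)         # 4 days, 17:46:00

# The "Monitoring" status update is stamped Sep 04, 2024 - 04:14 PDT.
monitoring_post = datetime(2024, 9, 4, 4, 14, tzinfo=PDT).astimezone(UTC)
print(monitoring_post.strftime("%H:%M UTC"))  # 11:14 UTC
print(monitoring_post - resolved)             # 0:32:00
```

The outage therefore lasted 4 days, 17 hours and 46 minutes, and the internal alert fired 2 hours and 18 minutes after notifications stopped.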

Actions Taken:

  • We immediately investigated the issue and reached out to Meta’s support team.
  • We attempted multiple escalations with Meta to get the issue resolved but were ultimately left to discover the root cause independently.
  • After identifying that the whatsapp_business_messaging permission might be required, we requested and gained approval for the permission, which restored full functionality.

Next Steps:

We are continuing to engage with Meta to understand why this permission became necessary unexpectedly and why there were no clear warnings in Meta’s systems about its importance. We are also investigating ways to prevent similar incidents in the future by improving our monitoring systems and ensuring quicker escalation paths. Additionally, we will continue seeking ways to recover the missed WhatsApp messages.
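One common way to detect a silent stop in inbound notifications sooner is a staleness check on the time of the last notification received. The sketch below illustrates that general technique only; it is not a description of Gorgias's monitoring. The class name, method names and the 30-minute threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class InboundStalenessMonitor:
    """Flags when no inbound notification has arrived for too long."""

    def __init__(self, max_silence: timedelta) -> None:
        self.max_silence = max_silence
        self.last_seen: Optional[datetime] = None

    def record_notification(self, received_at: datetime) -> None:
        # Keep the most recent arrival time, ignoring out-of-order events.
        if self.last_seen is None or received_at > self.last_seen:
            self.last_seen = received_at

    def is_stale(self, now: datetime) -> bool:
        # With no notification recorded yet there is no baseline to compare.
        if self.last_seen is None:
            return False
        return now - self.last_seen > self.max_silence


# Hypothetical threshold; a real value would depend on normal traffic levels.
monitor = InboundStalenessMonitor(max_silence=timedelta(minutes=30))
monitor.record_notification(datetime(2024, 8, 30, 16, 56, tzinfo=timezone.utc))

print(monitor.is_stale(datetime(2024, 8, 30, 17, 20, tzinfo=timezone.utc)))  # False
print(monitor.is_stale(datetime(2024, 8, 30, 17, 27, tzinfo=timezone.utc)))  # True
```

A fixed threshold like this is simple, but it can produce false alarms during naturally quiet periods, so in practice the threshold would likely need to vary with expected traffic.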

We apologize for the inconvenience this incident has caused and appreciate your patience as we worked to resolve the issue. We are committed to preventing similar disruptions moving forward.

Posted Sep 12, 2024 - 04:38 PDT

Resolved
We have not yet received any response from Meta's support team.

We're marking the incident as resolved because the WhatsApp integration is now fully operational. However, we will continue to communicate with Meta to understand what caused the incident and whether any messages that may have been lost can be recovered.

We will post an update with a public post-mortem of the incident by the end of next week.
Posted Sep 04, 2024 - 10:37 PDT
Monitoring
We haven't received a reply from Meta yet. However, we were able to resolve the permission issue we had identified earlier.

That appears to have mitigated the messaging problem: new messages are now coming into Gorgias and we've stopped experiencing errors when sending replies.

We will now contact Meta's support to see if we can recover any messages that may have been lost while we were affected by this.
Posted Sep 04, 2024 - 04:14 PDT
Update
We're continuing to monitor our support ticket and repeatedly requesting that our case be treated urgently. Unfortunately, we have not yet received a proper investigation or response from Meta.
Posted Sep 04, 2024 - 03:09 PDT
Update
We are still experiencing this issue and waiting for an update from Meta's support team.
Posted Sep 03, 2024 - 12:39 PDT
Update
We have provided Meta's support with all the required information and the results of our investigation, but we haven't yet received any reply to our question about what exactly started causing this problem and what's required for its resolution.

We're now looking for a way to get our ticket escalated and properly investigated by Meta as soon as possible.
Posted Sep 03, 2024 - 04:10 PDT
Update
The problem persists. We've identified a potential cause (our WhatsApp app appears to be missing messaging permissions), but we're still waiting for Meta's support to confirm it and help resolve it.
Posted Sep 02, 2024 - 04:35 PDT
Update
The problem persists. We've provided Meta's support with the required information and are now awaiting further instructions.
Posted Aug 31, 2024 - 05:18 PDT
Update
We are still not receiving notifications and are still encountering errors when creating new messages.

We haven't heard back from Meta's support yet.
Posted Aug 30, 2024 - 23:03 PDT
Update
We haven't been able to identify the problem yet. Everything appears to be in order on our systems and in our app's configuration.

We've reached out to Meta's support and will post an update as soon as there's any news.
Posted Aug 30, 2024 - 15:08 PDT
Investigating
We stopped receiving notifications for WhatsApp messages at 16:56 UTC. As a result, new messages from that time may not have been ingested into Gorgias.

We're also observing failures on new messages sent via the helpdesk, with the error "There was a problem with the access token or permissions you are using for the API call."
Posted Aug 30, 2024 - 13:05 PDT
This incident affected: Helpdesk Integrations (WhatsApp).