The Cost of Slack Downtime…and What to Do About It
Many of us woke up this morning to get our coffee and read the latest office news in Slack. Some of us have co-workers all over the world, night owl bosses, and workflows that lead to morning message backlog.
This morning, though, all of our backlogs were a little harder to sift through thanks to a Slack outage in Europe and the US. To calm down, some of us might have turned to our Google Home or Chromecast to unwind while the outage hours piled up, only to find those were down too!
What a morning!
Now that Slack is running again, let’s take a moment to reflect on what the outage means and what we can learn from it.
The Slack Outage
Users began having trouble logging into Slack at around noon in the UK, and reported those issues via Twitter. The messaging service’s account acknowledged the outages, made some changes on their end and things seemed ok. Then, a few hours later, the system was down for everyone.
All workspaces are currently experiencing difficulty connecting to Slack. We appreciate your patience while we look into this. https://t.co/jdzxIeLoWA
— Slack Status (@SlackStatus) June 27, 2018
It’s unclear whether this was a chain reaction or whether the outages were a coincidence, but most people were unable to log in for roughly four hours. The problem was eventually isolated, and connectivity was restored by 10 AM Eastern time. Most customers appeared to be pretty relaxed about the outage. Engadget remarked that the downtime meant some of us could finally get to work.
Maybe that’s true.
Slack’s social media team did an excellent job of keeping the public informed about the outage and the team’s efforts to fix it. The company also maintains its own status page, where users were able to follow those efforts to bring various servers back online.
We can learn a lot from dissecting these outages and reviewing what they mean for our organizations.
Downtime Carries Costs
Tools we rely on go down all the time, often at hours we don’t notice and for very brief periods. We might not feel affected until an extended outage, but even small disruptions can be costly for the teams that depend on those tools.
Uptime.com lets teams use Slack to notify key personnel of an outage. During this outage, those relying on the Slack integration may not have received downtime notifications for their websites.
Backup messaging and escalation techniques are essential when you’re monitoring downtime. If you’re funneling that information through a single external tool and that tool goes down, you have few other ways of learning about an outage.
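As a rough illustration of backup messaging, the sketch below tries each notification channel in order and falls back to the next one when a channel fails. The function names (`notify_with_fallback`, `send_slack`, `send_email`) are hypothetical stand-ins, not part of any Uptime.com or Slack API; in practice each sender would wrap a real integration.

```python
from typing import Callable, Iterable, Optional, Tuple

# A notifier takes a message and raises an exception if delivery fails.
Notifier = Callable[[str], None]


def notify_with_fallback(
    message: str, channels: Iterable[Tuple[str, Notifier]]
) -> Optional[str]:
    """Try each channel in order; return the name of the first that succeeds."""
    for name, send in channels:
        try:
            send(message)
            return name
        except Exception:
            # This channel is unavailable, so move on to the next one.
            continue
    return None  # Every channel failed.


# Hypothetical senders for the example: Slack is down, email still works.
def send_slack(message: str) -> None:
    raise ConnectionError("Slack unreachable")


sent_log = []


def send_email(message: str) -> None:
    sent_log.append(("email", message))


delivered_via = notify_with_fallback(
    "example.com is down",
    [("slack", send_slack), ("email", send_email)],
)
```

The order of the channel list is the whole design: the preferred tool goes first, and anything after it only fires when the tools ahead of it cannot deliver.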
Slack, Gmail, and project management infrastructure are critical to the life of many organizations. This outage showed many of us just how much we take tools like Slack for granted. But outages have costs in terms of work hours and productivity. One way to mitigate the losses from downtime is a contingency plan for shifting to backup infrastructure when outages like this one occur.
Organizations Require a Backup
Workers won’t discover an outage in Slack (or any other tool) until they need it. In this case, the downtime halted productivity for some.
In this situation, Uptime.com’s API monitoring system can “talk” with Slack and detect the outage long before anyone even starts their morning French Press. With escalation, decision makers can quickly divert the team’s communication to a designated backup tool, perhaps hosted internally, or direct employees to complete assigned tasks as they wait for connectivity to be restored. Notes and updates keep them informed, so they can forget about the outage and focus on their work.
Meanwhile, your trusted IT ninjas receive escalations if outages persist. They don’t wait for external updates; they apply a series of common fixes to try to correct the problem on your end. Confirming that your own infrastructure is sound is an essential step in troubleshooting. Never underestimate the value of experienced technicians reviewing every possible fix as quickly as possible. If IT can correct the problem, Uptime.com will offer first-response teams invaluable data to do so.
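Escalating when outages persist can be sketched as a simple time-based policy. The thresholds and contact names below are assumptions chosen for illustration, not Uptime.com defaults; the point is only that the list of people alerted grows as the outage drags on.

```python
from datetime import timedelta
from typing import List, Tuple

# Hypothetical policy: (how long the outage has lasted, who gets alerted).
ESCALATION_POLICY: List[Tuple[timedelta, str]] = [
    (timedelta(minutes=0), "on-call engineer"),
    (timedelta(minutes=15), "IT team lead"),
    (timedelta(minutes=60), "decision makers"),
]


def contacts_to_alert(outage_duration: timedelta) -> List[str]:
    """Return every contact whose threshold the outage has already reached."""
    return [
        contact
        for threshold, contact in ESCALATION_POLICY
        if outage_duration >= threshold
    ]


# Twenty minutes in, the first two tiers have been reached.
alerted = contacts_to_alert(timedelta(minutes=20))
```

Keeping the policy as plain data makes it easy to review and adjust without touching the logic that applies it.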
Keeping Customers Happy
Downtime is forgivable, but failing to acknowledge an outage or being secretive about what happened can break the trust between you and your customers. It is your responsibility to maintain connectivity for your userbase. Increasingly, those users expect to be informed of downtime and given an estimate of when the service may return.
Estimating when service will return can be challenging, so it’s good to have some data about the problem. You can use public status pages to keep users informed and free up support to deal with other issues. Keeping users notified of downtime is especially important for SaaS companies, where one service outage may affect another.