As we all know, an extensive outage on Monday, October 4, 2021, brought Facebook, WhatsApp, and Instagram down. According to estimates, the outage cost Mark Zuckerberg at least US$6 billion as the stock price dropped. And the impact on the thousands of small and medium businesses that rely heavily on Facebook platforms was immense.
Given this huge impact, a lot of us are asking the same question: why were Facebook, WhatsApp, and Instagram down, and how could it have been prevented? According to Facebook's announcement, the outage was caused by configuration changes on the backbone routers that coordinate network traffic between its data centers. When I read the statement, a famous IT proverb immediately popped into my head:
There are two kinds of people: those who back up their data and those who have never lost all their data.
A wrong configuration change… well, that doesn’t sound like something unusual to any business, right? It happens all the time! In fact, 70% of data center failures are caused by human error and could be prevented with better management/processes or configuration (source: https://www.networkworld.com/article/3444762/the-biggest-risk-to-uptime-your-staff).
Do you wonder whether something as elementary as an automated configuration backup of their routers could have helped Facebook recover much faster? Solutions that offer these capabilities don’t have to cost much. For example, SolarWinds® Kiwi® CatTools can back up the configuration of all your network devices (even every day if you want) and allows you to roll back to the last good configuration for a one-time cost of US$852… I’d call that a fair price to prevent taking down “the internet,” right? 😊
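To make the idea concrete, here is a minimal Python sketch of what "back up every day, keep the last good configuration" could look like. It is not how CatTools or Facebook's tooling works. The function names, the one-folder-per-device layout, and the assumption that the running configuration is already available as a plain string are all hypothetical. Fetching the configuration from a real device, and pushing an old one back, is left out.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def history(store: Path, device: str) -> List[Path]:
    """Return all backups for a device, oldest first."""
    device_dir = store / device
    if not device_dir.is_dir():
        return []
    # Timestamped file names sort chronologically.
    return sorted(device_dir.glob("*.cfg"))


def latest_backup(store: Path, device: str) -> Optional[Path]:
    """Return the newest backup, or None if the device has none."""
    files = history(store, device)
    return files[-1] if files else None


def previous_backup(store: Path, device: str) -> Optional[Path]:
    """Return the backup taken before the newest one (a rollback candidate)."""
    files = history(store, device)
    return files[-2] if len(files) >= 2 else None


def backup_config(
    store: Path, device: str, config: str, now: Optional[datetime] = None
) -> Optional[Path]:
    """Store `config` unless it matches the newest backup.

    Returns the new file's path, or None when nothing changed.
    """
    newest = latest_backup(store, device)
    if newest is not None and newest.read_text() == config:
        return None  # unchanged since the last run, nothing to store

    now = now or datetime.now(timezone.utc)
    device_dir = store / device
    device_dir.mkdir(parents=True, exist_ok=True)
    path = device_dir / f"{now:%Y%m%dT%H%M%S%f}.cfg"
    path.write_text(config)
    return path
```

Run `backup_config` from a daily scheduled job and every change leaves a dated copy behind. After a bad change, `previous_backup` points at the configuration that was in place before it.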
You don’t have to be a Facebook-sized company to advocate having an automated backup solution in place. If we look at some data (e.g., https://www.statista.com/statistics/753938/worldwide-enterprise-server-hourly-downtime-cost/), the average cost of downtime could be somewhere between US$100,000 and US$500,000 per hour. But whether it’s “just” a few thousand dollars or billions, there should be absolutely no discussion about spending some money on a reliable backup solution for your configurations.
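A quick back-of-the-envelope calculation shows how lopsided this trade-off is. The sketch below only combines the two figures already quoted in this post (the US$852 one-time price and the hourly downtime range). The function name is made up for illustration, and the result is only as good as those averages.

```python
TOOL_COST_USD = 852  # one-time price quoted earlier in this post
DOWNTIME_COST_PER_HOUR_USD = (100_000, 500_000)  # hourly range quoted above


def break_even_seconds(tool_cost: float, downtime_cost_per_hour: float) -> float:
    """Seconds of avoided downtime needed to pay back the tool's cost."""
    return tool_cost / downtime_cost_per_hour * 3600


for hourly_cost in DOWNTIME_COST_PER_HOUR_USD:
    seconds = break_even_seconds(TOOL_COST_USD, hourly_cost)
    print(f"At US${hourly_cost:,} per hour: break-even after {seconds:.1f} s")
```

In other words, at these rates the tool pays for itself if it shortens a single outage by roughly 6 to 31 seconds.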
So, what do you think: could a simple backup solution have helped Facebook avoid going down and cutting off so much of the world?
Disclaimer: Since we don’t know the full details of the issue, this is a general contemplation about the importance of backups rather than an effort to advise Facebook on how to run their business.