The very public outage of the Census website has highlighted the perils of shifting processes online. Whilst a successful transition can deliver significant benefits, the damage from a failed one can be even greater. This is magnified when it is a major public exercise. Back in 2013 in the USA, the launch of the healthcare.gov website suffered a similar fate, when it was unable to deal with the influx of 10 million Americans looking for subsidised healthcare plans.
Businesses are not immune from these issues. In Australia in 2012, the Click Frenzy promotion failed to deal with the surge in traffic and crashed, showing how easily an outage can become a PR nightmare. For smaller businesses it might be your accounting systems going offline, your website being hacked or your email being compromised. Regardless of the situation, the damage can be just as significant.
So here are 3 lessons we believe you can learn from big incidents like the Census outage:
1. Split up your plans for major changes
With such a big shift to doing the Census online, it seems rash not to have broken it up into smaller parts. Rather than marketing to the whole country for a one-night push, they could easily have broken it up by state and territory. This would have spread the traffic and allowed them to do smaller locations first and iron out issues.
With your own projects, look for ways to segment or split changes into stages and avoid a massive, one-off shift. This avoids taking on too much at once and gives your team time to absorb the new situation.
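As a rough illustration of staging, the sketch below groups a rollout into waves, smallest groups first, so that early problems affect the fewest people. The group names, user counts and the `wave_capacity` limit are placeholders invented for this example, not figures from the Census or any real project.

```python
from typing import Dict, List


def plan_waves(expected_users: Dict[str, int], wave_capacity: int) -> List[List[str]]:
    """Group segments into rollout waves, smallest segments first.

    Each wave's combined expected users stays within wave_capacity, except
    that a single segment larger than the capacity gets a wave of its own.
    """
    waves: List[List[str]] = []
    current: List[str] = []
    current_load = 0
    # Smallest first, so early waves expose problems to the fewest people.
    for name, users in sorted(expected_users.items(), key=lambda kv: kv[1]):
        # Close the current wave if adding this segment would exceed the limit.
        if current and current_load + users > wave_capacity:
            waves.append(current)
            current, current_load = [], 0
        current.append(name)
        current_load += users
    if current:
        waves.append(current)
    return waves


# Placeholder figures for illustration only, not real usage data.
segments = {"pilot_office": 50, "branch_a": 200, "branch_b": 300, "head_office": 900}
for number, wave in enumerate(plan_waves(segments, wave_capacity=500), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

With these placeholder numbers the largest group ends up in the final wave on its own, after the smaller groups have had a chance to surface any issues.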
2. Virtual testing is no match for real world users
The Census system was supposedly tested, at a cost of nearly $400k, to deal with 150% of the expected traffic for 8 hours. They even went so far as to say it wasn’t possible that it would go offline. Yet within hours of launch it was offline, and it stayed completely unavailable until the next morning.
No matter what the system, having real users test and trial it delivers outcomes that are still very hard to replicate virtually. Allowing groups of users or teams to test a new project or system will give you early warning signs of what could happen. It may delay the launch, but that is better than the alternative, where a problem affects everyone at once.
3. Outsourcing doesn’t make you immune from issues
While IBM was chosen to host the Census website, research firm Gartner places IBM behind market leaders Amazon AWS and Microsoft Azure in its Magic Quadrant for Cloud Providers. The natural thinking is that the ABS should have chosen one of those two instead. However, even the market leader AWS had its own outage in Sydney in mid-2016. So whilst the Census situation may have nothing to do with IBM, outsourcing isn’t a pill that removes all risk. Telling the public or your staff that you won’t experience a problem with a new IT system is almost inviting Murphy’s law to kick in.
Understanding the capabilities of your outsourcers, and what their backup plans are in the event of an outage, is a priority. However, each organisation should also have its own risk management guidelines that detail its risk framework. In the case of the Government, they have their own Commonwealth Risk Management policy you can view here for reference.
Next Steps
It’s disappointing when IT projects go wrong in such a public way, as people tend to revert to avoiding change. This affects companies as much as governments, and whilst they might avoid the pain in the short term, the risk of not changing can actually be the more significant threat. With digital disruption and global competition, we cannot sit still as a country or as businesses and stick our heads in the sand.
We do need to continue to adapt and seek alternative ways of doing business. The history books are littered with the failed experiments of entrepreneurs, inventors and scientists, but those failures pave the way for future advances. Hopefully the ABS will learn from this outage and come back with a more robust system for the next Census.