Scale + Impact + Risk: How Obama fumbled Healthcare.gov
I feel sorry for Obama.
It seems one of his greatest achievements is at risk because of something as boring as website testing. Talk about a butterfly effect!
But to be honest, this is not so surprising.
Way back in 2006 I wrote ... 'some organisations view [the testing] phase of development as an unwelcome delay that can prevent their projects finishing on time. Judging by the number of sites that are launched with basic errors, second-rate testing appears to be the norm.' (Extract from The Website Manager's Handbook.)
And so bad web testing continues
At its core, website testing is a process for evaluating the conformance of a site to an agreed set of guidelines. The purpose is to ensure your site is capable of operating to a minimum acceptable standard, in order to meet the goals that have been set for it.
Sounds straightforward.
But when you consider just how large & complex most modern websites are, and the range of testing that needs to be undertaken, you might want to think again.
Download the 'Website Testing' chapter of 'The Website Manager's Handbook' for more on testing methods & protocols.
» Download free guide to Website Testing (PDF 0.5Mb)
What to test a website for?
Part of the issue is deciding how to allocate your limited resources so that important issues are uncovered.
What I mean is that it is usually impossible to test everything on a site. No one has that much time or money.
Rather, you need to get clever by selecting a sample that is representative enough to give you confidence that critical errors will be captured.
A useful way to approach this is to consider the following 3 criteria as lenses for focusing effort.
Mission Critical
This includes elements that - if adversely affected - will have a very negative impact on the organisation. For example, if the e-commerce application for an online retailer goes down, it cannot collect revenue. Similarly for Healthcare.gov, if people can't choose a plan, it can't fulfil its basic function.
Frequently Used
This includes issues that undermine the reason for many people's visit to your site. Again, if people can't choose a plan on Healthcare.gov, it stymies the most popular interaction.
Event Sensitive
This encompasses technology that - from experience - is known to be sensitive to events. For example, if you have an application that is known to crash due to sudden surges in traffic or heavy traffic for sustained periods, you might want to look into that.
At a minimum, you should focus effort on content that scores high on all three parameters. That is, content that is:
- Very mission critical
- Very frequently used
- Very event sensitive
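The three lenses can be turned into a rough scoring exercise. The sketch below is only an illustration: the 1-to-5 scale, the threshold of 4, the feature names and every score are hypothetical, not figures from Healthcare.gov or from the Handbook.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A site feature scored from 1 (low) to 5 (high) on each lens.

    The 1-5 scale is an assumption made for this sketch.
    """
    name: str
    mission_critical: int
    frequently_used: int
    event_sensitive: int

    def priority(self) -> int:
        # Simple unweighted sum; a real team might weight the lenses.
        return self.mission_critical + self.frequently_used + self.event_sensitive


def must_test(features, threshold=4):
    """Names of features that score high on all three lenses."""
    return [
        f.name
        for f in features
        if min(f.mission_critical, f.frequently_used, f.event_sensitive) >= threshold
    ]


def ranked(features):
    """Features ordered from highest to lowest combined score."""
    return sorted(features, key=lambda f: f.priority(), reverse=True)


# Hypothetical scores, for illustration only.
features = [
    Feature("plan selection", 5, 5, 5),
    Feature("newsletter signup", 1, 3, 1),
    Feature("press contact form", 3, 1, 2),
]

print(must_test(features))
print([f.name for f in ranked(features)])
```

Anything returned by must_test is the minimum you should cover; the ranked list suggests where to spend whatever testing time is left.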
The conundrum you then have to face, once testing has uncovered errors, is whether to launch the site as-is (complete with errors) or insist on a delay to allow remedial action.
As the president's team discovered, decisions of this type are inherently thorny.
- Going live too early can result in bad press (!!) when key applications do not work properly.
- Yet, delaying a launch could antagonise stakeholders who want the site to be made public.
First, what will be the scale of the disruption if we go live as-is?
This basically asks how many people will be affected. Interestingly, it is not always necessary to delay a launch if a lot of people are affected, simply because the issue may not be serious.
For example, if the newsletter signup function on the website goes down, a lot of people may be inconvenienced, but the impact is low.
Second, what will be the impact of the disruption?
This asks what the effect on people will be if the error occurs when live. This is usually the most important consideration.
Even an application that is little used can sometimes have a very significant impact if it disrupts the experience for an important cohort of users (think journalists in Obama's case).
Of course, if the impact is high and a lot of people are affected, that is even worse.
Third, what risks may arise as a result of a disruption?
This can also have a disproportionate effect on a go/no-go decision.
For example, an application that is little used and that causes only slight disruption when down may still be high risk if it involves an issue of concern to the public, such as data security or other privacy issues.
Of course, where all three criteria score high, it is a no-brainer that a launch must be delayed.
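The three questions can be sketched as a simple decision rule. This is one possible reading, not a formula from the Handbook: the Level scale, the function name launch_decision and the middle outcome "review" are all assumptions made for the example.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step rating for each question."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def launch_decision(scale: Level, impact: Level, risk: Level) -> str:
    """Suggest 'delay', 'review' or 'launch' for a known defect.

    Assumed rules for this sketch:
    - all three rated high: delay the launch;
    - impact or risk rated high: needs a deliberate decision ('review');
    - otherwise, scale alone does not block a launch.
    """
    if scale == impact == risk == Level.HIGH:
        return "delay"
    if impact == Level.HIGH or risk == Level.HIGH:
        return "review"
    return "launch"


# Newsletter signup down: many people affected, but low impact and risk.
print(launch_decision(Level.HIGH, Level.LOW, Level.LOW))
# Little-used application with a data security concern.
print(launch_decision(Level.LOW, Level.LOW, Level.HIGH))
# Core application failing for everyone under public scrutiny.
print(launch_decision(Level.HIGH, Level.HIGH, Level.HIGH))
```

The point of the middle outcome is that a high risk rating on its own should force a conscious go/no-go conversation, even when few people are affected.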
How Obama fumbled the healthcare website
And that's where Obama's team seemed to slip up.
They thought they could handle it and they couldn't.
In my analysis, the reason is not that lots of people were affected or that it involved a high-impact application. The problem was that the resulting risks were not properly considered.
The developers failed to pay enough attention to the politics involved and the media storm that would result if even a minor error was discovered.
The fact that the core application itself failed to work was then practically unforgivable.