Expert: Here’s Why the ACA Website Has Been a Disaster

The man who wrote the book (literally) on IT project failure says the beleaguered website needs a strong executive, not a surge.

Photograph by Jeff Fusco

On Tuesday, the Obama administration announced that it had tapped Jeffrey Zients – a Bain & Co. alum and incoming head of the National Economic Council – to lead the efforts to fix the glitches that have plagued the Affordable Care Act website since its October 1 launch.

Zients, who is credited with helping turn around the malfunctioning website of another government program, Cash for Clunkers, will work with Presidential Innovation Fellows and representatives of federal contractor CGI Group as part of a “tech surge” (Washington sure does love its surges, doesn’t it?). The goal is to have the portal running smoothly before an uptick in enrollment expected around the Thanksgiving holiday. By all accounts Zients will have his work cut out for him, but experts in the know say success is well within his reach.

According to Jim Johnson, founder and chairman of Boston-based IT consultancy The Standish Group, taking the site live all at once after less than two weeks of testing was a “fool’s errand,” but it’s not too late to get it back on track. Johnson’s company has been collecting case information on real-life IT environments and software development projects since 1985, and its cumulative research encompasses 12 years of data on why projects succeed or fail.

In other words, Johnson knows a thing or two about why large IT deployments struggle to get off the ground. And he’s not even a little surprised that the site is still stuck in neutral.

“What would be unusual would be if it did work,” said Johnson, who in 2011 was invited by the National Academy of Sciences to give a presentation on IT best practices for the Centers for Medicare and Medicaid Services, and who says he warned officials of the technical challenges surrounding the Obamacare rollout. “It would shock me if it actually did work.”

Earlier this week, Johnson crunched the numbers on more than 3,500 IT projects with budgets exceeding $10 million deployed between 2003 and 2012, and found that only 6.4% could be classified as successful. More than half (52.2%) were “challenged,” meaning they were over budget, behind schedule or didn’t meet user expectations, while the remaining 41.4% were failures that had to be scrapped or started over from scratch. Johnson classifies the ACA website rollout as a classic “challenged program” that can still be successful with the right tweaking.

“It’s way over budget, and it seems to have poor user acceptance, but that’s not the same thing as rejection,” he said. “Right now users are just unsatisfied. If users were rejecting it, it would be a failure.”

I spoke with Johnson on the phone yesterday to find out where the administration dropped the ball on the site and what they’ll need to do to regain possession.

You’ve been quoted as saying the site “didn’t have a chance in hell” of success right out of the gate. Why were the odds stacked against it?

For one thing it’s a government project, and government projects are often victims of bureaucracy where there is no single person who has total control and responsibility. Also it was outsourced through a historically unreliable and cumbersome contracting process. On top of that it’s a simple problem with a very complex solution, which is a bad mix. The complexity comes in the form of the number of stakeholders that are involved. The healthcare system in and of itself is a stakeholder, then you have 39 states on top of that, each with up to five individual exchanges, plus Social Security, the IRS. With all that added complexity, it’s hard to make decisions quickly and decisively.

How did the administration’s deployment schedule exacerbate these challenges?

The biggest mistake was probably going live with the entire platform all at once instead of taking an incremental approach. A “Big Bang” deployment is the most dangerous way to go: you’re going to come out with a product that is fraught with issues. The average defect rate is somewhere around three defects per hundred lines of code, so if you have a platform like this, with something like 500,000 lines of code, you’re looking at 15,000 defects, maybe 7,500 if you’re really good. The more unit testing you can do upfront, the cheaper it is to remove these defects, but there was little to no unit testing. What they should have done is focus on optimizing the high-value components of the system, because 80 percent of the components in a system like this never get used anyway.
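Johnson’s totals imply a defect density: 15,000 defects in roughly 500,000 lines works out to 30 defects per thousand lines of code (KLOC), and the “really good” figure of 7,500 to 15 per KLOC. The short Python sketch below simply redoes that back-of-the-envelope arithmetic. The densities are inferred from his two totals rather than taken from any published Standish Group table, and the function names are our own.

```python
def expected_defects(lines_of_code: int, defects_per_kloc: float) -> float:
    """Estimate latent defects as (lines / 1,000) * defects per thousand lines."""
    return lines_of_code / 1_000 * defects_per_kloc


def implied_density(defects: int, lines_of_code: int) -> float:
    """Back out the defects-per-KLOC rate implied by a total defect count."""
    return defects / (lines_of_code / 1_000)


LOC = 500_000  # Johnson's rough size estimate for the platform

# Densities implied by the two totals he quotes (assumed, not measured).
typical = implied_density(15_000, LOC)     # 30 defects per KLOC
really_good = implied_density(7_500, LOC)  # 15 defects per KLOC

for label, rate in [("typical", typical), ("really good", really_good)]:
    print(f"{label}: {rate:.0f} defects/KLOC -> {expected_defects(LOC, rate):,.0f} defects")
```

Running the same formula at three defects per *thousand* lines gives only 1,500 defects, so his totals hold together only at roughly three defects per *hundred* lines.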

What are the most important components of successfully managing a project of this scope?

There are two, really. One is having an executive sponsor that can make decisions, someone with a vested interest in the project. There are thousands of decisions that get made in a project of this size. So you need a strong executive voice, someone that is responsible and accountable and can remove some of the red tape. In this case there was simply too much governance. The first reaction [to problems with the site] is that it needed more governance, but that’s like putting gasoline on an already raging fire. What you really want to do is reduce the governance and make someone responsible. You have to give people freedom to make mistakes.

The other one is related and it’s what we call “emotional maturity.”  This is about being accountable and taking risks. It’s about making decisions quickly and decisively in an organization that is mature enough that it can be confident about those decisions. If you have both these things, you have a good chance of success.

This week the administration announced a plan to fix the glitches in the ACA exchange website. Based on your experience with projects that have succeeded after initial challenges, what’s the best course of action?

Well, they can’t take the site down at this point, so they need to start by prioritizing the problems. For starters they need a strong executive sponsor who has total project control. One thing that really bothers me about [the administration plan] is the fact that they are going into a “surge” mode. Throwing lots of people at the problem isn’t necessarily going to fix anything; it really depends on what those people do. If they are doing testing and code reviews and code inspection, then I think that’s a good thing. If they are actually changing the code or working on the code, that’s a very, very bad thing, because it’s like putting 10 cooks on a stove. What you really want to do is get the brightest and the best in a small group of 20 people at most, put them in a room and let them work on issues that an executive sponsor has prioritized, with the defects documented by criticality. You need to take a scalpel to this thing and cut out all the junk. It’s about taking stuff out rather than putting stuff in.
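Johnson’s triage advice (rank the documented defects by criticality, then let a small team work the top of that list) can be pictured as a priority-ordered backlog. This is a purely hypothetical illustration: the `Defect` record, the four criticality levels, the `triage` helper and the sample backlog are our assumptions, not anything from the administration’s plan or the Standish Group.

```python
from dataclasses import dataclass
from enum import IntEnum


class Criticality(IntEnum):
    # Hypothetical severity scale; a lower value means more urgent.
    BLOCKER = 0
    MAJOR = 1
    MINOR = 2
    COSMETIC = 3


@dataclass
class Defect:
    defect_id: str
    summary: str
    criticality: Criticality
    users_affected: int  # rough estimate of how many users hit the defect


def triage(backlog: list[Defect], capacity: int) -> list[Defect]:
    """Return the `capacity` most urgent defects: by criticality, then by reach."""
    ranked = sorted(backlog, key=lambda d: (d.criticality, -d.users_affected))
    return ranked[:capacity]


# Made-up sample backlog, for illustration only.
backlog = [
    Defect("D-3", "example cosmetic issue", Criticality.COSMETIC, 50_000),
    Defect("D-1", "example blocking issue", Criticality.BLOCKER, 400_000),
    Defect("D-2", "example minor issue", Criticality.MINOR, 120_000),
    Defect("D-4", "second blocking issue", Criticality.BLOCKER, 90_000),
]

for d in triage(backlog, capacity=2):
    print(d.defect_id, d.criticality.name, d.summary)
```

Sorting on `(criticality, -users_affected)` keeps blockers at the top and breaks ties by how many people a defect touches. The `capacity` argument stands in for what a small, focused team can take on at once, in the spirit of Johnson’s “20 people at most.”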