Practice: Root-cause Analysis

Discussion:

Kent Beck

2005-03-29 23:39:42 UTC

Every time a defect is found after development, eliminate the defect and its
cause. The goal is not just that this one defect won't ever recur, but that
the team will never make the same kind of mistake again.

In XP, this is the process for a regression test:
1. Write an automated system-level test that demonstrates the defect,
including the desired behavior. This can be done by the customer, by
customer support, or by developers.
2. Write a unit test with the smallest possible scope that also reproduces
the defect.
3. Fix the system so the unit test works. This should cause the system
test to pass also. If not, return to 2.
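
A minimal sketch of the three steps, assuming Python's unittest module. The Account class and run_overnight_batch function are hypothetical names invented for illustration (they are not from the post), and the negative-balance defect is borrowed from the 5 Whys example in this message.

```python
import unittest


class Account:
    """Hypothetical account class, used only for illustration."""

    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        # Step 3 (the fix): let the balance go negative, because
        # overnight overdrafts turned out to be legitimate.
        self.balance -= amount


def run_overnight_batch(accounts):
    """Hypothetical system-level entry point: sums all balances."""
    return sum(account.balance for account in accounts)


class OvernightBatchSystemTest(unittest.TestCase):
    # Step 1: system-level test that demonstrates the defect
    # and states the desired behavior.
    def test_batch_accepts_overdrawn_account(self):
        overdrawn = Account(balance=100)
        overdrawn.withdraw(150)
        total = run_overnight_batch([overdrawn, Account(balance=20)])
        self.assertEqual(total, -30)


class AccountUnitTest(unittest.TestCase):
    # Step 2: the smallest-scope test that reproduces the same defect.
    def test_withdraw_may_leave_negative_balance(self):
        account = Account(balance=100)
        account.withdraw(150)
        self.assertEqual(account.balance, -50)
```

Saved as a module, both tests can be run with "python -m unittest". Against a version of withdraw that rejects overdrafts, both would fail. Once the unit test passes, the system test should pass too.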

Once the defect is resolved, figure out why the defect was created and
wasn't caught. Initiate the necessary changes to prevent this kind of defect
in the future.

Taiichi Ohno has a simple exercise for this last step, the 5 Whys. Ask five
times why a problem occurred. So, for example,
1. Why did we miss this defect? Because we didn't know the balance could
be negative overnight.
2. Why didn't we know? Because only Mrs. Crosby knows and she isn't part
of the team.
3. Why isn't she part of the team? Because she is still supporting the old
system and no one else knows how.
4. Why doesn't anyone else know how? Because it isn't a management
priority to teach anyone.
5. Why isn't it a management priority? Because they didn't know that a
$20,000 investment could have saved us $500,000.

After 5 Whys, you find the people problem lying at the heart of the defect
(and it's almost always a people problem). Addressing that problem and the
other problems encountered along the way will give you some reassurance that
you won't ever have to deal with this particular mistake again.

I've put formal regression testing, as opposed to just writing another
test, in the corollary practices because most teams have too many defects to
be able to invest heavily in resolving each of them. Once the defect rate is
down to one a week or one a month, though, the investment is proportional
and the team has practice improving in other ways. It is ready for a deeper
look at its own weaknesses.

Brad Appleton

2005-03-30 05:14:44 UTC

Hi Kent!
I like the "5 Whys" approach to root-cause analysis (RCA). I'm unsure
whether your post is recommending this particular way of doing RCA as the
"recommended practice", or whether you're recommending doing RCA and the
"5 Whys" just happens to be one way of doing it. In other words, are you
preaching a particular way of doing it, or preaching coming up with A
way of doing it?

The reason I mention this is that I've seen some very lightweight and
effective approaches to RCA, and some very heavyweight, ineffective ones.

In fact, enough folks spent enough time doing RCA that they concluded
it is extremely hard to do well and takes a lot of time. So much so
that they came up with what they thought was a more effective means of
achieving a similar goal (some would say the same goal, others would
dispute that).

They came up with something called "Orthogonal Defect Classification" or
"ODC" (see
<http://www.google.com/search?q=%22Orthogonal+Defect+Classification%22>)

Doing ODC sure does seem enormously more complex than something as
simple as "the 5 Whys". And yet, many claim that it ends up being much
quicker and more accurate (it certainly would be when compared against
some of the ultra-heavyweight implementations of RCA that I've seen in
some organizations).

--
Brad Appleton <brad-***@public.gmane.org> www.bradapp.net
Software CM Patterns (www.scmpatterns.com)
Effective Teamwork, Practical Integration
"And miles to go before I sleep" --Robert Frost

konopelko_pavel

2005-03-31 08:11:28 UTC

Brad,

The goals of ODC seem to be more about providing a classification
framework to gather and analyze statistical data about defects. The ODC
pages at www.research.ibm.com say:

"ODC is a scheme to capture the semantics of each software defect
quickly. It is the definition and capture of defect attributes that
make mathematical analysis and modeling possible. Analysis of ODC
data provides a valuable diagnostics method for evaluating the
various phases of the software life cycle (design, development, test
and service) and the maturity of the product."
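
To make "capture of defect attributes" a bit more concrete, here is a rough sketch in Python of recording a few attributes per defect and computing their distribution. The DefectRecord fields, the attribute values, and the distribution helper are illustrative assumptions, not the official ODC scheme or any ODC tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectRecord:
    """One classified defect; field names are illustrative assumptions."""
    defect_id: str
    defect_type: str   # e.g. "assignment", "checking", "interface"
    trigger: str       # what exposed the defect, e.g. "boundary conditions"
    phase_found: str   # e.g. "design", "development", "test", "service"


def distribution(records, attribute):
    """Return the share of defects for each value of the given attribute."""
    counts = Counter(getattr(record, attribute) for record in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Made-up sample data, only to show the shape of the analysis.
records = [
    DefectRecord("D-1", "checking", "boundary conditions", "service"),
    DefectRecord("D-2", "checking", "workload", "test"),
    DefectRecord("D-3", "interface", "boundary conditions", "test"),
    DefectRecord("D-4", "assignment", "code review", "development"),
]

by_type = distribution(records, "defect_type")
by_phase = distribution(records, "phase_found")
```

With data like this, a team could watch how the mix of defect types shifts between phases, which is the kind of statistical view the quote describes.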

The goals of "5 Whys" seem to be more about creating an opportunity
to freely explore how a particular defect is connected to a broader
context.

So in the end it depends on the needs of a particular project to
decide on the approach it takes. IMO, larger projects in mature
environments could benefit more from ODC. OTOH, smaller projects in
less mature environments could benefit more from "5 Whys".

Regards,
--Pavel Konopelko
