
Root cause analysis – do you really want to know?

Root Cause Analysis – it’s not about finger pointing (or at least, it shouldn’t be).

Sometimes it’s just easier if you can blame something (or someone). We are all more comfortable when a handy scapegoat is available. If, however, you really want to solve problems, then you have to dig. Sometimes (if you are lucky) the digging will be brief, but usually it will be a relatively deep process.

When you have a significant failure (i.e., one that you don’t want to experience again), how can you:

  • find the root cause? (RC)
  • make changes to mitigate or remove the problem?

When you have processes that can’t be allowed to fail, how do you achieve 100% availability and performance?

Some possible steps in root cause analysis (RCA):

  • identify the variables (hardware, software, networking, people, etc.)
  • identify the process relationships (automated? real-time?, etc.)
  • determine which (if any) of the above are outside of your control (a vendor-side problem?)
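
One way to make this inventory concrete is to write the variables down as a small table and tag each one with who controls it. The sketch below is illustrative only: the `Component` class, the `split_by_control` helper, and the component names are all hypothetical, not taken from any real environment or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str           # hypothetical label for one variable in the failure
    kind: str           # e.g. "hardware", "software", "networking", "people"
    automated: bool     # is the process automated?
    real_time: bool     # does it run in real time?
    vendor_owned: bool  # True if it is outside of your control


def split_by_control(components):
    """Separate what you can fix yourself from what must go to a vendor."""
    ours = [c for c in components if not c.vendor_owned]
    vendors = [c for c in components if c.vendor_owned]
    return ours, vendors


# Made-up inventory, for illustration only.
inventory = [
    Component("core-switch", "networking", True, True, False),
    Component("billing-app", "software", True, False, False),
    Component("hosted-dns", "networking", True, True, True),
]

ours, vendors = split_by_control(inventory)
print("investigate ourselves:", [c.name for c in ours])
print("hand off to vendor:", [c.name for c in vendors])
```

Even a list this small forces the question of ownership to be answered per component before the digging starts.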

If a vendor is identified, then hand off the problem and require resolution. If the ball is in your court, then armed with the above, proceed by:

  • reviewing the components for any recent changes (any hardware, network, OS, application updates/changes?)
  • locating/reviewing low-hanging fruit (sometimes the RC is really simple, e.g. the power loss was the result of the CIO testing the emergency power button in the data center. :) Now we all know that the red button really works, and no additional ‘tests’ are planned for this quarter.)
  • isolating the problem areas/devices/processes and handing them off to the appropriate groups for further research
  • attempting to reproduce the problem (this is actually good news if you succeed since it reduces the variables!)
  • reviewing, at a detailed level, the hardware/OS/software configurations, processes, and code (eliminate the network, hardware, OS and OS services, and you are down to the application)
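
The first step in that list, checking for recent changes, is easy to sketch in code if you keep any kind of change log. The example below is a minimal illustration, assuming a change log held as a list of dictionaries. The `changes_before_failure` function, the 14-day window, and every entry in `change_log` are made up for the example.

```python
from datetime import datetime, timedelta


def changes_before_failure(changes, failure_time, window=timedelta(days=14)):
    """Return the changes made within `window` before the failure, newest first."""
    start = failure_time - window
    recent = [c for c in changes if start <= c["when"] <= failure_time]
    return sorted(recent, key=lambda c: c["when"], reverse=True)


# Hypothetical change log; the entries are invented for illustration.
change_log = [
    {"when": datetime(2008, 11, 1, 9, 0), "layer": "OS", "what": "kernel patch"},
    {"when": datetime(2008, 11, 10, 22, 30), "layer": "application", "what": "config update"},
    {"when": datetime(2008, 9, 15, 8, 0), "layer": "hardware", "what": "disk replaced"},
]
failure = datetime(2008, 11, 11, 3, 15)

suspects = changes_before_failure(change_log, failure)
for change in suspects:
    print(change["when"].isoformat(), change["layer"], change["what"])
```

Listing the newest changes first puts the most likely suspects at the top. A change that falls outside the window is not ruled out, only deprioritized.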

What about visits from Murphy? (i.e. we can’t find an RC…)

  • sometimes, stuff just happens – do what you can to avoid it, but
  • always be ready to adapt (for any given mission/process, do you have a Plan n^x, or at least a Plan n+1?)