Wednesday, October 22, 2008

What to do if a bug has leaked into production?

The application your team tested has been released into production. A day after the release, you get a midnight call saying that a priority 1 issue has been reported in the production environment. What do you do in such a scenario?

First of all, do not panic. You need to remain calm and composed. An issue in production can have several possible causes:
  • Application is not deployed or configured correctly
  • Incorrect data in the production database is causing it
  • End users have misunderstood a feature and reported its expected behavior as a bug
  • If none of the above applies, then it could be a bug in the application.
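
The order of the list above suggests a process of elimination: rule out deployment, data, and usage problems before concluding that the application itself is at fault. A minimal sketch of that idea follows. The check names (`deployment`, `data`, `usage`) and the function `likely_cause` are hypothetical, chosen only for illustration.

```python
# Hypothetical check names; the order mirrors the list of causes above.
CAUSES = [
    ("deployment", "Application is not deployed or configured correctly"),
    ("data", "Incorrect data in the production database"),
    ("usage", "End users misunderstood the feature"),
]


def likely_cause(findings):
    """Return the first cause whose check came back True.

    `findings` maps a check name to True when the investigation
    confirmed that cause. If nothing is confirmed, fall through to
    the last item in the list: a bug in the application.
    """
    for key, description in CAUSES:
        if findings.get(key, False):
            return description
    return "Bug in the application"
```

For example, `likely_cause({"data": True})` returns the data-related cause, while `likely_cause({})` falls through to the application bug.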

    Broadly, your response to such a situation can be divided into two categories. The first is the immediate corrective action needed to minimize the impact of the issue. The second is the root cause analysis, which identifies why the issue happened and how to ensure that a similar issue does not happen again.

    a) Immediate corrective action:
    Start by collecting all the required information about the issue found in production:

    - In which particular scenario/condition is the problem happening?
    - Is the problem happening consistently in production or is it intermittent?
    - What percentage of end users is impacted by this issue?
    - Is this problem corrupting or losing data which is not recoverable? If yes, what can be done to minimize the data loss?
    - Can this issue be replicated in the QA environment?
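
One way to make sure none of these questions is skipped during a late-night call is to record the answers in a small structure and list what is still unknown. The sketch below is an illustration only: the class `IncidentFacts`, its field names, and the two helper functions are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IncidentFacts:
    """One field per question above; None means 'not answered yet'."""
    scenario: Optional[str] = None                    # when does it happen?
    is_consistent: Optional[bool] = None              # consistent or intermittent?
    percent_users_impacted: Optional[float] = None    # share of end users hit
    unrecoverable_data_loss: Optional[bool] = None    # data corrupted or lost?
    reproducible_in_qa: Optional[bool] = None         # replicated in QA?


def open_questions(facts):
    """Return the names of the questions that still have no answer."""
    return [f.name for f in fields(facts) if getattr(facts, f.name) is None]


def data_loss_first(facts):
    """True when unrecoverable data loss is confirmed, so limiting
    the loss should come before any further investigation."""
    return facts.unrecoverable_data_loss is True
```

For instance, after creating `IncidentFacts(scenario="order submission fails", unrecoverable_data_loss=True)`, calling `open_questions` lists the three questions that still need an answer, and `data_loss_first` signals that minimizing data loss is the first priority.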

    b) Root cause analysis and fixing the process:
    Once the issue is identified, we need to take the following steps so that a similar issue does not happen again:
    - The QA team tries to simulate the problematic scenario in the QA environment. It works closely with the business team, development team and production support team to recreate the scenario.
    - If the resolution of the issue requires a patch release, then start planning to test the patch release.
    - To identify the root cause of the issue, perform causal analysis.
    - Identify the corrective actions required so that a similar issue does not happen again.
    - Implement the corrective actions.
    - Be proactive and keep all stakeholders informed at each stage.

The above steps will help ensure that you do not lose the confidence of the stakeholders.
