Sundarrajk's Weblog

Archive for the ‘Debugging’ Category

These are excerpts from the book. These are the summary from end of each of the chapter.

Chapter 1 – A Method in Madness

  • Make sure to do the following:
    • Work out why the software is behaving unexpectedly.
    • Fix the problem.
    • Avoid breaking anything else.
    • Maintain or improve overall quality.
    • Ensure that the same problem does not occur elsewhere and cannot occur again.
  • Leverage your software’s ability to show you what’s happening.
  • Work on only one problem at a time.
  • Make sure that you know exactly what you’re looking for:
    • What is happening?
    • What should be happening?
  • • Check simple things first.

Chapter 2 – Reproduce

  • Find a reproduction before doing anything else.
  • Ensure that you’re running the same version as the bug was reported against.
  • Duplicate the environment that the bug was reported in.
  • Determine the input necessary to reproduce the bug by:
    • Inference
    • Recording appropriate inputs via logging
  • Ensure that your reproduction is both reliable and convenient through iterative refinement:
    • Reduce the number of steps, amount of data, or time required.
    • Remove nondeterminism
    • Automate.

Chapter 3 – Diagnose

  • Construct hypotheses, and test them with experiments.
    • Make sure you understand what your experiments are going to tell you.
    • Make only one change at a time.
    • Keep a record of what you’ve tried.
    • Ignore nothing.
  • When things aren’t going well:
    • If the changes you’re making don’t seem to be having an effect, you’re not changing what you think you are.
    • Validate your assumptions.
    • Are you facing multiple interacting causes or a changing underlying system?
  • Validate your diagnosis.

Chapter 4 – Fix

  • Bug fixing involves three goals:
    • Fix the problem.
    • Avoid introducing regressions.
    • Maintain or improve overall quality (readability, architecture, test coverage, and so on) of the code.
  • Start from a clean source tree.
  • Ensure that the tests pass before making any changes.
  • Work out how you’re going to test your fix before making changes.
  • Fix the cause, not the symptoms.
  • Refactor, but never at the same time as modifying functionality.
  • One logical change, one check-in.

Chapter 5 – Reflect

“The six stages of debugging” and reads as follows:

  1. That can’t happen.
  2. That doesn’t happen on my machine.
  3. That shouldn’t happen.
  4. Why is that happening?
  5. Oh, I see.
  6. How did that ever work?

  • Take the time to perform a root cause analysis:
    • At what point in your process did the error arise?
    • What went wrong?
  • Ensure that the same problem can’t happen again:
    • Automatically check for problems.
    • Refactor code to remove the opportunity for incorrect usage.
    • Talk to your colleagues, and modify your process if appropriate.
  • Close the loop with other stakeholders.

Chapter 6 – Discovering that you have a problem

  • Make the most of your bug-tracking system:
    • Pick one at an appropriate level of complexity for your particular situation.
    • Make it directly available to your users.
    • Automate environment and configuration reporting to ensure accurate reports.
  • Aim for bug reports that are the following:
    • Specific
    • Unambiguous
    • Detailed
    • Minimal
    • Unique
  • When working with users, do the following:
    • Streamline the bug-reporting process as much as possible.
    • Communication is key—be patient and imagine yourself in the user’s shoes.
  • Foster a good relationship with customer support and QA so you can leverage their support during bug fixing.

Chapter 7 – Pragmatic Zero Tolerance

  • Detect bugs as early as possible, and fix them as soon as they come to light.
  • Act as though bug-free software was an attainable goal, but temper perfectionism with pragmatism.
  • If you find yourself faced with a poor quality codebase, do the following:
    • Recognize there is no silver bullet.
    • Make sure that the basics are in place first.
    • Separate clean code from unclean, and keep it clean.
    • Use bug triage to keep on top of your bug database.
    • Incrementally clean up bad code by adding tests and refactoring.

Chapter 8 – Special Cases

  • When patching an existing release, concentrate on reducing risk.
  • Keep on the lookout for compatibility implications when fixing bugs.
  • Ensure that you have completely closed any timing windows, not just decreased their size.
  • When faced with a heisenbug, minimize the side effects of collecting information.
  • Fixing performance bugs always starts with an accurate profile.
  • Even the most restricted communication channel can be enough to extract the information you need.
  • Suspect your own, ahead of third-party, code.

Chapter 9 – The Ideal Debugging Environment

  • Automate your tests, ensuring that they do the following:
    • Unambiguously pass or fail
    • Are self-contained
    • Can be executed with a single click
    • Provide comprehensive coverage
  • Use branches in source control sparingly.
  • Automate your build process:
    • Build and test the software every time it changes.
    • Integrate static analysis into every build.

Chapter 10 – Teach your Software to Debug Itself

  • Use assertions to do the following:
    • Both document and automatically validate your assumptions
    • Ensure that your software, although robust in production, is fragile during debugging
  • Create a debug build that
    • Is compiled with debug-friendly compiler options
    • Allows key subsystems to be replaced by debugging equivalents
    • Builds in control that will prove useful during diagnosis
  • Detect systemic problems, such as resource leaks and exception handling issues, preemptively.

Chapter 11 – Anti Patterns

  • Keep on top of your bug database to ensure that it accurately reflects your true priorities.
  • The polluter pays—don’t allow anyone to move onto a new task until they’ve completely finished their current one. If bugs come to light in their work, they fix them.
  • Make a single team responsible for a product from its initial concept through deployment and beyond.
  • Firefighting will never fix a quality problem. Take the time to identify and fix the root cause.
  • Avoid “big bang” rewrites.
  • Ensure that your code ownership strategy is clear.
  • Treat anything you don’t understand as a bug.

Debug It!: Find, Repair, and Prevent Bugs in Your CodeDebug It!: Find, Repair, and Prevent Bugs in Your Code by Paul Butcher
My rating: 3 of 5 stars

In the book the author covers different aspects of

Effective Debugging

In the first chapter the author advises1. Work out why the software is behaving unexpectedly.
2. Fix the problem.
3. Avoid breaking anything else.
4. Maintain or improve the overall quality (readability, architecture, test coverage, performance, and so on) of the code.
5. Ensure that the same problem does not occur elsewhere and cannot occur again.

The author emphasizes that without first understanding the true root cause of the bug, we are outside the realms of software engineering and delving instead into voodoo programming or programming by coincidence.

He suggests that empirical means is the best way to Debug i.e. provide different inputs and observe how the system behaves.

The Core Debugging Process involves the following steps:
1. Reproduce: Find a way to reliably and conveniently reproduce the problem on demand.
2. Diagnose: Construct hypotheses, and test them by performing experiments until you are confident that you have identified the underlying cause of the bug.
3. Fix: Design and implement changes that fix the problem, avoid introducing regressions, and maintain or improve the overall quality of the software.
4. Reflect: Learn the lessons of the bug. Where did things go wrong? Are there any other examples of the same problem that will also need fixing? What can you do to ensure that the same problem doesn’t happen again?

Address one Bug at a time: Picking too many bugs to address at one time will prevent focus on one.
Check Simple Things first: Somebody may have encountered something similar and may already have a solution.


1. Reproduction of the error should consistent and efficient, otherwise testing the fixing will become a botheration.
2. So reproduce the error in an controlled the environment to achieve consistency.
3. To keep it efficient try and reduce the input to be provided and reduce the processing that needs to be done, store the state at every step so that only the errorneous step needs to be rerun.
4. Automate the test conditions to make it quicker and easier to test the application after the fix. Replaying the log file can be a good strategy in scenarios where logging using proxy was used to capture the error condition.

The following will help in reproducing the error:
1. Logging at appropriate places so that one knows what is happening in the system. Too much logging will be unacceptable in a production system.
2. Where possible usage of a proxy to capture the network traffic and try to reproduce the error with this traffic.
3. If calls to libraries are problematic or they need to be emulated in a test environment, write a Shim (a proxy to a library) and capture the inputs and outputs and use this to reproduce the error. In engineering, a shim is a thin piece of material used to fill the space between objects. In computing we’ve borrowed the term to mean a small library that sits between a larger library and its client code. It can be used to convert one API to another or, as in the case we’re discussing here, to add a small amount of functionality without having to modify the main library itself.
4. Reach out the user community that is able to reproduce the error and get inputs from them. Give them specially instrumented code to figure out the error.
5. Read the documentation on the system, if the problem seems to be occurring beyond the realms of the code that has been written, and read the errors reported by others using the same platform.

Irreproducible Errors

Most bugs are reproducible. The few scenarios where the bug may be irreproducible or difficult to reproduce will be because of the following reasons:
1. Starting from an unpredictable initial state: C, C++ programs are prone to this error.
2. Interaction with external systems: This can happen if the other system is no running in lock-step with this software. If inputs from this external systems arrives when the current system is under different states the error can be difficult to reproduce.
3. Deliberate Randomness: In some systems there is deliberate randomness as in games. These can be difficult to debug. But if the same seed is used for the pseudo-random number generator then the bug will become easier to reproduce.
4. Multithreading – This happens because of the pre-emptive multi-tasking provide by the Operating System. Since the threads can be stalled and restarted at different times depending on the activity in the CPUs at that that time, it becomes difficult to reproduce errors in such an environment. Trying using the sleep to try and simulate the stalling of one thread and execution of another to try and emulate the error.

Good Practices of Reproducing

If a bug takes a long time and is still not identified this may be because another bug is masking this one. So try to concentrate on a different bug in the same area and possibly clear it before retrying the difficult one.


How to Diagnose?

1. Examine what you know about the software’s behaviour, and construct a hypothesis about what might cause it.
2. Design an experiment that will allow you to test its truth (or otherwise).
3. If the experiment disproves your hypothesis, come up with a new one, and start again.
4. If it supports your hypothesis, keep coming up with experiments until you have either disproved it or reached a high enough level of certainty to consider it proven.

Techniques of Diagnosing

1. Instrument the code to understand the flow better.
2. Use a binary search pattern and logging to locate the source code of error. I.e. look for error before and after the execution of a stretch of code. If error is found now look for the error in the first half of this code stretch, if not found then look for the error in the second half of the stretch; then further split the stretch found into further two halves; repeat this until the exact point of error is found.
3. Use a binary search pattern in version control to identify the version when error was introduced.
4. Use a binary search pattern on data to identify the version of the error.
5. Focus on the differences. The Application works for most customers, but not for specific ones. Check how these customers are different from the rest where the application is working. Similarly works in most environments, but does not work in a particular environment. Try and figure out what is different in that environment. If it happens for specific input files then figure out what is different in that file as compared to other files where it works.
6. Use debuggers when available.
7. Use the Interactive Consoles where debuggers are not available or are not good.

Good Practices of Diagnosing

1. When experimenting make only one change at a time.
2. Ignore nothing. Do not shrug off the unexpected as and anomaly. It could be that our assumptions are wrong.
3. Maintain a record of experiments and results so that it is easy to trace back.
4. Anything that you don’t understand is potentially a bug.
5. Learn from others. Search in the net for similar problem and solution offered.
6. All other things being equal, the simplest explanation is the best. – Occam’s Razor
7. Writing Automated Test Cases helps because this lets us concentrate only on broker cases.
8. Keep Asking “Are you changing the right thing?” If the changes you’re making have no effect, you’re not changing what you think you are.
9. Validate and revalidate your assumptions
10. Ensure that the underlying system on which diagnosis is being done is static and not changing.
11. If one is stuck in debugging a problem, one good way is to ask somebody else to take a look at it.


Best Practices

1. Make sure you know how you’re going to test it before designing your fix.
2. Do not let the fixes mess up with the original clean design and structure of code. Haphazardly put together fixes can mess up the good design principles followed in the original design. Any fix should leave the code in better shape than it was before.
3. Clean up any adhoc code changed before making the final fix so that no unwanted code gets checked in. Keep only what is absolutely necessary.
4. Use existing test cases. Modify the test cases if required or write the failing test case and test code without the fix. Then fix the code and test the failing test case to see that it passes after the fix.
1. Run the existing tests, and demonstrate that they pass.
2. Add one or more new tests, or fix the existing tests, to demonstrate the bug (in other words, to fail).
3. Fix the bug.
4. Demonstrate that your fix works (the failing tests no longer fail).
5. Demonstrate that you haven’t introduced any regressions (none of the tests that previously passed now fail).
5. Fix the Root Cause not the symptom. E.g. if one encounters a NullPointerException, the solution is not to capture the NullPointerException and handle or even worse suppress it, it is necessary to figure out why the NullPointerException is occurring and fixing that cause. Giving into temptation of quick fixes is not the right thing, making the right fix is the right thing.
6. Refactor or change functionality or fix a problem — one or the other, never more than one.
7. Always check in small changes. Do not check in large changes as it will make it very difficult to find out which change actually caused the problem. Ensure check-in comments are as meaningful (and specific) as possible.
8. Diff and check what exactly is being checked in before actually checking in.
9. Get the code reviewed. This is very important as unnoticed errors

After Fixing – Reflect

Sometimes “The six stages of debugging” reads as follows:
1. That can’t happen.
2. That doesn’t happen on my machine.
3. That shouldn’t happen.
4. Why is that happening?
5. Oh, I see.
6. How did that ever work?

After fixing one needs to reflect on the following points:
• How did it ever work?
• When and why did the problem slip through the cracks?
• How to ensure that the problem never happens again?

Find out the root cause. A useful trick when performing root cause analysis is to ask
“Why?” five times. For example:
• The software crashed. Why?
• The code didn’t handle network failure during data transmission. Why?
• There was no unit test to check for network failure. Why?
• The original developer wasn’t aware that he should create such a test. Why?
• None of our unit tests check for network failure. Why?
• We failed to take network failure into account in the original design.

After fixing do the following:
1. Take steps to ensure that it does not ever happen again. Educate yourself, educate others on the team.
2. Check if there are other similar errors.
3. Check if the documentation needs to be updated as a result of the fix.

Other aspects of handling and managing bugs

1. To better aid debugging collect relevant environment and configuration information automatically.
2. Detect bugs early, and do so from day one.
3. Poor quality is contagious. Broken Window concept. The theory was introduced in a 1982 article by social scientists James Q. Wilson and George L. Kelling. So do not leave bad code. Fix bad code at the earliest.
4. A Zero Bug Software is impossible, so take a pragmatic approach and try to reach as close to Zero bugs as possible. Temper perfectionism with pragmatism.
5. Keep the design simple. Not only does a simple design make your software easier to understand and less likely to contain bugs in the first place, it also makes it easier to control—which is particularly useful when trying to reproduce problems in concurrent software.
6. Automate your entire build process, from start to finish.
7. Version management of code is absolutely mandatory.
8. Different source should mean different version number. Even if the change to the code is minuscule.

Some Excerpts from the Book

View all my reviews