Sundarrajk's Weblog

Archive for the ‘Software’ Category

Your Code As a Crime Scene: Use Forensic Techniques to Arrest Defects, Bottlenecks, and Bad Design in Your Programs by Adam Tornhill
My rating: 4 of 5 stars

A very different way of looking at problems in the code. The author suggests various non-traditional means of identifying problems in software development.

These include the number of times a piece of code has changed, code churn (the number of lines added and removed), the number of developers working on a single piece of code, and pieces of code that change together.

The primary requirement for all of this is that version control be used correctly. Every developer should have a personal login id and should use it to check in at regular intervals.

Many of the techniques described in the book are available in the form of a tool from…. These are only starting points. They need to be coupled with other tools like d3.js to get a good representation of the status of the code.

A must read for all software developers.

View all my reviews

How to Stop Sucking and Be Awesome Instead by Jeff Atwood
My rating: 4 of 5 stars

Another nice book of blog entries from Jeff Atwood.

The first few blogs are about how one should determine very early whether one can program, and drop out of a programming career if one cannot. He argues that the “sheep that can program and the goats that cannot” should be separated out early in their careers so that software can become better.

One of the key observations that I liked: “You have to truly believe, as a company, and as peers, that crucial innovations and improvements can come from everyone at the company at any time, in bottom-up fashion – they aren’t delivered from on high at scheduled release intervals in the almighty master plan.”

In another blog he speaks about how important it is to be able to persuade others. He refers to a dialog from The Last King of Scotland, the movie based on Idi Amin, in which Amin speaks to his trusted aide, a Scottish doctor.
Idi Amin: I want you to tell me what to do!
Garrigan: You want me to tell you what to do?
Amin: Yes. You are my advisor. You are the only one I can trust here. You should have told me not to throw the Asians out in the first place!
Garrigan: I did!
Amin: But you did not persuade me, Nicholas. You did not persuade me!

Not at all an unlikely dialog to have with either one’s manager or client. 🙂

Another important piece of advice, with which I could not agree more since I have given the same advice to many others who have asked for my opinion: “Whatever project you are working on, consider it an opportunity to learn and practice your craft. It is worth doing because, well, it is worth doing. The journey of the project should be its own reward regardless of whatever happens to lie at the end of that journey.”
The corollary is another thing that I keep stressing: “Never get attached to a project. Execute the project to the best of your abilities, learn along the way, and if for some reason beyond your control the project fails or does not see the light of day, so be it.”

Speaking on merit-based growth in an organization, Jeff quotes the following: “Remove barriers that rob people in management and in engineering of their right to pride of workmanship. This means [among other things] abolishment of annual or merit rating and management by objectives. Even people who think of themselves as Deming-ites have trouble with this one. They are left gasping: what the hell are we supposed to do instead? Deming’s point is that MBO and its ilk are copouts. By using simplistic extrinsic motivators to goad performance, managers excuse themselves from harder matters such as investment, direct personal motivation, thoughtful team formation, staff retention, and ongoing analysis and redesign of work procedures. Our point here is somewhat more limited: any action that rewards team members differentially is likely to foster competition. Managers need to take steps to decrease or counteract this effect.”

In one blog Jeff compares the F-86 and the MIG-15. The latter was superior to the former, yet fighter pilots preferred the former. The difference was that the F-86 had hydraulic flight controls compared to the manual flight controls of the MIG-15. This meant that each maneuver fatigued the MIG-15 pilot, even when he out-maneuvered the F-86 pilot. The less fatigued F-86 pilot could maneuver more often, and this tilted pilots towards the F-86 despite its limitations. Jeff calls this Boyd’s Law of Iteration, which states “Speed of iteration beats quality of iteration”. Jeff argues the same is true for software development, although he says elsewhere that quality cannot be sacrificed beyond a point.

There is a whole set of blogs on User Interface and Usability. One book the author highly recommends is Don’t Make Me Think by Steve Krug… and another is Rocket Surgery Made Easy, also by Steve Krug….
He speaks about Fitts’s Law, which states “Put all commonly accessed UI elements on the edges of the screen. Because the cursor automatically stops at the edges, they will be easier to click on. Make clickable areas as large as you can. Larger targets are easier to click on”. One should not ignore the corollary of the rule, which would read “Make all the clicks that the user must be kept safe from as difficult as possible”. Jeff refers to this principle as the “seat ejector” button. This button should be easy to find in an emergency, but should not be placed such that the pilot ends up triggering it instead of the navigation lights. Buttons like “delete all my mails” should be available, but should be placed such that no user would click them by mistake.

Speaking on the importance of saying no to demands, Jeff says “It is easy to dismiss Just Say No as a negative mindset, but I think it is a healthy and natural reaction to the observation that optimism is an occupational hazard of programming”. I cannot agree with him more, as I have been pulled up for saying no many a time in my career.

Speaking about usability, Jeff argues that most users of an application never progress beyond the intermediate stage. Most move from novice to intermediate quite quickly, or drop off as users of the system if they find it too difficult to use. Once they reach the intermediate stage they remain there for a long time, and only a very few move to the expert level. Given this, he argues that software should be targeted at these intermediate users rather than at novices or experts. Most marketing people would advocate building for the novice, as these are the users marketing people encounter most of the time, while software developers would want to address the experts, as developers are likely geeks themselves and want maximum flexibility and the most features.

Writing on security and hacking, the author says that today, to hack into a site, one needs social skills rather than technical skills. Technology has advanced to a level where hacking into a website directly is difficult, but people are not immune to social engineering, which makes it a much easier way of breaking into a system.

On the whole a very good read.

View all my reviews

Eloquent JavaScript by Marijn Haverbeke
My rating: 3 of 5 stars

A good introductory book for beginners in JavaScript. Any beginner can go through the book and start writing really good JavaScript.

The author introduces modules, regex, the DOM, and the HTTP protocol, and also gives directions for writing two different fun programs with JavaScript.

Some of the chapters could be seen as outdated given the changes that have come into JavaScript, but nevertheless still a relevant book.

After reading this, one can move on to “JavaScript: The Good Parts” by Douglas Crockford.

View all my reviews

Release It!: Design and Deploy Production-Ready Software (Pragmatic Programmers) by Michael T. Nygard
My rating: 4 of 5 stars

The author asserts that software today is built to pass the tests of QA, not to survive the rigours of the production environment. He provides tips for designing systems that will withstand the assaults they will face in production.
The author states that most decisions made upfront are the ones that impact the system the most and are the most difficult to reverse or change, and that, ironically, they are taken when knowledge about the required system is at its minimum.
The author wryly notes that decrees such as “Use EJB container-managed persistence!”, “All UIs shall be constructed with JSF!”, and “All that is, all that was and all that shall ever be lives in Oracle!” are handed down by architects in ivory towers.

In the stability section the author speaks about how to create and maintain stable systems. The first example is of an airline company with the following code:

. . .
public class FlightSearch implements SessionBean {
    private MonitoredDataSource connectionPool;

    public List lookupByCity(. . .) throws SQLException, RemoteException {
        Connection conn = null;
        Statement stmt = null;
        try {
            conn = connectionPool.getConnection();
            stmt = conn.createStatement();
            // Do the lookup logic
            // return a list of results
        } finally {
            if (stmt != null) {
                stmt.close();
            }
            if (conn != null) {
                conn.close();
            }
        }
    }
}
Which looks and feels good. But if stmt.close() ever throws an exception, conn.close() will never be called, resulting in connections leaking from the connection pool until all the connections in the pool are used up.
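The implied fix is to guard each close() individually, so that an exception from one cleanup call cannot skip the others. The helper below is my illustrative sketch, not the book's code; it generalizes the idea over AutoCloseable so it can be tested without a database.

```java
// Sketch: close each resource independently so a failure in one close()
// cannot leak the others. (Class and method names are illustrative.)
public class SafeCleanup {
    // Returns the number of resources successfully closed.
    public static int closeAll(AutoCloseable... resources) {
        int closed = 0;
        for (AutoCloseable r : resources) {
            if (r == null) continue;
            try {
                r.close();
                closed++;
            } catch (Exception e) {
                // log and continue; a cleanup failure must not abort the rest
            }
        }
        return closed;
    }
}
```

In modern Java, a try-with-resources block gives the same guarantee for free: each resource is closed independently and secondary exceptions are suppressed rather than masking the first one.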

The author suggests that one should be prepared for as many points of breakage as possible. Tight coupling between systems leads to cascading failures; to avoid this, systems should be loosely coupled. As a corollary, calls across systems should be asynchronous. This is not always possible, and even where it is possible it complicates communication, so one needs to take a judicious call on where to use asynchronous processing and where not to.

In chapter 4 the author discusses anti-patterns that lead to failures. The first anti-pattern is that all points of integration are fragile and can lead to failures. It is highlighted that most connections are based on TCP/IP. In TCP/IP the first step is a three-way handshake to set up the connection between the two systems that need to communicate. The requestor sends a SYN packet, the listener acknowledges with a SYN-ACK packet, and finally the requestor sends an ACK packet to complete the three-way handshake and establish the connection. If there is no listener, the failure is quick, as the OS responds with a RESET packet telling the requestor that its request has no listener. This is a manageable situation. But if the listener is slow, the request will languish in the listen queue until it times out. The typical timeout is in minutes, which means the requestor can wait a long time before realising there is a problem.
The classic example quoted is of a firewall killing the TCP connection between the application server and the database server because the connection had been idle too long.
The next example is about how difficult it is to time out HTTP connections in Java.
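As an illustration of bounding those waits in plain Java (the host, port, and timeout values here are placeholders, not from the book), both the connect and every subsequent read can be given explicit timeouts instead of relying on OS defaults measured in minutes:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: never rely on the default connect timeout.
public class GuardedConnect {
    public static Socket connect(String host, int port, int timeoutMillis)
            throws IOException {
        Socket socket = new Socket();
        // fail within timeoutMillis instead of languishing in the listen queue
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        // also bound each read(), so a stalled peer cannot hang us forever
        socket.setSoTimeout(timeoutMillis);
        return socket;
    }
}
```

A read that exceeds the SO_TIMEOUT throws SocketTimeoutException, which the caller can treat as a failed integration point rather than waiting indefinitely.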
It is stressed again and again that it is better to be cynical than optimistic when developing software. Be prepared for the worst.
It is suggested that two key mechanisms keep problems from cascading from one layer to another: circuit breakers (stop retrying a transaction after a certain number of failed attempts, and/or track the status of the underlying layer and stop invoking it while it is known to be in trouble) and timeouts (give up after waiting a reasonable amount of time for the underlying layer to respond).
The storing of large datasets in the session is highlighted as one of the more frequent ways of running out of memory. It is suggested that either the session be kept light or a SoftReference be used for storing large datasets to prevent out-of-memory errors.
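A minimal sketch of the SoftReference approach (the class and field names are mine, for illustration): the garbage collector may clear a soft reference under memory pressure, so the caller must be prepared to recompute or re-query instead of assuming the data is still there.

```java
import java.lang.ref.SoftReference;
import java.util.List;

// Sketch: hold a large result set softly so the GC can reclaim it
// under memory pressure instead of throwing OutOfMemoryError.
public class SearchResultHolder {
    private SoftReference<List<String>> resultsRef;

    public void store(List<String> results) {
        resultsRef = new SoftReference<>(results);
    }

    // Returns null when nothing was stored or the GC has reclaimed
    // the data; the caller must then re-run the query.
    public List<String> retrieve() {
        return resultsRef == null ? null : resultsRef.get();
    }
}
```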
The author rightly points out that usage of the synchronized keyword can be dangerous in a highly concurrent environment.
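One common mitigation, sketched below as my own illustration rather than code from the book, is to replace an unbounded synchronized block with a ReentrantLock acquired via tryLock with a timeout: a thread stuck behind a slow lock holder then fails fast instead of piling up behind the monitor.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: unlike a synchronized block, tryLock with a timeout lets a
// request fail fast instead of blocking indefinitely behind a slow holder.
public class GuardedResource {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;

    public boolean tryIncrement(long timeoutMillis) throws InterruptedException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // fail fast: the caller can degrade gracefully
        }
        try {
            value++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    public int value() {
        return value;
    }
}
```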
The advice is also to test third party libraries for breakability.
The author coins a term, “Attack of Self-Denial”, for an event published by the business itself that leads to a flood of requests to the application. E.g. news of a deep discount on a product could trigger such an attack on a retailer’s site. One needs to be prepared beforehand to handle such situations.
A very good suggestion is that if it is not possible to build a shared-nothing architecture, then at least limit the number of systems sharing each resource. E.g. instead of sharing sessions among all the application servers, share them between pairs of application servers so that the replication factor is limited.
One key point the author makes is that most systems “treat the database with far too much trust”, and that this is a major cause of problems. He illustrates this with an example of how an unbounded query resulted in continuous crashes at a retailer, and suggests always limiting the results fetched from the database as a precaution.
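A minimal sketch of such a bound (the helper below is hypothetical, written to make the idea concrete): cap every fetch at a hard limit regardless of how large the underlying source has grown. With JDBC specifically, Statement.setMaxRows offers a similar cap at the driver level.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: cap every unbounded fetch. Whatever the source (a ResultSet,
// a queue, a file), stop at a hard limit instead of trusting its size.
public class BoundedFetch {
    public static <T> List<T> fetchAtMost(Iterator<T> source, int limit) {
        List<T> results = new ArrayList<>();
        while (source.hasNext() && results.size() < limit) {
            results.add(source.next());
        }
        return results;
    }
}
```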

The Stability Patterns
In this the author lists the patterns that will help the system be more stable.
1. Timeout: Use timeouts whenever interacting with a third party, especially when this involves some form of network, even within a LAN. This is a fail-fast pattern to be used along with a circuit breaker: if a few requests time out, the resource is marked as down until it is found to be good again. The retry to check whether the resource has recovered can be done at an interval suitable for the resource, i.e. delay the retry.
2. Circuit Breaker: Akin to an electrical circuit breaker, a software circuit breaker prevents the entire system from collapsing under stress by stopping requests to a faulty interface. Users may see errors if this happens to be a crucial interface, but this is better than no user being able to use the whole system. Typically, if an interface times out or fails frequently, the circuit breaker marks it as broken for some time. After a suitable interval it retries the interface and, if it is found functional, closes the circuit once again, enabling execution of that interface. All opening and closing of circuit breakers should be logged and made visible to the operations team so that they are aware of the change in status.
3. Bulkheads: Bulkheads are the compartments in a ship that prevent it from sinking when there is damage to the hull; each bulkhead stops the water from spreading beyond it. The use of multiple servers to deploy an application is one form of bulkhead. Compartmentalizing the application so that impact to one compartment does not spread to another is creating bulkheads within the application. One example quoted is that of an airline, where the ticketing, flight status, flight search, and check-in systems could all be deployed separately so that one does not interfere with the others.
Another example: if two critical systems require the same service, it makes sense to set up a separate instance of the common service for each, so that a problem accessing the service from one system does not impact the other.
Grouping pools of threads for specific purposes within a single process likewise ensures that a problem in one thread pool does not prevent the process from servicing other types of requests.
The negative side of bulkheads is that it can make optimization of resource usage difficult. One would potentially have to provide more capacity than actually required.
4. Steady State: Maintaining a steady state of the system is very important; any kind of manual fiddling with the system can lead to instability. At the same time, maintaining a steady state requires some cleaning up. Applications generate log files, and it is important to have a process that removes them at a rate equal to or greater than the rate of generation. Similarly, archival of records in a database is important to ensure that queries on the database continue to run consistently.
It is also important to keep a finite, controlled number of entries in any in-memory cache. Use an LRU or LFU mechanism to keep clearing the cache if it is expected to grow beyond known bounds.
5. Fail Fast: Quickly failing a request is very important to the health of the system. One should check the statuses of all the external systems upfront, before beginning to process a transaction; if any external system is in a state that means the transaction is bound to fail, it is better to fail the transaction immediately. This ensures that no compute power is wasted processing doomed transactions.
6. Handshaking: It is important to have handshaking between any two systems, so that the server process can announce that it has its hands full and the client does not waste time making a request that the server will take a long time to answer. This also helps in failing fast.
7. Test Harness: A test harness should be able to emulate bizarre problems: accepting a connection but never sending a response, resetting the connection without ever accepting it, not responding for a very long time, sending out huge amounts of data as a response, and so on. Testing against such a harness helps establish how the system behaves under unexpected conditions.
8. Decoupling Middleware: Middleware typically shields the requestor from the nitty-gritties of the server, and also from its failures. It decouples two systems while integrating them.
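Pattern 2 can be sketched in a few lines. The thresholds and the time-based API below are my choices for illustration, not the book's; a production breaker would also emit the open/close events to the operations team, as the pattern requires.

```java
// Minimal circuit-breaker sketch: after `threshold` consecutive failures
// the circuit opens; after `retryAfterMillis` one trial call is allowed.
public class CircuitBreaker {
    private final int threshold;
    private final long retryAfterMillis;
    private int failures = 0;
    private long openedAt = -1;

    public CircuitBreaker(int threshold, long retryAfterMillis) {
        this.threshold = threshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    public synchronized boolean allowRequest(long now) {
        if (failures < threshold) {
            return true; // closed: normal operation
        }
        // open: allow a single trial call once the retry window has elapsed
        return now - openedAt >= retryAfterMillis;
    }

    public synchronized void recordSuccess() {
        failures = 0; // trial call worked: close the circuit again
        openedAt = -1;
    }

    public synchronized void recordFailure(long now) {
        failures++;
        if (failures == threshold) {
            openedAt = now; // circuit opens; log this for operations
        }
    }
}
```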

The author very rightly concludes that “Sadly, the absence of a problem is not usually noted. You might be salvaging a badly botched implementation in which case you now have an opportunity to look like a hero. On the other hand, if you’ve done a great job of designing a stable system from the beginning, it’s unlikely that anyone will notice your system’s lack of downtime. That’s just the way it is. Deliver an unbreakable system, and users will surely go on to complain about something else. That’s just what users do. In fact, with a system that never goes down, the users will most likely complain that it’s slow. Next, you’ll look at capacity and performance and how to get the most out of your resources.”

In a case study it is illustrated how the usage of sessions killed an application. Bots and regular users together created far more sessions than the system could handle, and the site crashed. This was later resolved by supporting sessions through URL rewriting, so that bots created no new sessions, and by adding a throttling mechanism to control the total number of sessions in the system. The key learning is that the performance tests only covered happy paths, never situations like bots hitting the site.
When planning for capacity it is important to ensure that the software is optimal and has minimal wastage; otherwise the cost of the resources required to run the application keeps increasing. As an example, if an HTML page carries 1 KB of junk data, that translates into 1 GB of extra bandwidth for every million requests to the page. The cost of this waste multiplies as usage of the application increases.

Some good patterns to follow are:
1. Pool resources, size them properly and monitor them.
2. Use caching, limit the maximum memory that can be used by the cached objects and monitor the hit ratio.
3. Precompute whatever is possible and recompute only when absolutely necessary.
4. Tune Garbage Collection
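Point 2 above can be sketched with a LinkedHashMap in access order, which gives LRU eviction plus a hit ratio to monitor. This is an illustrative simplification: it bounds the entry count rather than the bytes of memory used, which is usually an acceptable proxy when entries are of similar size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a cache with a hard entry limit and a hit-ratio counter.
// LinkedHashMap in access order gives least-recently-used eviction.
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;
    private long hits = 0, lookups = 0;

    public BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, i.e. LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict once the limit is exceeded
    }

    public V lookup(K key) {
        lookups++;
        V value = get(key);
        if (value != null) hits++;
        return value;
    }

    // Expose the hit ratio so monitoring can tell whether the cache helps.
    public double hitRatio() {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }
}
```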

Some Network points
1. Servers in production tend to be multi-homed and it is important to bind the applications to the right home to prevent security issues.
2. Given the above, it becomes important to configure network routing correctly.
3. Use Virtual IPs where native clustering of applications is not possible. Applications need to be written keeping in mind that this will be the case in production systems.

Some Security aspects:
1. Follow the principle of “least privilege”: every action should be done with the least privilege required to execute it. Run each application as its own user, so that if one application is compromised, only that application is affected and none of the others.
2. Ensure that the passwords used to access other services are secured properly. Ensure that memory dumps of the processes will not reveal the passwords. Keep password files away from the installation directory.

Some Availability Aspects
1. The cost of a system grows exponentially with the required availability. Availability should be defined realistically, not idealistically.
2. The SLAs should be well defined and measurable. SLAs should be defined per feature and depend on the SLAs available from third parties. The location from which the application is accessed also matters.
3. Load Balancing and Reverse Proxies should be used to balance the load across the multiple servers and across the various tiers.
4. Clustering will be required in scenarios where the servers need to communicate with each other to exchange some data.

To ensure reliability, the topology of the QA environment should be the same as that of production, although the capacity may be far lower.
Application configuration and environment-related configuration should be separated out.
The application should be able to announce if it has not started properly.
Provide command line options to configure the systems. A GUI can be used when sufficient time is at hand and automation is not required.

Every system needs to be transparent, i.e. it needs to show what it is using and what it is doing. Without this information it is very difficult to manage the system. While it is necessary to know the status of the individual parts, it is important to also know the status across all the parts of the system. This helps in analysing any problem that is manifesting in the system.
It is not necessary to log the stack trace of a business exception, such as a validation error stating that a mandatory parameter was not entered. It is vital to log the stack trace when a non-business exception occurs.
It is important to have a network separate from the production data network for monitoring traffic.
A good monitoring system provides visibility into business outcomes and not just technical parameters.

The author draws a very good comparison between crystals in metal and tight coupling in software design:
“A cluster of objects that can exist together only in a tight collaboration resembles a crystal in a metal. The objects stay together in a tightly bound relationship, just as the atoms in a crystal are tightly bound. In metal, small crystals mean greater malleability. More malleable metals recover from stress better. Large crystals encourage crack formation. In software, large “crystals” make it harder to change the software. When objects in one grain participate in multiple collaboration patterns, they bridge two crystals, forming a larger grained crystal—further reducing the malleability of the software.
There is no limit to how far this region of tightly bound crystals can spread. In the extreme case, the crystal grows until it is the boundary of the application. When that happens, every object suits exactly one purpose to which it is supremely adapted. It fits perfectly into place and ultimately relates to every other object. These crystal palaces might even be beautiful in a baroque sort of way. They admit no improvement, in part, because no incremental change is possible and, in part, because nothing can be moved without moving every other object. These tend to be dead structures. Developers tiptoe through crystal palaces, speaking in hushed tones and trying not to touch anything.”

View all my reviews

Ship It! by Jared Richardson
My rating: 3 of 5 stars

A collection of lessons learned by various developers in the trenches. The book starts off with a quote from Aristotle: “We are what we repeatedly do. Excellence, then, is not an act, but a habit.” The book strengthens this argument by stating “Extraordinary products are merely side effects of good habits.” So the first tip of the book is “Choose your habits”. Do not follow something just because it is popular, well known, or practised by others around you.

The author says that there are three aspects that one needs to pay attention to:

  1. Techniques: How the project is developed, e.g. daily meetings, code reviews, maintaining a to-do list.
  2. Infrastructure: Tools used to develop the project, e.g. version control, build scripts, running tests, continuous builds.
  3. Process: The process followed in developing the application: Propose Objects, Propose Interfaces, Connect Interfaces, Add Functions, Refactor, Refine, Repeat.

Tools and Infrastructure

The author highlights the need for a proper Source Control Management tool, and issues a warning that the right tool should be chosen. A tool should not be chosen just because it is backed by a big-ticket organization. Vendors will push for “supertools”, but one needs to exercise discretion when choosing between tools.

Good Development Practices

  1. Develop in a Sandbox, i.e. changes of one developer should not impact the other until the changes are ready.
  2. Each developer should have a copy of everything they need for development: web server, application server, database server, and, most importantly, the source code.
  3. Once a developer’s changes are finished, they should be checked in to source control so that others can pick them up, integrate them with their own code, and make any changes needed for the integration.
  4. The checked in changes should be fine grained.

Tools Required for ensuring Good Development Practices

  1. SCM
  2. Build Scripts
  3. Track Issues

What to keep in SCM?

  1. While it can be debated whether runtimes like Java need to be kept in the SCM, it is important that all the third party libraries (jars, dlls) and configuration templates be available in the SCM. Note that configuration templates need to be available as the contents itself can change from environment to environment.
  2. Anything that is generated as part of the build process (jars, dlls, exes, war) should not be stored in the SCM.

What a Good SCM should offer

  1. Ensure that the usage of SCM is painless to the developers. The interactions with the SCM should be fast enough to ensure that the developers do not hesitate to use it.
  2. A minimal set of activities that should be supported by the SCM are
  • Check out the entire project.
  • Look at the differences between your edits and the latest code in the SCM.
  • View the history for a specific file—who changed this file and when did they do it?
  • Update your local copy with other developers’ changes.
  • Push (or commit) your changes to the SCM.
  • Remove (or back out) the last changes you pushed into the SCM.
  • Retrieve a copy of the code tree as it existed last Tuesday.

Script the Build

Once the required artefacts are checked out from the SCM, it should be possible for any developer to run a script and have a working system (sandbox) of her own to work on. For this one needs a build script. The build should be completely automated, requiring no manual intervention or steps. The build script should live outside the IDE so that it can be used irrespective of the IDE in use; the IDE can invoke the same script for local builds.
Once the one-step build script is ready, automate the build. Ideally, every time code is checked in, the following should happen:

  1. Checkout the latest code and build
  2. Run a set of smoke tests to ensure that the basic functionality is not broken.
  3. Configure the build system to notify the stakeholders of the newly checked-in code, the build, and the test results.

This is Continuous Integration

Tracking the Issues

It is important to log the issues reported for the application so that they can be tracked and fixed.
At a bare minimum one needs to know the following about an issue:

  • What version of the product has the issue?
  • Which customer encountered the issue?
  • How severe is it?
  • Was the problem reproduced in-house (and by whom, so they can help you if you’re unable to reproduce the problem)?
  • What was the customer’s environment (operating system, database, etc.)?
  • In what version of your product did the issue first occur?
  • In what version of your product was it fixed?
  • Who fixed it?
  • Who verified the fix?

Some more that will help in the long term

  • During what phase of the project was the bug introduced?
  • The root cause of the bug
  • The sources that were changed to fix the problem. If the checkin policy demands that the checkin comment indicate the reason for the fixes, then it should be possible to correlate the checkin with the issue that they fixed or requirement that they addressed.
  • How long did it take to fix the error? (Time to analyze, Fix, Test)

Some warning signs that things are not OK with the issue system

  • The system isn’t being used.
  • Too many small issues have been logged in the system
  • Issue-related metrics are used to evaluate team member performance.

Tracking Features

Just as it is important to track the issues, it is important to track the features that have been planned for the application.
The system used to track issues may also be used to track the features as long as it provides the ability to identify them separately.

Test Harness

Have a good Test Harness which can be used to run automated tests on the system.

  1. Use a standard Test Harness which can generate all the required reports.
  2. Ensure that every team member uses the same tool.
  3. Ensure that the tool can be run from the command line. This will enable driving it from an external script or a tool.
  4. Ensure that the tool is flexible to test multiple types of applications and not specific to a particular type.

Different types of testing needs to be planned for

  1. Unit Testing – Testing small pieces of code. This forces the developers to break up the code into smaller pieces, which makes it easier to maintain and understand, reduces copy-paste, and ensures that overall functionality is minimally impacted, if at all, by refactoring.
  2. Functional Testing – Testing all the functions of the application.
  3. Performance Testing – Testing the application to ensure that the application is performing within acceptable limits and meets the SLAs.
  4. Load Testing – This is similar to the Performance Testing. The goal of this is to ensure that the application does not collapse under load.
  5. Smoke Testing – This is a light-weight testing which will test the key functionality of the application. This should be included as part of Continuous Integration so that any breakage in key functionality comes to light very quickly.
  6. Integration Testing – This ensures that the integration of the modules within the application and the integration of the application with the external systems is functioning correctly.
  7. Mock Client Testing – This mocks the client requests and ensures that the client get the right response and within the expected time period.

Pragmatic Project Techniques

Some of the good practices to follow when working in projects are as follows:

  1. Maintaining a list of activities to do. This should be visible and accessible to everybody on the project. Even the client should have visibility into the list so that they can check the pace of progress and prioritize the items in it. Each item should have a target time. The list should reflect the current status and should not be out of date.
  2. Having Tech Leads in the project is important. The Tech Lead should guide the team in the selection and utilization of the technology and should be responsible for ensuring that the deadlines are realistic. The Tech Lead should act as the bridge between the developers and the management. It is an important role, to be played by a person with the right temperament.
  3. Coordinating and communicating on a daily basis is very important. Meetings need to be set up daily. These meetings should be short and to the point, with everybody sharing details of what they are doing and what they plan to do. Team members should highlight any problems they are facing. The solutions for these problems should not be part of this meeting, but should be worked out separately.
  4. Code review is a crucial part of the project and every piece of code should be reviewed. Some good practices of code review are:
    1. Review only a small amount of code at a time.
    2. A piece of code should not be reviewed by more than two people.
    3. Code should be reviewed frequently, possibly several times a day.
    4. Consider pair programming as a continuous code review process.

Tracer Bullet Development

Just as a tracer bullet can be fired at night to reveal the path before the real bullet is aimed, the process adopted should make it possible to trace the path the project will take.


Have a process to follow.
The process followed should not claim exclusive credit for the success of projects. If it does, be suspicious of it.
Follow a process that embraces periodic reevaluation and inclusion of whatever practices work well for the projects.


  • Define the layers that will exist in the application.
  • Define the interfaces between the layers.
  • Let each layer be developed by a separate team, relying on the interface promised by the adjacent layers.
  • Keep it flexible so that the interface can be changed as it is hard to get the interfaces perfect the first time around.
  • First create the large classes, such as the Database Connection Manager and Log Manager, required for each layer; then write the fine-grained classes.
  • Collaboration between the teams developing the different layers is key to success. These collaborations will trace the path that the project will take.
  • Do not let an architect sitting in an ivory tower dictate the architecture.
  • It is dangerous to have one person driving the whole project. If this person leaves, the project will come to a standstill.
  • Create stubs, or mock the interfaces of the adjacent layers so that it becomes easy to test.
  • Code the tough and key pieces first and test them before addressing the simpler ones. It may take time to show progress, but when the progress happens it will be very quick.
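The stub-the-adjacent-layer idea above can be sketched in Python. All class and function names here are hypothetical illustrations, not taken from the book:

```python
from abc import ABC, abstractmethod

# Hypothetical interface promised by the data layer to the service layer.
class OrderStore(ABC):
    @abstractmethod
    def find_order(self, order_id):
        """Return the order dict for order_id, or None if it is absent."""

# Stub implementation: lets the service-layer team code and test against
# the promised interface before the real database-backed store exists.
class StubOrderStore(OrderStore):
    def __init__(self, canned_orders):
        self._orders = canned_orders

    def find_order(self, order_id):
        return self._orders.get(order_id)

# Service-layer code relies only on the interface, not on the real store.
def order_status(store, order_id):
    order = store.find_order(order_id)
    return order["status"] if order else "unknown"
```

Because `order_status` depends only on the promised interface, the teams on either side of it can proceed in parallel and renegotiate the interface as needed.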

Common Problems and How to fix Them

What to do when legacy code is inherited?

  1. Build it – Learn to build it and script the build.
  2. Automate it – Automate the build.
  3. Test it – Test to understand what the system does and write automated test cases.

Don’t change legacy code unless you can test it.
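The "Test it" step is often done with characterization tests: record what the inherited code currently does and pin that behaviour down before touching it. A minimal sketch, with `legacy_format` standing in for hypothetical inherited code:

```python
import unittest

# Stand-in for inherited legacy code whose exact rules nobody remembers.
def legacy_format(name, balance):
    return "%s owes $%0.2f" % (name.upper(), balance)

class CharacterizationTest(unittest.TestCase):
    """Pin down current behaviour, quirks and all, before refactoring."""

    def test_current_output(self):
        # The recorded output of the running system becomes the expectation.
        self.assertEqual(legacy_format("alice", 12.5), "ALICE owes $12.50")
```

Once such tests exist, the legacy code can be refactored with some confidence that its observable behaviour has not changed.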

Some other tips from the chapter

  1. If code is found unsuitable for automated testing, refactor it slowly so that it becomes amenable to automated testing.
  2. If a project keeps breaking repeatedly, automated test cases emulating the user actions will help reduce the incidents.
  3. Ensure that the automated tests are updated whenever the code/logic changes; otherwise they will become useless.
  4. It is important to have Continuous Integration in place so that the automated tests can be run regularly.
  5. Early check-ins (in fact daily, or more than once a day) and quick updates by the developers are important so that integration problems are detected as early as possible.
  6. It is important to communicate with the customers and get regular feedback.
  7. The best way to show the customer the progress of the project is to show them a working demo of the application.
  8. Introduce a process change when the team is not under pressure. Point out the benefit the stakeholders will have with the new process. Show them the benefit of the process/practice rather than talk and preach about it.

A wonderful Dilbert quote from the book
“I love deadlines. I especially love the swooshing sound they make as they go flying by.” — Scott Adams

Some Excerpts from the book

View all my reviews

When I started programming,
I just talked to the customer,
And coded and everyone was rocking,
Nothing was only bluster.

Then they said we need to do waterfall
Somebody talked to the customer,
Someone else coded leading to downfall,
And the project manager went a fluster.

Now they are saying use Agile,
Everybody talking to the customer,
It is making everything fragile,
And the project manager has lost his luster.

These are excerpts from the book: the summary from the end of each chapter.

Chapter 1 – A Method in Madness

  • Make sure to do the following:
    • Work out why the software is behaving unexpectedly.
    • Fix the problem.
    • Avoid breaking anything else.
    • Maintain or improve overall quality.
    • Ensure that the same problem does not occur elsewhere and cannot occur again.
  • Leverage your software’s ability to show you what’s happening.
  • Work on only one problem at a time.
  • Make sure that you know exactly what you’re looking for:
    • What is happening?
    • What should be happening?
  • Check simple things first.

Chapter 2 – Reproduce

  • Find a reproduction before doing anything else.
  • Ensure that you’re running the same version as the bug was reported against.
  • Duplicate the environment that the bug was reported in.
  • Determine the input necessary to reproduce the bug by:
    • Inference
    • Recording appropriate inputs via logging
  • Ensure that your reproduction is both reliable and convenient through iterative refinement:
    • Reduce the number of steps, amount of data, or time required.
    • Remove nondeterminism
    • Automate.

Chapter 3 – Diagnose

  • Construct hypotheses, and test them with experiments.
    • Make sure you understand what your experiments are going to tell you.
    • Make only one change at a time.
    • Keep a record of what you’ve tried.
    • Ignore nothing.
  • When things aren’t going well:
    • If the changes you’re making don’t seem to be having an effect, you’re not changing what you think you are.
    • Validate your assumptions.
    • Are you facing multiple interacting causes or a changing underlying system?
  • Validate your diagnosis.

Chapter 4 – Fix

  • Bug fixing involves three goals:
    • Fix the problem.
    • Avoid introducing regressions.
    • Maintain or improve overall quality (readability, architecture, test coverage, and so on) of the code.
  • Start from a clean source tree.
  • Ensure that the tests pass before making any changes.
  • Work out how you’re going to test your fix before making changes.
  • Fix the cause, not the symptoms.
  • Refactor, but never at the same time as modifying functionality.
  • One logical change, one check-in.

Chapter 5 – Reflect

“The six stages of debugging” reads as follows:

  1. That can’t happen.
  2. That doesn’t happen on my machine.
  3. That shouldn’t happen.
  4. Why is that happening?
  5. Oh, I see.
  6. How did that ever work?

  • Take the time to perform a root cause analysis:
    • At what point in your process did the error arise?
    • What went wrong?
  • Ensure that the same problem can’t happen again:
    • Automatically check for problems.
    • Refactor code to remove the opportunity for incorrect usage.
    • Talk to your colleagues, and modify your process if appropriate.
  • Close the loop with other stakeholders.

Chapter 6 – Discovering that you have a problem

  • Make the most of your bug-tracking system:
    • Pick one at an appropriate level of complexity for your particular situation.
    • Make it directly available to your users.
    • Automate environment and configuration reporting to ensure accurate reports.
  • Aim for bug reports that are the following:
    • Specific
    • Unambiguous
    • Detailed
    • Minimal
    • Unique
  • When working with users, do the following:
    • Streamline the bug-reporting process as much as possible.
    • Communication is key—be patient and imagine yourself in the user’s shoes.
  • Foster a good relationship with customer support and QA so you can leverage their support during bug fixing.

Chapter 7 – Pragmatic Zero Tolerance

  • Detect bugs as early as possible, and fix them as soon as they come to light.
  • Act as though bug-free software was an attainable goal, but temper perfectionism with pragmatism.
  • If you find yourself faced with a poor quality codebase, do the following:
    • Recognize there is no silver bullet.
    • Make sure that the basics are in place first.
    • Separate clean code from unclean, and keep it clean.
    • Use bug triage to keep on top of your bug database.
    • Incrementally clean up bad code by adding tests and refactoring.

Chapter 8 – Special Cases

  • When patching an existing release, concentrate on reducing risk.
  • Keep on the lookout for compatibility implications when fixing bugs.
  • Ensure that you have completely closed any timing windows, not just decreased their size.
  • When faced with a heisenbug, minimize the side effects of collecting information.
  • Fixing performance bugs always starts with an accurate profile.
  • Even the most restricted communication channel can be enough to extract the information you need.
  • Suspect your own, ahead of third-party, code.

Chapter 9 – The Ideal Debugging Environment

  • Automate your tests, ensuring that they do the following:
    • Unambiguously pass or fail
    • Are self-contained
    • Can be executed with a single click
    • Provide comprehensive coverage
  • Use branches in source control sparingly.
  • Automate your build process:
    • Build and test the software every time it changes.
    • Integrate static analysis into every build.

Chapter 10 – Teach your Software to Debug Itself

  • Use assertions to do the following:
    • Both document and automatically validate your assumptions
    • Ensure that your software, although robust in production, is fragile during debugging
  • Create a debug build that
    • Is compiled with debug-friendly compiler options
    • Allows key subsystems to be replaced by debugging equivalents
    • Builds in control that will prove useful during diagnosis
  • Detect systemic problems, such as resource leaks and exception handling issues, preemptively.
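The assertion advice can be illustrated in Python, where running the interpreter with `-O` strips `assert` statements, roughly mirroring the robust-in-production / fragile-during-debugging split the author describes. The `transfer` function is a made-up example:

```python
# Assertions both document assumptions and fail fast while debugging.
# Running `python -O script.py` strips them, which gives a "production
# build" without the internal checks.
def transfer(balance, amount):
    assert amount >= 0, "amount must not be negative"
    assert balance >= amount, "overdraft should have been caught upstream"
    return balance - amount
```

In a debug run, a caller that violates the documented assumptions fails immediately at the assertion rather than corrupting state further downstream.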

Chapter 11 – Anti Patterns

  • Keep on top of your bug database to ensure that it accurately reflects your true priorities.
  • The polluter pays—don’t allow anyone to move onto a new task until they’ve completely finished their current one. If bugs come to light in their work, they fix them.
  • Make a single team responsible for a product from its initial concept through deployment and beyond.
  • Firefighting will never fix a quality problem. Take the time to identify and fix the root cause.
  • Avoid “big bang” rewrites.
  • Ensure that your code ownership strategy is clear.
  • Treat anything you don’t understand as a bug.

Debug It!: Find, Repair, and Prevent Bugs in Your Code by Paul Butcher
My rating: 3 of 5 stars

In the book the author covers different aspects of

Effective Debugging

In the first chapter the author advises:
1. Work out why the software is behaving unexpectedly.
2. Fix the problem.
3. Avoid breaking anything else.
4. Maintain or improve the overall quality (readability, architecture, test coverage, performance, and so on) of the code.
5. Ensure that the same problem does not occur elsewhere and cannot occur again.

The author emphasizes that without first understanding the true root cause of the bug, we are outside the realms of software engineering and delving instead into voodoo programming or programming by coincidence.

He suggests that empirical means are the best way to debug, i.e., provide different inputs and observe how the system behaves.

The Core Debugging Process involves the following steps:
1. Reproduce: Find a way to reliably and conveniently reproduce the problem on demand.
2. Diagnose: Construct hypotheses, and test them by performing experiments until you are confident that you have identified the underlying cause of the bug.
3. Fix: Design and implement changes that fix the problem, avoid introducing regressions, and maintain or improve the overall quality of the software.
4. Reflect: Learn the lessons of the bug. Where did things go wrong? Are there any other examples of the same problem that will also need fixing? What can you do to ensure that the same problem doesn’t happen again?

Address one bug at a time: Picking too many bugs to address at one time will prevent focus on any one of them.
Check Simple Things first: Somebody may have encountered something similar and may already have a solution.


1. Reproduction of the error should be consistent and efficient; otherwise testing the fix will become a chore.
2. So reproduce the error in a controlled environment to achieve consistency.
3. To keep it efficient, try to reduce the input to be provided and the processing that needs to be done; store the state at every step so that only the erroneous step needs to be rerun.
4. Automate the test conditions to make it quicker and easier to test the application after the fix. Replaying a log file can be a good strategy in scenarios where logging via a proxy was used to capture the error condition.
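The log-replay strategy in point 4 might be sketched as follows. The `record`/`replay` helpers and the failing `handle` function are hypothetical illustrations:

```python
import json

# Hypothetical record/replay helpers: every incoming request is appended
# to a log so that a failure seen in production can be replayed on demand.
def record(log, request):
    log.append(json.dumps(request))

def replay(log, handler):
    """Push every logged request back through the handler."""
    return [handler(json.loads(line)) for line in log]

# A made-up handler with a bug that only certain inputs trigger.
def handle(request):
    if request["qty"] < 0:
        raise ValueError("negative quantity not handled")
    return request["qty"] * request["price"]
```

Once the offending request is in the log, replaying just that entry reproduces the failure without re-running the whole workflow.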

The following will help in reproducing the error:
1. Logging at appropriate places so that one knows what is happening in the system. Too much logging will be unacceptable in a production system.
2. Where possible, use a proxy to capture the network traffic and try to reproduce the error with this traffic.
3. If calls to libraries are problematic or they need to be emulated in a test environment, write a Shim (a proxy to a library) and capture the inputs and outputs and use this to reproduce the error. In engineering, a shim is a thin piece of material used to fill the space between objects. In computing we’ve borrowed the term to mean a small library that sits between a larger library and its client code. It can be used to convert one API to another or, as in the case we’re discussing here, to add a small amount of functionality without having to modify the main library itself.
4. Reach out to the user community that is able to reproduce the error and get inputs from them. Give them specially instrumented code to figure out the error.
5. Read the documentation on the system if the problem seems to be occurring beyond the realms of the code that has been written, and read the errors reported by others using the same platform.
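The shim described in point 3 can be sketched as a thin wrapper that records every call. Here `library_call` stands in for a hypothetical third-party function we cannot modify:

```python
# "library_call" stands in for a third-party function we cannot modify.
def library_call(x, y):
    return x / y

call_log = []  # captured inputs and outputs, usable to replay a failure

# The shim has the same signature as the library call, so client code
# can be pointed at it without any other change.
def library_call_shim(x, y):
    try:
        result = library_call(x, y)
        call_log.append({"args": (x, y), "result": result})
        return result
    except Exception as exc:
        call_log.append({"args": (x, y), "error": repr(exc)})
        raise
```

The captured log then drives reproduction in a test environment: feed the recorded inputs back in and compare against the recorded outputs.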

Irreproducible Errors

Most bugs are reproducible. The few scenarios where the bug may be irreproducible or difficult to reproduce will be because of the following reasons:
1. Starting from an unpredictable initial state: C and C++ programs are prone to this error.
2. Interaction with external systems: This can happen if the other system is not running in lock-step with this software. If inputs from the external system arrive while the current system is in different states, the error can be difficult to reproduce.
3. Deliberate Randomness: In some systems, such as games, there is deliberate randomness. These can be difficult to debug, but if the same seed is used for the pseudo-random number generator, the bug becomes easier to reproduce.
4. Multithreading: This happens because of the pre-emptive multi-tasking provided by the Operating System. Since threads can be stalled and restarted at different times depending on the activity on the CPUs at that time, it becomes difficult to reproduce errors in such an environment. Try using sleep to simulate the stalling of one thread and the execution of another to emulate the error.
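The seeding trick from point 3 looks like this in Python; the card-shuffling example is hypothetical:

```python
import random

def shuffle_deck(seed):
    """Shuffle a 52-card deck; a fixed seed makes the result reproducible."""
    rng = random.Random(seed)  # private generator so we control the seed
    deck = list(range(52))
    rng.shuffle(deck)
    return deck
```

With the same seed the "random" behaviour is identical on every run, so a bug observed with, say, seed 42 can be replayed at will; logging the seed used in production is what makes this possible.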

Good Practices of Reproducing

If a bug takes a long time and is still not identified, it may be because another bug is masking it. So try to concentrate on a different bug in the same area, and possibly clear it before retrying the difficult one.


How to Diagnose?

1. Examine what you know about the software’s behaviour, and construct a hypothesis about what might cause it.
2. Design an experiment that will allow you to test its truth (or otherwise).
3. If the experiment disproves your hypothesis, come up with a new one, and start again.
4. If it supports your hypothesis, keep coming up with experiments until you have either disproved it or reached a high enough level of certainty to consider it proven.

Techniques of Diagnosing

1. Instrument the code to understand the flow better.
2. Use a binary search pattern with logging to locate the point of error in the source code. That is, check for the error before and after the execution of a stretch of code; if the error has already appeared, look for it in the first half of that stretch, otherwise in the second half; then split the offending half into two again, and repeat until the exact point of error is found.
3. Use a binary search pattern in version control to identify the version when error was introduced.
4. Use a binary search pattern on the data to identify the portion of the input that triggers the error.
5. Focus on the differences. The Application works for most customers, but not for specific ones. Check how these customers are different from the rest where the application is working. Similarly works in most environments, but does not work in a particular environment. Try and figure out what is different in that environment. If it happens for specific input files then figure out what is different in that file as compared to other files where it works.
6. Use debuggers when available.
7. Use the Interactive Consoles where debuggers are not available or are not good.
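Techniques 2 to 4 are all the same binary search. A generic sketch, in which `versions` could equally be commits, stretches of code, or slices of input data; for version control this is what `git bisect` automates:

```python
def first_bad(versions, is_bad):
    """Binary search for the first bad element of an ordered sequence.

    Assumes that once things go bad they stay bad, which is the same
    assumption `git bisect` makes about a history of commits.
    """
    lo, hi = 0, len(versions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid        # bug already present: search the earlier half
        else:
            lo = mid + 1    # bug not yet present: search the later half
    return versions[lo]
```

Each probe halves the search space, so even a history of thousands of versions needs only a dozen or so checks.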

Good Practices of Diagnosing

1. When experimenting make only one change at a time.
2. Ignore nothing. Do not shrug off the unexpected as an anomaly. It could be that our assumptions are wrong.
3. Maintain a record of experiments and results so that it is easy to trace back.
4. Anything that you don’t understand is potentially a bug.
5. Learn from others. Search the net for a similar problem and the solution offered.
6. All other things being equal, the simplest explanation is the best. – Occam’s Razor
7. Writing automated test cases helps because this lets us concentrate only on the broken cases.
8. Keep Asking “Are you changing the right thing?” If the changes you’re making have no effect, you’re not changing what you think you are.
9. Validate and revalidate your assumptions
10. Ensure that the underlying system on which diagnosis is being done is static and not changing.
11. If one is stuck in debugging a problem, one good way is to ask somebody else to take a look at it.


Best Practices

1. Make sure you know how you’re going to test it before designing your fix.
2. Do not let the fixes mess up the original clean design and structure of the code. Haphazardly put-together fixes can undermine the good design principles followed in the original design. Any fix should leave the code in better shape than it was before.
3. Clean up any ad hoc code changes before making the final fix so that no unwanted code gets checked in. Keep only what is absolutely necessary.
4. Use existing test cases. Modify the test cases if required, or write the failing test case and run it against the code without the fix. Then fix the code and rerun the failing test case to see that it passes after the fix. In other words:
  1. Run the existing tests, and demonstrate that they pass.
  2. Add one or more new tests, or fix the existing tests, to demonstrate the bug (in other words, to fail).
  3. Fix the bug.
  4. Demonstrate that your fix works (the failing tests no longer fail).
  5. Demonstrate that you haven’t introduced any regressions (none of the tests that previously passed now fail).
5. Fix the root cause, not the symptom. E.g., if one encounters a NullPointerException, the solution is not to catch the NullPointerException and handle it, or even worse suppress it; it is necessary to figure out why the NullPointerException is occurring and fix that cause. Giving in to the temptation of quick fixes is not the right thing; making the right fix is.
6. Refactor or change functionality or fix a problem — one or the other, never more than one.
7. Always check in small changes. Do not check in large changes as it will make it very difficult to find out which change actually caused the problem. Ensure check-in comments are as meaningful (and specific) as possible.
8. Diff and check what exactly is being checked in before actually checking in.
9. Get the code reviewed. This is very important, as a second pair of eyes can catch errors that would otherwise go unnoticed.
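The test-first workflow in point 4 can be sketched end to end. The `average` function and its empty-list bug are made up for illustration:

```python
import unittest

# Hypothetical fix: the original average() crashed with ZeroDivisionError
# on an empty list; the guard below is the fix.
def average(values):
    if not values:            # the fix
        return 0.0
    return sum(values) / len(values)

class AverageTest(unittest.TestCase):
    def test_existing_behaviour_unchanged(self):
        # Pre-existing test: demonstrates no regression was introduced.
        self.assertEqual(average([2, 4, 6]), 4.0)

    def test_bug_demonstrated_then_fixed(self):
        # Written first; it failed against the unfixed code.
        self.assertEqual(average([]), 0.0)
```

The new test is kept permanently in the suite, so the same bug cannot silently return.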

After Fixing – Reflect

“The six stages of debugging” reads as follows:
1. That can’t happen.
2. That doesn’t happen on my machine.
3. That shouldn’t happen.
4. Why is that happening?
5. Oh, I see.
6. How did that ever work?

After fixing one needs to reflect on the following points:
• How did it ever work?
• When and why did the problem slip through the cracks?
• How to ensure that the problem never happens again?

Find out the root cause. A useful trick when performing root cause analysis is to ask
“Why?” five times. For example:
• The software crashed. Why?
• The code didn’t handle network failure during data transmission. Why?
• There was no unit test to check for network failure. Why?
• The original developer wasn’t aware that he should create such a test. Why?
• None of our unit tests check for network failure. Why?
• We failed to take network failure into account in the original design.

After fixing do the following:
1. Take steps to ensure that it does not ever happen again. Educate yourself, educate others on the team.
2. Check if there are other similar errors.
3. Check if the documentation needs to be updated as a result of the fix.

Other aspects of handling and managing bugs

1. To better aid debugging collect relevant environment and configuration information automatically.
2. Detect bugs early, and do so from day one.
3. Poor quality is contagious (the Broken Windows concept, introduced in a 1982 article by social scientists James Q. Wilson and George L. Kelling). So do not leave bad code around; fix it at the earliest.
4. Zero-bug software is impossible, so take a pragmatic approach and try to get as close to zero bugs as possible. Temper perfectionism with pragmatism.
5. Keep the design simple. Not only does a simple design make your software easier to understand and less likely to contain bugs in the first place, it also makes it easier to control—which is particularly useful when trying to reproduce problems in concurrent software.
6. Automate your entire build process, from start to finish.
7. Version management of code is absolutely mandatory.
8. Different source should mean a different version number, even if the change to the code is minuscule.

Some Excerpts from the Book

View all my reviews

A few weeks ago a blog was published asking a rhetorical question: “Can software be created in factories?”. My good friend pointed me to the Wikipedia article on “Software Factory”. What I would like to point out is that the statement “Software factory refers to a structured collection of related software assets that aids in producing computer software applications or software components according to specific, externally defined end-user requirements through an assembly process. [1] A software factory applies manufacturing techniques and principles to software development to mimic the benefits of traditional manufacturing. Software factories are generally involved with outsourced software creation.” from Wikipedia is completely incorrect and fallacious, despite the number of individuals who believe it.

Martin Fowler’s bliki entry, Code as Documentation, has a link to Jack Reeves’s famous essay “What Is Software Design?”.

This article first appeared in 1992 in the C++ Journal. It was written by Jack Reeves, who had been in the industry for more than 10 years at the time, and the trigger was the fact that C++ had taken the software world by storm. It was being seen as the panacea for all the problems then plaguing the software industry.

He summarizes the article as follows:

To summarize:

  • Real software runs on computers. It is a sequence of ones and zeros that is stored on some magnetic media. It is not a program listing in C++ (or any other programming language).
  • A program listing is a document that represents a software design. Compilers and linkers actually build software designs.
  • Real software is incredibly cheap to build, and getting cheaper all the time as computers get faster.
  • Real software is incredibly expensive to design. This is true because software is incredibly complex and because practically all the steps of a software project are part of the design process.
  • Programming is a design activity—a good software design process recognizes this and does not hesitate to code when coding makes sense.
  • Coding actually makes sense more often than believed. Often the process of rendering the design in code will reveal oversights and the need for additional design effort. The earlier this occurs, the better the design will be.
  • Since software is so cheap to build, formal engineering validation methods are not of much use in real world software development. It is easier and cheaper to just build the design and test it than to try to prove it.
  • Testing and debugging are design activities—they are the software equivalent of the design validation and refinement processes of other engineering disciplines. A good software design process recognizes this and does not try to short change the steps.
  • There are other design activities—call them top level design, module design, structural design, architectural design, or whatever. A good software design process recognizes this and deliberately includes the steps.
  • All design activities interact. A good software design process recognizes this and allows the design to change, sometimes radically, as various design steps reveal the need.
  • Many different software design notations are potentially useful—as auxiliary documentation and as tools to help facilitate the design process. They are not a software design.
  • Software development is still more a craft than an engineering discipline. This is primarily because of a lack of rigor in the critical processes of validating and improving a design.
  • Ultimately, real advances in software development depend upon advances in programming techniques, which in turn mean advances in programming languages. C++ is such an advance. It has exploded in popularity because it is a mainstream programming language that directly supports better software design.
  • C++ is a step in the right direction, but still more advances are needed.

The points to note with respect to the factory aspect of software are highlighted in red. Note that the author states that coding is design, and one cannot dispute this fact. And since one does not design in a factory, software development cannot be considered to happen in a factory. It may look like splitting hairs, but for somebody who is coding, be it a novice who started yesterday or somebody who has been doing it for donkey’s years, it is apparent that this is indeed a fact. One keeps designing with practically every line of code.

Another interesting excerpt is “In software engineering, we desperately need good design at all levels. In particular, we need good top level design. The better the early design, the easier detailed design will be. Designers should use anything that helps. Structure charts, Booch diagrams, state tables, PDL, etc.—if it helps, then use it.”

This was the statement of the author in the essay published in 1992. Writing about this in 2005 the author says “Today, I would phrase it differently. I would say we need good architectures (top level design), good abstractions (class design), and good implementations (low level design). I would also say something about using UML diagrams or CRC cards to explore alternatives.”
This is what the author is referring to from the earlier article: “We must keep in mind, however, that these tools and notations are not a software design. Eventually, we have to create the real software design, and it will be in some programming language. Therefore, we should not be afraid to code our designs as we derive them.”

The author goes on to say “This is fundamental. I am not arguing that we should not “do design.” However you want to approach the process, I simply insist that you have not completed the process until you have written and tested the code.”

Note that the author bolsters the argument that software development involves design at all stage. It is not limited to a single design phase.

Another interesting statement in the second essay is “When the document is detailed enough, complete enough, and unambiguous enough that it can be interpreted mechanistically, whether by a computer or by an assembly line worker, then you have a design document. If it still requires creative human interpretation, then you don’t.” This again goes to prove that software cannot be created in factories.
One final argument to support the claim that software cannot be created in a factory: “The problem with software is – design is not just important, it is basically everything. Saying that programmers should not have to design is like saying fish should not have to swim. When I am coding, I am designing. I am creating a software design out of the void.”

When this was sent to a few people, the reply I got back was: “In the Indian IT industry, there is no such thing as a ‘Less Able Programmer’. All donkeys can be ‘processed’ to become stallions. All crows can become swans…” All that can be said about this sad belief is that it is exactly the bane of the Indian IT industry and, in my black, cynical, negative opinion, is going to lead to the downfall of what we today consider to be a cash cow.

In my interactions with various personnel working in the IT world I have noticed that some people have a tendency to use the word “factory” to describe locations where people are either writing new applications or maintaining existing software. Something in the word “factory” irritates me. I do not get a comfortable feeling when somebody equates software development/maintenance to the tasks performed in a factory.

What, I think, these people fail to realize or admit is that in a factory the tasks tend to be repetitive and hence “teachable” and “learnable”. This is the reason we see so much automation in factories; they hardly have any human intervention.

Unlike manufacturing factories, the software “factory” is full of people. Except for very few processes in the development cycle, software creation cannot be automated. Human intervention is required at almost every stage. Software requires a human touch during creation.

From a maintenance and support perspective too software needs humans to address any issues that come up in the production. Very little of this can be automated.

Given all this, it gives me the creeps when somebody refers to a “software factory”.