Almost every project I have worked on over the last six years has incorporated a code coverage analysis tool into the build. Most of them have failed the build if the level of coverage has not reached a pre-defined standard, and the others have at least reported on the coverage attained by the test suite. It’s nice to present a colourful chart showing a well-tested code base. Stakeholders love that.
I recently joined a new project where the build was wired up with Cobertura, which was happily reporting that the coverage levels were exceeding 90%. Everyone seemed happy. Nice charts, lots of green bars, warm fuzzy feeling.
In an idle moment, and out of the blue (I’d just built and checked in and was about to start something new), a few questions popped into my head:
Why did we have code coverage checks?
Why would these checks fail the build?
Why would we set the target to 90%?
The easy answers were:
To ensure that our code is tested.
To stop us from ignoring untested code.
It seems like a large enough number.
Hmm, those answers weren’t cutting it for me. I wasn’t feeling warm and fuzzy any more. Slightly upset, I asked myself:
How can I feel warm and fuzzy again?
What piece of testing information would make me happy?
Thankfully, the answer to those questions was also easy to come by:
I would be happy if the desired functions of the system were behaving as they should.
If the functional test suite was complete, correct, and showing 100% passes then we were delivering the right solution. Warm and fuzzy again. Phew.
But what about code coverage? Wasn’t that important? Did the code coverage report perform a useful task for us? Apparently, with regard to the functional tests, the answer was no. Functional tests drive the application; if the application behaves correctly, code coverage is irrelevant.
Yes, that’s right, I was saying that I didn’t care if the functional tests exercised all of the code base as long as the system behaved as I expected in the situations in which I placed it.
So I removed the functional tests from consideration in the code coverage reports. That felt better. What about the integration tests? Same story. If the subsystem behaved the way I expected then the code coverage was irrelevant. I cared about behaviour, not how much the code was exercised. The integration tests were also removed from consideration in the code coverage reports.
What about unit tests? Aah, that was it; the penny dropped. Unit tests drive the design (well, they do if you test-drive the design). One of the project development goals was to test-drive the system. Write a test. Watch it fail. Write some code. Watch the test pass. Refactor, rinse, repeat. We wanted to do this to help ensure that the system was well-constructed and easy to maintain. We cared about unit tests exercising particular parts of the codebase. We would only write code either to satisfy a failing test or to refactor to improve design and eliminate duplication.
The build was now changed to only instrument the codebase for the unit test run. No other testing would be part of the code coverage analysis. The build failed. We no longer had 90% coverage. Code had been exercised during integration or functional testing that had not been exercised during unit tests. That code was not test-driven. It had been exercised coincidentally.
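To make that gap concrete, here is a minimal sketch of the arithmetic in Java. It is purely illustrative and is not Cobertura's API: the class name `CoverageGate`, its methods, and the line numbers are all made up for the example. It treats coverage as sets of line numbers, compares the combined figure with the unit-test-only figure against the same 90% threshold, and lists the lines that were only exercised coincidentally.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this is not how Cobertura is invoked.
class CoverageGate {

    // Fraction of the instrumented lines that appear in the covered set.
    static double ratio(Set<Integer> instrumented, Set<Integer> covered) {
        if (instrumented.isEmpty()) {
            return 1.0; // nothing to cover
        }
        Set<Integer> hit = new HashSet<>(covered);
        hit.retainAll(instrumented);
        return (double) hit.size() / instrumented.size();
    }

    // Lines reached by the other suites but never by a unit test.
    static Set<Integer> coincidental(Set<Integer> unit, Set<Integer> other) {
        Set<Integer> result = new HashSet<>(other);
        result.removeAll(unit);
        return result;
    }

    // The build gate: pass only when the ratio reaches the threshold.
    static boolean passes(double ratio, double threshold) {
        return ratio >= threshold;
    }

    public static void main(String[] args) {
        // Ten instrumented lines, numbered 1 to 10 (made-up data).
        Set<Integer> instrumented = new HashSet<>();
        for (int line = 1; line <= 10; line++) {
            instrumented.add(line);
        }

        Set<Integer> unit = Set.of(1, 2, 3, 4, 5, 6, 7);     // made-up data
        Set<Integer> functional = Set.of(5, 6, 7, 8, 9, 10); // made-up data

        Set<Integer> combined = new HashSet<>(unit);
        combined.addAll(functional);

        // All suites together cover every line, so the 90% gate passes.
        System.out.println(passes(ratio(instrumented, combined), 0.9)); // true

        // Unit tests alone cover 7 of 10 lines, so the same gate fails.
        System.out.println(passes(ratio(instrumented, unit), 0.9)); // false

        // Lines 8, 9 and 10 were never test-driven.
        System.out.println(coincidental(unit, functional));
    }
}
```

With these made-up numbers the combined run looks healthy while the unit-only run fails the gate, which is the situation our build found itself in.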
Our build was now doing its job of reporting on a failure to adhere to one of the development goals. The penny had dropped. That’s why we had code coverage reports incorporated into the build. Code coverage is important for unit tests. It shows that unit tests are driving the development.
My answers to the original questions were now:
Why did we have code coverage checks?
To ensure we are only writing the code we need.
Why would these checks fail the build?
If we forget to test-drive.
Why would we set the target to 90%?
We won’t, we’ll set the target to 100% and discuss any exceptions to this rule if and when we encounter them.