The many flaws of test coverage
Recently, in a Twitter chat with a couple of friends of mine, the subject of test coverage reappeared. I rolled my eyes. Ready to start ranting, I remembered already covering the many flaws of test coverage in Python Testing Cookbook. So I thought, perhaps an excerpt would be better.
From chapter 9 of Python Testing Cookbook:
Coverage Isn’t Everything
You’ve figured out how to run coverage reports. But don’t assume that more coverage is automatically better. Sacrificing test quality in the name of coverage is a recipe for failure.
How to do it…
Coverage reports provide good feedback. They tell us what is getting exercised and what is not. But just because a line of code is exercised doesn’t mean it is doing everything it is meant to do.
Are you ever tempted to brag about coverage percentage scores in the break room? Taking pride in good coverage isn’t unwarranted, but when it leads to comparing different projects using these statistics, we are wandering into risky territory.
How it works…
Coverage reports are meant to be read in the context of the code they were run against. The reports show us what was covered and what was not, but this isn’t where things stop. Instead, it’s where they begin. We need to look at what was covered, and analyze how well the tests exercised the system.
It’s obvious that 0% coverage of a module indicates we have work to do. But what does it mean when we have 70% coverage? Do we need to code tests that go after the other 30%? Sure we do! But there are two different schools of thought on how to approach this. One is right and one is wrong:
- The first approach is to write the new tests specifically targeting the uncovered parts while trying to avoid overlapping the original 70%. Redundantly testing code already covered by another test is an inefficient use of resources.
- The second approach is to write the new tests so they target scenarios the code is expected to handle, but which we haven’t tackled yet. What was not covered should give us a hint about what scenarios haven’t been tested yet.
The right approach is the second one. Okay, I admit I wrote that in a leading fashion. But the point is that it’s very easy to look at what wasn’t hit, and write a test that shoots to close the gap as fast as possible.
Python gives us incredible power to monkey patch, inject alternate methods, and do other tricks to exercise the uncovered code. But doesn’t this sound a little suspicious? Here are some of the risks we are setting ourselves up for:
- The new tests may be more brittle when they aren’t based on sound scenarios.
- A major change to our algorithms may require us to totally rewrite these tests.
- Ever written mock-based tests? It’s possible to mock the target system out of existence and end up just testing the mocks.
- Even if some (or even most) of our tests are of good quality, the low-quality ones will drag down the perceived quality of the entire test suite.
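The mock pitfall above can be sketched in a short, hypothetical example using `unittest.mock` (the `fetch_user` function and its tests are invented for illustration). The over-mocked test gets full coverage of `fetch_user` yet only verifies the mock's own bookkeeping:

```python
from unittest import mock

# Hypothetical example: fetch_user calls a database layer we mock out in tests.
def fetch_user(db, user_id):
    record = db.lookup(user_id)
    # Intended behavior: normalize the name. This is the logic worth testing.
    return record["name"].strip().title()

def test_fetch_user_over_mocked():
    db = mock.Mock()
    db.lookup.return_value = {"name": "ada lovelace"}
    fetch_user(db, 42)
    # This only proves the mock was called. The .strip().title()
    # normalization could be deleted and this test would still pass.
    db.lookup.assert_called_once_with(42)

def test_fetch_user_behavior():
    db = mock.Mock()
    db.lookup.return_value = {"name": "  ada lovelace "}
    # Asserting on the result exercises the real normalization logic.
    assert fetch_user(db, 42) == "Ada Lovelace"

test_fetch_user_over_mocked()
test_fetch_user_behavior()
```

Both tests light up the same lines in a coverage report, but only the second one would fail if the normalization were broken.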
The coverage tool may not let us “get away” with some of these tactics if we do things that interfere with the line counting mechanisms. But whether or not the coverage tool counts the code should not be the gauge by which we determine the quality of tests.
Instead, we need to look at our tests and see if they are trying to exercise real use cases we should be handling. When we are merely looking for ways to get more coverage percentage, we stop thinking about how our code is meant to operate, and that is not good.
Are we not supposed to increase coverage?
We are supposed to increase coverage by improving our tests, covering more scenarios, and by removing code no longer supported. These things all lead us towards overall better quality.
Increasing coverage for the sake of coverage doesn’t lend itself to improving the quality of our system.
But I want to brag about the coverage of my system!
I think it’s fine to celebrate good coverage, and sharing a coverage report with your manager is reasonable. But don’t let it consume you.
If you start to post weekly coverage reports, double-check your motives. The same goes if your manager requests them.
If you find yourself comparing the coverage of your system against another system, then watch out! Unless you are familiar with the code of both systems and really know more than the bottom line of the reports, you are probably wandering into risky territory. You may be headed into faulty competition that could drive your team to write brittle tests.
Agree? Disagree? Feel free to put in your own opinions on the pros and cons of test coverage reports in the comments section.