
The many flaws of test coverage

Recently, in a Twitter chat with a couple of friends of mine, the subject of test coverage reappeared. I rolled my eyes. Just as I was ready to start ranting, I remembered that I had already covered the many flaws of test coverage in Python Testing Cookbook. So I thought, perhaps an excerpt would be better.

From chapter 9 of Python Testing Cookbook:

****

Coverage Isn’t Everything

You’ve figured out how to run coverage reports. But don’t assume that more coverage is automatically better. Sacrificing test quality in the name of coverage is a recipe for failure.

How to do it…

Coverage reports provide good feedback. They tell us what is getting exercised and what is not. But just because a line of code is exercised doesn’t mean it is doing everything it is meant to do.
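
To make that concrete, here is a minimal sketch (the module, function, and file names are made up for illustration) of a test that executes every line of a function, so the coverage report looks perfect, without ever checking a single result:

    # discount.py -- a hypothetical module under test
    def discounted_price(price, rate):
        """Apply a discount rate between 0 and 1 to a price."""
        if rate < 0 or rate > 1:
            raise ValueError("rate must be between 0 and 1")
        return price * (1 - rate)

    # test_discount.py -- 100% line coverage, zero verification
    # (a typical run might be: coverage run -m pytest && coverage report -m)
    import unittest
    from discount import discounted_price

    class TestDiscount(unittest.TestCase):
        def test_exercises_every_line(self):
            # Both branches run, so every line of discounted_price is "covered"...
            discounted_price(100, 0.25)
            with self.assertRaises(ValueError):
                discounted_price(100, 2)
            # ...but we never asserted that 100 at 25% off is 75.0, so a bug
            # in the formula would sail right through this test.

The report would show this module fully covered, yet the test says nothing about whether the math is right.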

Are you ever tempted to brag about coverage percentage scores in the break room? Taking pride in good coverage isn’t unwarranted, but when it leads to comparing different projects using these statistics, we are wandering into risky territory.

How it works…

Coverage reports are meant to be read in the context of the code they were run against. The reports show us what was covered and what was not, but this isn’t where things stop. Instead, it’s where they begin. We need to look at what was covered, and analyze how well the tests exercised the system.

It’s obvious that 0% coverage of a module indicates we have work to do. But what does it mean when we have 70% coverage? Do we need to code tests that go after the other 30%? Sure we do! But there are two different schools of thought on how to approach this. One is right and one is wrong:

  • The first approach is to write the new tests specifically targeting the uncovered parts while trying to avoid overlapping the original 70%. Redundantly testing code already covered by another test is an inefficient use of resources.
  • The second approach is to write the new tests so they target scenarios the code is expected to handle, but which we haven’t tackled yet. What was not covered should give us a hint about what scenarios haven’t been tested yet.

The right approach is the second one. Okay, I admit I wrote that in a leading fashion. But the point is that it’s very easy to look at what wasn’t hit, and write a test that shoots to close the gap as fast as possible.
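
As a rough illustration (the parsing function and the scenarios here are invented), compare a test written only to turn a red line green with one written from a scenario the code is actually supposed to handle:

    import unittest

    def parse_quantity(text):
        """Parse a user-supplied quantity field (hypothetical example)."""
        text = text.strip()
        if not text:
            return 0
        return int(text)

    class CoverageChasingTest(unittest.TestCase):
        # Written only because the report showed the "if not text" branch as uncovered.
        def test_hit_the_empty_string_branch(self):
            parse_quantity("")   # the line is now covered, but nothing is verified

    class ScenarioTest(unittest.TestCase):
        # Written from a real scenario: users often leave the quantity field blank,
        # and the system should treat that as zero items ordered.
        def test_blank_field_means_zero_items(self):
            self.assertEqual(parse_quantity("   "), 0)

        def test_surrounding_whitespace_is_ignored(self):
            self.assertEqual(parse_quantity(" 3 "), 3)

    if __name__ == "__main__":
        unittest.main()

Both approaches push the same lines into the "covered" column; only the second one tells us anything useful when it fails.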

There’s more…

Python gives us incredible power to monkey patch, inject alternate methods, and do other tricks to exercise the uncovered code. But doesn’t this sound a little suspicious? Here are some of the risks we are setting ourselves up for:

  • The new tests may be more brittle when they aren’t based on sound scenarios.
  • A major change to our algorithms may require us to totally rewrite these tests.
  • Ever written mock-based tests? It's possible to mock the target system out of existence and end up just testing the mocks (see the sketch after this list).
  • Even though some (or even most) of our tests may have good quality, the low-quality ones will make our entire test suite look low quality.
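
Here is a hedged sketch of that mocking trap (the BillingService and gateway are hypothetical): the mock is configured to return a canned answer, the test passes, coverage goes up, and the real bug is never noticed.

    import unittest
    from unittest import mock

    class BillingService:
        """Hypothetical service under test."""
        def __init__(self, gateway):
            self.gateway = gateway

        def charge(self, customer_id, amount):
            # Hypothetical bug: the real gateway expects cents, not dollars.
            return self.gateway.submit(customer_id, amount)

    class TestingTheMocks(unittest.TestCase):
        def test_charge_calls_gateway(self):
            gateway = mock.Mock()
            gateway.submit.return_value = "ok"
            service = BillingService(gateway)

            result = service.charge("cust-42", 19.99)

            # Every line of charge() is covered and the test passes, but all we
            # have really verified is that the mock returned what we told it to.
            self.assertEqual(result, "ok")
            gateway.submit.assert_called_once()

    if __name__ == "__main__":
        unittest.main()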

The coverage tool may not let us “get away” with some of these tactics if we do things that interfere with the line counting mechanisms. But whether or not the coverage tool counts the code should not be the gauge by which we determine the quality of tests.

Instead, we need to look at our tests and see if they are trying to exercise real use cases we should be handling. When we are merely looking for ways to get more coverage percentage, we stop thinking about how our code is meant to operate, and that is not good.

Are we not supposed to increase coverage?

We are supposed to increase coverage by improving our tests, covering more scenarios, and by removing code no longer supported. These things all lead us towards overall better quality.

Increasing coverage for the sake of coverage doesn’t lend itself to improving the quality of our system.

But I want to brag about the coverage of my system!

I think it's alright to celebrate good coverage. Sharing a coverage report with your manager is fine. But don't let it consume you.

If you start to post weekly coverage reports, double-check your motives. The same goes if your manager requests them.

If you find yourself comparing the coverage of your system against another system, then watch out! Unless you are familiar with the code of both systems and really know more than the bottom line of the reports, you are probably wandering into risky territory. You may be headed into a pointless competition that could drive your team to write brittle tests.

****

Agree? Disagree? Feel free to put in your own opinions on the pros and cons of test coverage reports in the comments section.

When testing really matters

This past Tuesday, we had James Ward, principal developer advocate for Heroku, give a presentation on the Play Framework, Scala/Java, and Heroku at the Nashville JUG. Suffice it to say, it was a really awesome presentation. This was far from a sales pitch, and more like a let's-get-our-hands-on-this-framework-and-build-a-web-app-as-fast-as-possible session.

Sadly, I had to leave early, but I was hooked on poking my nose into Play. I had been fiddling with a small toy app using Scalatra. It is really nice: you focus on building the routes, then stir in some templates, with a little Scala in there to make things nice and succinct. But I had been running into issues repeatedly with the testing. I decided from the get-go that I would make it a test-driven project, and had built up automated testing quite nicely. I built all the REST points and had everything working with a nice stubbed-out data source, with future plans to replace it with a NoSQL database.

Scalatra has some nice test structures. You can use either specs or ScalaTest to check everything out. But something that nagged me for some time was how my tests would periodically fail every third or fourth run, sometimes more often than that. But since they would occasionally pass, I figured everything worked right. It was probably just some race condition in the Jetty container used to run the tests.

When I tried to swap out my in-memory maps with MongoDB, I couldn't get the tests to pass at all. Comment out the MongoDB code; tests pass. Bring it back online; they fail again. It drove me nuts. I tried lots of things. This was when I saw the Play demo. Seeing that it was heavily route-based, I thought it probably wouldn't be hard to transfer my existing test suite to a Play app, modify any API calls, and use this set of tests to rebuild the REST points in Play. Well, I was able to do it in just a few hours. And guess what? Now there are no failures…EVER!

Why did Play succeed where Scalatra failed me? I think it's because Play has different levels of testing. They build in helpers that let you test from any angle, meaning you don't have to test the runnable container. Instead, they offer a lot of useful angles to isolate and test things out:

  • test individual units
  • test templates
  • test controllers
  • test routes (see http://www.playframework.org/documentation/2.0.2/ScalaFunctionalTest)
  • test the whole thing with an HTTP server
  • test from a browser with Selenium

All these things give you the option to easily grab your existing test suite, no matter how it was structured, carry it over to Play, and rebuild your app back up, one test at a time, until everything passes. Then, with your test harness in place, go back and start refactoring as you see fit!

Right now, the web pages look pretty ugly. They are the basics with no styling at all. There aren't even links to connect the pages together. That is because I am building this app so that each REST point does what it is supposed to do. Then I can start putting a nice look-and-feel on it. I'm already excited because it appears that Play has some helpers to plug in Twitter's Bootstrap HTML/CSS/JavaScript library, something I was planning on using when I got around to the UI.

Given all this, Play has really made it easy to put together a RESTful web app using Scala, my primary goal. I hope you have similar experiences.

What is the best testing tool?

Someone sent me a question through meetup.com: "Greg, what is the best testing tool?" I didn't have room to reply there. I posted my response inside the Nashville JUG Google Group we host, but I thought today it would be better to capture it here.

Asking “what is the best tool” with no other context sounds like you are looking at the situation from the wrong point of view. Let me explain.

A few years ago, I inherited a Java app that had hardly any automated tests, had been worked on by six engineers before me for a couple of years, had been demoed in front of program management, and was non-operational. It did stuff, but not much, and it was loaded with errors. I adopted an approach where I wouldn't work on any bug or issue until I could write an automated test to reproduce it. This was painful, more than you can realize. The software was tightly coupled, full of static singletons (i.e., global variables), and hard to isolate. Basically, my first test involved using JUnit to empty some database tables, programmatically ingest a spreadsheet, and then inspect the database for results. This test took 5 minutes to run. It probably ran through a big chunk of code.

The problem was the system, and no magic test tool was the answer. Changes in the beginning were slow, but as I gained momentum, I eventually reached over 60% test coverage. This may sound low, but within three months, I made a flawless demo before program management. They were impressed that it worked at all, let alone with no problems. The tool was starting to get used by the intended audience. A year later, reported bugs were fixed in one day and never regressed. At one point, I worked for a whole day, and when half the unit tests failed, I threw away all my changes and went home for the day, depressed. I started fresh the next day and actually fixed the problem. Two days to fix it! With no automated test suite, that first day's work would probably have been released and incurred a gob of new bugs to deal with.

I met weekly with the users, captured new feature requests and problems, and generally made this tool work really well. The customers were very happy. I was also empowered to rewrite whole sections that had been slapped together hastily in the past, because I had the security of my automated test suite. I threw away code that wasn't used and didn't work. My boss showed me a chart where the total lines of code had dropped below what I first inherited, yet the tool did more than those other six people had squeezed out of it. That was a happy day!

What I'm saying is that I could have used JUnit, TestNG, ScalaTest (it works on Java too), or any other suite of tools like acceptance testing frameworks, load testing, etc. Which test tool I used wasn't important. Adopting a strategy of making the whole thing subject to testing, and staying test-focused no matter how painful it was, paid off.

At one point, the test suite took 1.5 hours to run. I spent three days speeding up the most expensive parts of the system and cutting out certain tests, getting the same number of tests to run in 30 minutes. I came up with a comprehensive test suite and a smaller smoke test that ran much faster. I also created a spreadsheet to track the number of tests and total test time, along with a graph. As the test time grew, I would periodically halt development and polish up the parts that made it too hard to run tests. I would run the test suite at least once a day to make sure things worked right.

This whole development period was some of the best coding I had done in a long time. Cranking out top quality code with a warm fuzzy green bar made me grin ear-to-ear. When I left that company, I cried a bit because I wouldn’t be working on that tool anymore.

In the ninth chapter of my book, Python Testing Cookbook, the first recipe captures a lot of what I wrote above, with some more detail. To quote the recipe "Something is better than nothing":

“Just don’t let anyone tell you that you are wasting your time building a long-running test case. An automated test suite that takes an hour to run and is exercised at least once a day probably instills more confidence than clicking through the screens manually. Something is better than nothing.” —Python Testing Cookbook, page 326

When I wrote chapter 9, I wanted to move beyond simple coded recipes and instead capture general lessons I had learned in the realm of testing. These principles work whether you are writing Python, Java, Scala, or anything else.