Unit-Testing: Carrots and Sticks

(This is the written version of my presentation by the same name, at Agile and Beyond 2019 in Detroit. There are also some short video clips from that talk.)

I. Introduction

If you’re already doing TDD, and/or have 90%+ unit-test coverage, and/or pair-program all the time, you probably don’t need me. Feel free to leave!
If you don’t think unit-testing has value… well, you’re crazy, and I have two words for you: Boeing 737. You’re welcome to stay, but I’m not here to convince you.

For the rest of us… I’ll assume you know the value of unit-testing, and you’d like to see your group or your company write more (and better) tests, achieve higher coverage percentages… and sleep better at night.

(Note: I’m just talking about unit-tests here. There are several distinct kinds of automated testing that are useful: unit-tests are considered the foundation or base layer of testing. See Cohn’s Triangle (Mike Cohn’s test automation pyramid) for more information.)

Perhaps you’re working on a large ‘legacy’ project, and you’re struggling to overcome ancient history, from when few tests were written. Or perhaps your management or other developers don’t value unit tests highly — or want to, but feel pressured to “get work done” instead.

So — how do you change that? I’ve been working in those environments for 10+ years, with a lot of success with both carrots and sticks. But there are two key principles:

Find common ground. Look for the easy wins. Not everyone is going to agree with you — but almost everyone will agree with some improvements. Get those first. Build on success.
Think three-dimensionally. Change happens in three different domains:
1. Organizational. Make the value (of unit-tests) visible to company management, and a part of company culture.
2. Social. Developers are people, first. Make them feel good (carrots) or bad (sticks) where it’s useful.
3. Technical. Provide the tools! Make tests easier to write. Make it easier to see where improvements are happening — or not. Make it visible!

The next sections address each of the three domains individually. Feel free to skip around, they don’t need to be taken in order. But each one is important!

(Having said that, I’ll lead with a teaser: my personal pride and joy is The Martinizer — a little tool I wrote to quickly and easily identify just the new and changed code that is not covered by tests. But more on that in section IV.)

II. Organizational Change

If your management is simply hostile to writing tests, you’ve got a bigger challenge (or perhaps a new job search!) ahead of you. There are tons of books, studies, and essays on why unit-testing makes good financial sense.

But it’s much more likely that your management knows that unit-tests have value, they just don’t feel it in their bones. They have to balance prioritizing that work among many competing needs. And of course, they’ll never see the problems that good unit-testing prevents, any more than you saw the landmines you didn’t walk over.

So your work here is to make that value visible, and to make incremental changes that won’t cost much. Managers love incremental improvements that they get “for free” (or nearly so).

Make it visible. Start by creating automatic graphs of the unit-test coverage for each project. Make it easy to get to (not buried deep inside your build tools). Managers love historical data!
Tie it to velocity. Document the anecdotal (and probably painful) stories where new code broke old code. Point out that greater test coverage means this happens less often, which in turn means higher velocity.
Make tech debt real. Find a way to calculate your technical ‘debt’, and turn it into a number. We built a “tech debt calculator” that simply counted up each instance in the code of several dozen simple anti-patterns, assigned each one a weight, and computed the weighted sum every day and added it to a graph. (For example, an unused import is worth 1 point, a method longer than 50 lines is worth 15, and a class longer than 1000 lines is worth 50.)

Yes, in some ways it’s absurd, but it makes it real. Ironically, our CFO discovered this graph, and started adding it into his company debt records! At first I tried to explain that it wasn’t “real” (dollars) debt… but actually, it is! Heck, estimate the developer cost to fix each anti-pattern, and make it produce dollar amounts!

Now this kind of tech debt isn’t about unit-testing… but a lack of tests is definitely one form of tech debt. The two horses pull the same coach.
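The weighted-sum idea is simple enough to show in a few lines. This is a minimal sketch, not the actual calculator: the class name and the pattern keys are made up for illustration, and only the three weights come from the examples above. It assumes some other step has already counted how often each anti-pattern occurs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a weighted-sum "tech debt" score. All names here are hypothetical. */
final class TechDebtCalculator {

    /** Weight per anti-pattern; the three values are the examples quoted in the text. */
    static final Map<String, Integer> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("unused-import", 1);
        WEIGHTS.put("method-over-50-lines", 15);
        WEIGHTS.put("class-over-1000-lines", 50);
    }

    /** Sums count * weight over every anti-pattern; patterns without a weight are rejected. */
    static long score(Map<String, Integer> counts) {
        long total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            Integer weight = WEIGHTS.get(e.getKey());
            if (weight == null) {
                throw new IllegalArgumentException("No weight defined for " + e.getKey());
            }
            total += (long) weight * e.getValue();
        }
        return total;
    }
}
```

Run something like this once a day, append the result to a file, and you have the data for the graph.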
Get permission. Yes, forgiveness is easier… but it doesn’t persist, it doesn’t change the culture. Talk to your direct manager about blocking out stories specifically to improve the existing unit-test coverage or quality. Or get agreement on (say) 2 hours a week just to do that.
Offer training. Managers love in-house training. It’s so much cheaper than the alternative! Offer lunch-time talks, after-work ‘kata’ exercises, short pairing sessions with other developers who would like to improve.
Own it. Find, or be, the champion. The person who can talk knowledgeably about the benefits, and the how-to, of better unit-tests. Be polite, be reasonable, be the person in search of common ground. That’s how you move the goal-posts forward. Not by thundering like an Old Testament prophet.
Find a Patron. Companies are still pretty feudal in structure. So find a ‘management champion’, someone who understands where you want to go, and who will help ‘grease the skids’. Think strategically with them. Always offer multiple choices: e.g. for training (“Offer training”, above): external consultant training, vs in-house training, vs no training.

In many companies, management knows that unit-tests have value (to them!), but they may have trouble articulating why and how. Here are a few things to remind them of:

Minimizes bugs. This is the most obvious: all other things being equal, untested code is more likely to have bugs.
Increased frequency of releases. The more (and faster) automated test series (of all kinds) you have, the sooner you can do the next release.
Greater modularity. “OK, but what does that buy me?” It means code is easier and faster to change. Or call it “lower coupling”.
Increased velocity. Code that has a “safety net” is easier, faster, and less scary to change.
Specs/documentation. It’s been famously said that “the tests are the documentation”. If so, they’re a clumsy form of documentation… but better than none at all! Again, “what does that buy me?” Less developer time finding out what the damned thing is supposed to do in the first place!
Increased developer skills. Writing good tests makes a developer think outside of the box, reduces “tunnel vision”.
Developer longevity (or lack thereof). When developers move on to another company, it’s a lot easier and faster for new developers to pick up, understand, and work with code that has good test coverage — because of the tests themselves, and because code written with tests in mind tends to be more compact and easier to work with.

III. Social Change

Developers by their very nature tend to be independent cusses, who think their way of doing something is the best. (Who, me?) It is critically important to be seen as a solution (a “carrot”) rather than a problem or encumbrance (a “stick”).

At the same time, with the help of a supportive manager, some “sticks” are useful. A report from an automated tool (see “The Martinizer”, below) that points out new or changed code that is not covered by tests, is going to be much easier to accept than criticism from other developers that “you’re not writing enough tests”. Ditto for graphs and other ways of making test coverage and technical debt visible.

Most important of all, however, is to come from a place of personal power. I don’t mean managerial power, or fighting to win every argument, or bullying. Other developers are not nails, and you are not a hammer. Rather, if you are already seen as a knowledgeable person who helps other people, has interesting (even if sometimes controversial) ideas, and finds or creates tools that are clearly helpful to other developers… then you have power. Or “political capital”. Or “karma”. Whatever you call it.

With that in mind, here are tactics I have used with some success:

Be a ‘thought leader’: run a “Brown bag”, a group coding exercise (for fun!), a kata, a randori, or even teach an in-house class.
Volunteer to review others’ unit-tests. (Especially if your company does not do pairing.)
Managerial rewards. I keep track of who added what % coverage to our largest project, in a simple automated spreadsheet. Every quarter I nominate the top 2 developers for an existing in-company “rewards” program ($50 gift card). (I have to exclude myself, of course.) Management support has been awesome.
Managerial oversight. Discuss test coverage as part of any overall “performance analysis” rating by managers. Just the thought that managers might consider it will raise the awareness of developers. Mostly useful as a carrot, only in very rare circumstances as a stick.
Honest conversations about priorities, individually or in Agile-style retrospectives. To what extent do developers feel that they have to prioritize “getting work done” versus writing tests? What kinds of pressures do they feel? Management frequently sends mixed messages about this, so don’t be surprised if people feel that and react. Sometimes you can even see it in their body language.

Ask those questions, and then listen. Don’t judge. Encourage developers to have exactly that conversation with their managers — it’s the managers’ job to resolve those kinds of conflicts, internal or otherwise. Or, if you have a highly-empowered team, it’s the team’s job to build in the cost of unit-tests to their estimates.
Use humor. See, for example, the Junit / Green Lantern oath, or The Way of Testivus (“Less unit-testing dogma, more unit-testing karma”.) Subversive ideas often slip in easier through humor than through argument.
New and emerging developers. Sometimes the easiest people to reach are those who know they’re still learning. Summer interns. New hires. QA staff who are getting their feet wet in coding. Turning these people loose on writing new unit-tests, or fixing up old ones, is a win-win. They learn about testing, they learn about the existing code-base… but they also learn about what makes good code. And the feedback cycle, both thru the testing framework and a code reviewer, can be very, very short.

For example, we have several QA folk who already write automated UI tests; it was just a short step (for those with ambition and curiosity) to start writing unit-tests for production code… and then from there to simple refactorings of the code, once the tests were 100%.
Pair when you can. Many companies (and developers) are averse to pairing. (Heck, I don’t like it myself as a full-time exercise. I hate broccoli, too.) But whenever someone asks me a question about unit-testing, I try to use that as an excuse to pair with them for, say, half an hour. That’s a great opportunity for lots of “hey, I didn’t know you could do that” conversations.

IV. Technical Change

Wearing my architect and developer hat, this is the fun part. Over the years I’ve experimented with using and building a bunch of tools. This section sums up the experiments and the results: YMMV. (Most of my work has been with Java projects, but the principles apply to most languages.)

“Vulture”. Using git commits, find the classes that are changed most often, then sort by lowest unit-test coverage. (Image: a vulture circling overhead, looking for the dead-or-dying-bodies.) If untested change is the cause of bugs, this is logically the best place to look.

Result: so-so. It should make sense, but it got us nowhere. It tended to find classes with lots of innocuous little changes, or else stinking piles of old spaghetti that no-one wanted to even try to write tests for.
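For the curious, the ranking step of “Vulture” can be sketched as below. This is an illustration under stated assumptions, not the original tool: the class and method names are hypothetical, the list of changed files is assumed to come from something like `git log --name-only --pretty=format:` (one entry per file per commit), and the coverage figures are assumed to be a fraction between 0.0 and 1.0 taken from whatever coverage report you have. The `minChanges` cutoff is my own stand-in for “changed most often”.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: rank frequently changed files by lowest coverage. Names are hypothetical. */
final class VultureRanking {

    /** One candidate: a file, how often it changed, and its coverage (0.0 to 1.0). */
    static final class Candidate {
        final String file;
        final int changes;
        final double coverage;

        Candidate(String file, int changes, double coverage) {
            this.file = file;
            this.changes = changes;
            this.coverage = coverage;
        }
    }

    /**
     * changedFiles holds one entry per (commit, file) pair. Files changed at least
     * minChanges times are kept, then sorted by coverage, lowest first.
     * Files with no coverage figure are treated as 0.0 (an assumption).
     */
    static List<Candidate> rank(List<String> changedFiles,
                                Map<String, Double> coverageByFile,
                                int minChanges) {
        Map<String, Integer> changeCounts = new HashMap<>();
        for (String file : changedFiles) {
            changeCounts.merge(file, 1, Integer::sum);
        }
        List<Candidate> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : changeCounts.entrySet()) {
            if (e.getValue() >= minChanges) {
                double coverage = coverageByFile.getOrDefault(e.getKey(), 0.0);
                candidates.add(new Candidate(e.getKey(), e.getValue(), coverage));
            }
        }
        // Lowest coverage first; ties go to the most-changed file, then to file name.
        candidates.sort(Comparator
                .comparingDouble((Candidate c) -> c.coverage)
                .thenComparing(Comparator.comparingInt((Candidate c) -> c.changes).reversed())
                .thenComparing(c -> c.file));
        return candidates;
    }
}
```

As the result above suggests, the hard part is not the ranking but deciding what to do with the list once you have it.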
FindBadIgnores. A simple shell script: it scans for @Ignore’s (ignored tests) that have no comment explaining why a test was ignored. I simply informed everyone that we really should document why any test was ignored, and everyone (grudgingly) agreed… mostly because it’s a tiny, cheap, incremental fix.

Result: success. Plus it helps remind everyone that we really shouldn’t have ignored tests stay around for very long.
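The original is a shell script; the same check can be sketched in Java as below. Treat it as an approximation under my own assumptions about what counts as “documented”: an `@Ignore("reason")` string, a `//` comment on the same line, or a `//` comment on the line directly above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: report @Ignore annotations that carry no explanation. Names are hypothetical. */
final class FindBadIgnores {

    // Matches @Ignore as a whole annotation name (so not @IgnoreSomethingElse).
    private static final Pattern IGNORE = Pattern.compile("@Ignore\\b");

    // @Ignore("some reason") carries its own explanation.
    private static final Pattern IGNORE_WITH_REASON =
            Pattern.compile("@Ignore\\s*\\(\\s*\"[^\"]+\"");

    /** Returns the 1-based line numbers of @Ignore annotations with no explanation. */
    static List<Integer> undocumentedIgnores(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!IGNORE.matcher(line).find()) {
                continue;
            }
            boolean hasReason = IGNORE_WITH_REASON.matcher(line).find();
            boolean hasTrailingComment = line.indexOf("//", line.indexOf("@Ignore")) >= 0;
            boolean hasCommentAbove = i > 0 && lines.get(i - 1).trim().startsWith("//");
            if (!hasReason && !hasTrailingComment && !hasCommentAbove) {
                bad.add(i + 1);
            }
        }
        return bad;
    }
}
```

Feed it the lines of each test source file and print the file name plus the returned line numbers.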
FindImbalances. If you’re using a mocking framework (e.g. EasyMock), it’s important to have a verify() for every replay(). Developers often forget the importance of verify(). FindImbalances is a shell script that scans for and ensures that these calls are balanced, i.e. identical in number.
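Again, the original is a shell script. A rough Java equivalent of the counting step might look like this; the names are hypothetical, and the sketch assumes plain `replay(...)` and `verify(...)` calls (with or without an `EasyMock.` prefix). It is a textual count only, so it knows nothing about which mock each call refers to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: compare the number of replay() and verify() calls in a test source. */
final class FindImbalances {

    private static final Pattern REPLAY = Pattern.compile("\\breplay\\s*\\(");
    private static final Pattern VERIFY = Pattern.compile("\\bverify\\s*\\(");

    /** Counts non-overlapping matches of the pattern in the source text. */
    static int count(Pattern pattern, String source) {
        Matcher m = pattern.matcher(source);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** replay() calls minus verify() calls; zero means the file is balanced. */
    static int imbalance(String source) {
        return count(REPLAY, source) - count(VERIFY, source);
    }
}
```

Any test file where `imbalance` is not zero goes on the report.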
JavaBeanTester. Who wants to write a test to exercise a simple bean? Should we even bother? Well, yes — because code keeps changing! I stole and extended some code that uses reflection to examine all the fields in a bean, and automatically tests them. In a single line of code. The benefit may be (arguably) low, but the cost to write this test is even lower! (Look for it on github.com/wcroth55.)

Result: small success.
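To show the idea (and only the idea: this is not the JavaBeanTester code itself), here is a small sketch that uses the standard `java.beans.Introspector` to find every read/write property, set a sample value, and read it back. The class names and the handful of supported types are my own choices for the example.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

/** Sketch of a reflection-driven getter/setter round-trip check. Names are hypothetical. */
final class BeanRoundTrip {

    /** Tiny bean used only to demonstrate the checker. */
    public static class SampleBean {
        private String name;
        private int age;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    /** Sets and reads back every read/write property; returns how many were checked. */
    static int check(Class<?> beanClass) {
        try {
            Object bean = beanClass.getDeclaredConstructor().newInstance();
            int checked = 0;
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                Method setter = pd.getWriteMethod();
                Object sample = sampleFor(pd.getPropertyType());
                if (getter == null || setter == null || sample == null) {
                    continue; // read-only, write-only, or a type this sketch cannot fill in
                }
                setter.invoke(bean, sample);
                if (!sample.equals(getter.invoke(bean))) {
                    throw new AssertionError("Round trip failed for property " + pd.getName());
                }
                checked++;
            }
            return checked;
        } catch (ReflectiveOperationException | IntrospectionException e) {
            throw new IllegalStateException("Could not inspect " + beanClass.getName(), e);
        }
    }

    /** Sample values for a few common property types; null means "skip this property". */
    private static Object sampleFor(Class<?> type) {
        if (type == String.class) return "sample";
        if (type == int.class || type == Integer.class) return 42;
        if (type == long.class || type == Long.class) return 42L;
        if (type == boolean.class || type == Boolean.class) return Boolean.TRUE;
        return null;
    }
}
```

In a real test the whole thing collapses to one line, along the lines of `BeanRoundTrip.check(SampleBean.class)`.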
Ditch ‘private’. This one is controversial — but shouldn’t be. There’s a well-intentioned but totally misguided belief that code should only be tested through the ‘public’ interface, i.e. that private methods should not be tested directly. Rubbish! ‘Private’ is an anachronism, from the early days of Java, before people were even thinking about testing. C++ at least had the “friend” modifier, so (for example) test classes could be a “friend” of the class they were testing. Insisting we can only test through the public interface is like saying we can only test a jet aircraft engine by flying the entire plane!

Code should be tested, period. If you can test thru the public interface, that’s great. Design things that way if possible. But tested code always trumps lofty principles.

The real point here is that I want to make it as easy as possible for developers to write tests. So I tell them to remove ‘private’… which in Java makes it “package protected”, the next most restrictive permission. Even better, I tell them to replace ‘private’ with /*TestScope*/, which makes it clear that “this really ought to be private, but I need it to be testable”.

Result: very successful (with occasional arguments!)
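A small illustration of the /*TestScope*/ convention follows. The class is invented for this example and assumes non-negative amounts; the point is only that the two helpers, which would normally be ‘private’, are package-visible, so a test class in the same package can call them directly.

```java
/** Hypothetical production class; the two helpers would normally be private. */
final class PriceFormatter {

    /** Public entry point: formats a non-negative amount in cents, e.g. 1234 as "$12.34". */
    public String format(long cents) {
        return "$" + dollars(cents) + "." + twoDigits(cents % 100);
    }

    /*TestScope*/ static long dollars(long cents) {
        return cents / 100;
    }

    /*TestScope*/ static String twoDigits(long n) {
        return (n < 10 ? "0" : "") + n;
    }
}
```

A test in the same package can now check `PriceFormatter.twoDigits(5)` on its own, without having to construct an amount that happens to exercise the zero-padding branch through `format`.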
Libraries. Unit-tests are supposed to be first-class code: meaning, they should get ‘cleaned’ and de-duplicated just like ‘production’ code. But it’s astonishing how much bad (and duplicated) unit-test code gets written. So I built a common library for all of our projects, and every time I needed something in a test, I added a new method to the library.

Simple example: PqCollections.list(a, b, c) makes an ArrayList of whatever a, b, and c are. Instead of 4 lines to create the list, and add 3 objects to it. The point, of course, is that shorter (and more readable!) tests are easier to write, easier to understand, and easier to maintain.

Result: middling success. This is as much a social change as it is a technical change.
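A helper in the spirit of the list(a, b, c) example takes only a few lines. This is a guess at the shape, not the actual PqCollections source; the class name below is deliberately different to make that clear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical test-helper class, sketched after the list(a, b, c) example in the text. */
final class TestCollections {

    /** Builds a mutable ArrayList containing the given items, in order. */
    @SafeVarargs
    static <T> List<T> list(T... items) {
        return new ArrayList<>(Arrays.asList(items));
    }
}
```

So `List<String> names = TestCollections.list("a", "b", "c");` replaces the four-line create-and-add dance, and the result can still be modified afterwards.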
Automatic unit-test writers. Several commercial products claim to automatically write unit-tests for you. I tried a couple out. I found them utterly useless. No, I don’t remember their names.
Nagamatic. Every time a commit raises the unit-test coverage, this fires out an email informing the committer of the change. Every time a commit lowers the coverage, a different email goes out.

Result: ho-hum. Seemed like a good idea, but it quickly became noise.
PIT Mutation testing. This is an extremely cool idea, but with questionable application. The idea is that you run a set of tests against a method, multiple times, and the ‘mutator’ changes the byte codes of the method in a bunch of standardized ways. At least one of your tests must fail! Otherwise your tests are not “good enough”.

Result: near zero utility in most applications. But if lives depended on my code (cf. Boeing 737), I’d demand it.
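If the idea is new to you, a hand-made example may help. PIT does its mutating automatically, on the byte code; the sketch below just writes one such mutant out by hand (a boundary change from >= to >) so you can see why a weak test lets it “survive” and a boundary test “kills” it. All names here are invented for the illustration.

```java
import java.util.function.IntPredicate;

/** Hand-written illustration of one mutant; PIT itself generates these automatically. */
final class MutationDemo {

    /** Original code under test. */
    static boolean isAdult(int age) {
        return age >= 18;
    }

    /** What a boundary mutant of isAdult would behave like: >= has become >. */
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    /** A weak test: it never probes the boundary, so it passes for both versions. */
    static boolean weakTest(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    /** A boundary test: it passes for the original and fails for the mutant. */
    static boolean boundaryTest(IntPredicate impl) {
        return impl.test(18) && !impl.test(17);
    }
}
```

With only `weakTest` in the suite, the mutant survives and the tool flags the suite as not “good enough”. Add `boundaryTest` and the mutant is killed.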
Micro-services. Not exactly a tool, more like an architectural sweet spot. Breaking off pieces of a large, monolithic, project into micro-services is the perfect time to be more rigorous about unit-testing. The new services are more like “green fields” projects, and are usually simply easier to write tests for. Watch for this opportunity, and use it ruthlessly!
The Martinizer. This is my favorite. Named in honor of “Uncle Bob” Martin, it’s just a little bit of glue that connects a couple of other tools: in this case, git and Jacoco. It compares two commits, finds the code that was changed or added, and asks if that code (just that code) is fully covered by unit-tests. If the answer is no, it kicks out a report about just those lines (to me, to the committer, to whoever is relevant).

In the scenarios that I wrote this talk for, this has been huge. Basically it says “I don’t care how bad things are now, backsliding is not allowed!”

Result: considerable success. The graph below shows test coverage for a 10-year-old (~1M LOC) project. After slacking off for a year, I re-instituted enforcing the results of The Martinizer in early Nov 2018. The slope increased substantially. (The uptick in late Nov is a data-recording error; it was actually flat in early Nov and then started accelerating upwards.)

While the tool itself is directly useful (shows you where coverage is needed), I think it also has an important social effect: it sends the message that someone is watching. That in itself could be either positive or negative (think “Big Brother”), but when accompanied with a helpful attitude (“Is it feasible to add tests for this? Can I assist?”), it can raise awareness, and start useful conversations.
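The core of The Martinizer is a set intersection: lines that changed, and lines that no test touched. The sketch below shows just that step and is not the tool itself. It assumes two inputs have already been extracted by other means: the changed or added line numbers per file (for example, parsed from a git diff between the two commits) and the uncovered line numbers per file (for example, read from a Jacoco report). Class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch: which changed lines are not covered by tests? Names are hypothetical. */
final class ChangedLineCoverage {

    /**
     * For each changed file, keeps only the changed lines that are also uncovered.
     * Files with no coverage data (an assumption: non-code or excluded files) are skipped,
     * and files whose changed lines are all covered do not appear in the report.
     */
    static Map<String, SortedSet<Integer>> uncoveredChanges(
            Map<String, Set<Integer>> changedLines,
            Map<String, Set<Integer>> uncoveredLines) {
        Map<String, SortedSet<Integer>> report = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : changedLines.entrySet()) {
            Set<Integer> uncovered = uncoveredLines.get(e.getKey());
            if (uncovered == null) {
                continue;
            }
            SortedSet<Integer> hits = new TreeSet<>(e.getValue());
            hits.retainAll(uncovered);
            if (!hits.isEmpty()) {
                report.put(e.getKey(), hits);
            }
        }
        return report;
    }
}
```

An empty report means the commit did not backslide. Anything else is the list of lines to send to the committer.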

2 thoughts on “Unit-Testing: Carrots and Sticks”

Pingback: Agile & Beyond 2019
Charles Roth says:

June 3, 2019 at 2:48 pm

There are also some (hilarious, to my mind) pictures of me presenting this talk, at http://zen2.caucuscare.com/index.php?album=9-McRae-Roth-Family/2019-05
