So I saw this surprisingly deep question on Stack Exchange, which I’ve decided to share months later because it resonates with some things I’ve been thinking about recently.
How do you keep your unit tests working when refactoring?
Now the question had a bit more detail, but the important point is this:
“I’ve experienced the problems of tests written in such a way that minor refactoring leads to lots of test failures.”
What’s your instinct here? Gut reaction?
The question was quite old, and the top answers were very critical of the original poster (OP).
“What you’re trying to do is not really refactoring.”
“If your tests break when you’re refactoring, then you’re not, by definition, refactoring, which is ‘changing the structure of your program without changing the behaviour of your program’.”
Life pro tip: when someone asks a question that makes you think the question itself is wrong, check your assumptions first.
Refactoring production code can break unit tests.
And we’re not talking about something like testing a method called “Foo”, renaming the method to “Bar”, and then seeing that references in the test code to “Foo” are now broken.
I’m going to use the definition of refactoring from the second quoted answer: refactoring is
“changing the structure of your program without changing the behaviour of your program”
So: are the tests part of the program? Either answer is a perfectly rational position to take.
If they are, then by definition refactoring can’t break the tests.
So consider that they might not be.
Tests can do naughty things that consumers of your code can’t do:
- access private state inside your classes
- inject dependencies that do not meet the behavioural contracts that the system under test requires
- break encapsulation
Here’s my own answer:
It is important to note that some ways of testing, in particular white-box tests, can become fragile when the system under test (SUT) is refactored.
If I’m using a mocking framework that verifies the order of the methods called on the mocks, even when that order is irrelevant because the calls are side-effect free, then refactoring the code so that those calls happen in a different (cleaner) order will break my test. In general, mocks can introduce fragility into tests.
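To make this concrete, here is a minimal sketch using Python’s `unittest.mock` (the original context is C#, and all class and method names here are invented for illustration):

```python
from unittest import mock

# Hypothetical SUT: records two independent, side-effect-free audit
# entries. Neither call depends on the other.
class ReportBuilder:
    def __init__(self, audit):
        self.audit = audit

    def build(self):
        self.audit.record("started")
        self.audit.record("validated")
        return "report"

# A fragile test: it pins down the exact call order, even though the
# order does not matter to any observable behaviour.
audit = mock.Mock()
ReportBuilder(audit).build()
assert audit.mock_calls == [
    mock.call.record("started"),
    mock.call.record("validated"),
]
# Refactoring build() to record "validated" first would fail this
# assertion, despite the SUT behaving identically from the outside.
```

The assertion on `mock_calls` is exactly the kind of order-sensitive verification that couples the test to one particular implementation.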
If I am checking the internal state of my SUT by exposing its private or protected members (we could use “Friend” in Visual Basic, or escalate the access level to “internal” and use “InternalsVisibleTo” in C#; in many OO languages, including C#, a test-specific subclass could be used), then suddenly the internal state of the class matters: I may be refactoring the class as a black box, but white-box tests will fail. Suppose a single field is reused to mean different things (not good practice!) as the SUT changes state; if we split it into two fields, we may need to rewrite the broken tests.
Test-specific subclasses can also be used to test protected methods, which means a refactor from the point of view of the production code may be a breaking change from the point of view of the test code. Moving a few lines into or out of a protected method may have no production side effects, yet still break a test.
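Here is a sketch of both problems at once, in Python for illustration (the field reuse and all names are invented; Python has conventions rather than enforced access levels, but the coupling is the same):

```python
# Hypothetical SUT that reuses one internal field across two phases
# (deliberately poor practice, as noted above).
class Parser:
    def __init__(self):
        self._scratch = None  # holds raw input, then the parsed result

    def parse(self, text):
        self._scratch = text
        self._scratch = self._scratch.strip().split(",")
        return self._scratch

# A test-specific subclass that exposes the internal field.
class TestableParser(Parser):
    def peek_scratch(self):
        return self._scratch

p = TestableParser()
result = p.parse(" a,b ")
# White-box assertion on internal state: this breaks the moment
# _scratch is split into two fields (say, _raw and _result), even
# though parse() still returns exactly the same value.
assert p.peek_scratch() == ["a", "b"]
```

The black-box assertion `result == ["a", "b"]` would survive that refactor; the white-box one would not.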
If I use “test hooks” or any other test-specific or conditionally compiled code, it can be hard to ensure that tests don’t break, because of fragile dependencies on internal logic.
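As a small sketch of a test hook (the flag name and class are invented; in C# this might be `#if DEBUG` or a build-time symbol instead):

```python
import os

# Hypothetical production class with a test-only hook baked in.
class Cache:
    def __init__(self):
        # Under the hook, caching is bypassed entirely, so tests that
        # rely on this flag never exercise the real caching path, and
        # they silently depend on this internal branch existing.
        self._testing = os.environ.get("APP_TEST_MODE") == "1"
        self._store = {}

    def get(self, key, loader):
        if key not in self._store or self._testing:
            self._store[key] = loader(key)
        return self._store[key]
```

Refactoring `get` in a way that is invisible to production callers can still change what the hooked branch does, breaking tests that depend on it.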
So to prevent tests from becoming coupled to the intimate internal details of the SUT it may help to:
- Use stubs rather than mocks, where possible. For more info see Fabio Pereira’s blog on tautological tests, and my blog on tautological tests.
- If using mocks, avoid verifying the order of methods called, unless it is important.
- Try to avoid verifying internal state of your SUT – use its external API if possible.
- Try to avoid test-specific logic in production code.
- Try to avoid using test-specific subclasses.
All of the points above are examples of white-box coupling in tests. So, to completely prevent refactoring from breaking tests, use black-box testing of the SUT.
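As a sketch of that black-box style, here is a minimal stub-based test in Python (all names are invented for illustration):

```python
# A hand-written stub: deterministic, no call verification.
class StubClock:
    def now(self):
        return 1000  # fixed value, so the test is repeatable

# Hypothetical SUT, driven only through its public API.
class SessionTimer:
    def __init__(self, clock):
        self._clock = clock
        self._start = None

    def start(self):
        self._start = self._clock.now()

    def elapsed(self):
        return self._clock.now() - self._start

timer = SessionTimer(StubClock())
timer.start()
# Only the observable result is asserted; any internal refactor that
# preserves elapsed()'s behaviour keeps this test green.
assert timer.elapsed() == 0
```

The stub satisfies the behavioural contract the SUT needs and nothing more, so the test knows nothing about how `SessionTimer` uses it internally.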
Disclaimer: For the purpose of discussing refactoring here, I am using the word a little more broadly to include changing internal implementation without any visible external effects. Some purists may disagree and refer exclusively to Martin Fowler and Kent Beck’s book Refactoring – which describes atomic refactoring operations.
In practice, we tend to take slightly larger non-breaking steps than the atomic operations described there, and in particular changes that leave the production code behaving identically from the outside may not leave tests passing. But I think it is fair to include “substitute algorithm for another algorithm that has identical behaviour” as a practical (rather than formal) refactor, and I think Fowler agrees. Martin Fowler himself says that refactoring may break tests:
When you write a mockist test, you are testing the outbound calls of the SUT to ensure it talks properly to its suppliers. A classic test only cares about the final state – not how that state was derived. Mockist tests are thus more coupled to the implementation of a method. Changing the nature of calls to collaborators usually causes a mockist test to break.
Coupling to the implementation also interferes with refactoring, since implementation changes are much more likely to break tests than with classic testing.
Fowler – Mocks Aren’t Stubs
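The classic/mockist distinction Fowler describes can be sketched side by side (Python again, with invented names):

```python
from unittest import mock

# Hypothetical SUT: transfers an amount between two accounts.
class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount

    def deposit(self, amount):
        self.balance += amount

def transfer(src, dst, amount):
    src.withdraw(amount)
    dst.deposit(amount)

# Classic test: only the final state matters.
a, b = Account(100), Account(0)
transfer(a, b, 30)
assert (a.balance, b.balance) == (70, 30)

# Mockist test: verifies the outbound calls themselves. Refactoring
# transfer() to mutate balances directly would break this test while
# leaving the classic one green.
src, dst = mock.Mock(), mock.Mock()
transfer(src, dst, 30)
src.withdraw.assert_called_once_with(30)
dst.deposit.assert_called_once_with(30)
```

The mockist test fails under a behaviour-preserving refactor precisely because it asserts *how* the result was derived, not *what* the result is.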
Want to read it in context, or vote on my answer? Here it is.