Help, refactoring broke my tests

So I saw this surprisingly deep question on Stack Exchange, which I’ve decided to share months later because it resonates with some things I’ve been thinking about recently.

How do you keep your unit tests working when refactoring?

Now the question had a bit more detail, but the important point is this:

“I’ve experienced the problems of tests written in such a way that minor refactoring leads to lots of test failures.”

What’s your instinct here? Gut reaction?

The question was quite old – and top answers were very critical of the original poster (OP).

“What you’re trying to do is not really refactoring.”

“If your tests break when you’re refactoring, then you’re not, by definition, refactoring, which is “changing the structure of your program without changing the behaviour of your program”.”

Life pro tip: when someone asks a question that makes you think the question itself is wrong, check your assumptions first.

Refactoring production code can break unit tests.

And we’re not talking about something like testing a method called “Foo”, renaming the method to “Bar”, and then seeing that references in the test code to “Foo” are now broken.

I’m going to use the definition of refactoring from the second quoted answer: refactoring is

“changing the structure of your program without changing the behaviour of your program”

So are the tests part of the program? Either option is a perfectly rational position to take.

If they are, then by definition refactoring can’t break the tests.
So consider that they might not be.

Tests can do naughty things that consumers of your code can’t do.

They can

  • access private state inside your classes
  • inject dependencies that do not meet the behavioural contracts that the system under test requires
  • break encapsulation.

Here’s my own answer

It is important to note that some ways of testing become fragile when the system under test (SUT) is refactored – specifically, when the tests are white-box.

If I’m using a mocking framework that verifies the order of the methods called on the mocks (when the order is irrelevant, because the calls are side-effect free), and my code is cleaner with those method calls in a different order, then refactoring will break my test. In general, mocks can introduce fragility to tests.
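As a sketch of the problem using Moq’s MockSequence (the `IAuditLog` and `Registration` types here are invented for illustration):

```csharp
public interface IAuditLog
{
    void RecordName(string name);
    void RecordEmail(string email);
}

public class Registration
{
    public void Audit(IAuditLog log)
    {
        // Neither call depends on the other; their order is an implementation detail.
        log.RecordName("Alice");
        log.RecordEmail("alice@example.com");
    }
}

[Test]
public void Audit_records_name_then_email()
{
    var sequence = new MockSequence();
    var mockLog = new Mock<IAuditLog>(MockBehavior.Strict);

    // The test pins down a call order that the production code never promised.
    mockLog.InSequence(sequence).Setup(l => l.RecordName("Alice"));
    mockLog.InSequence(sequence).Setup(l => l.RecordEmail("alice@example.com"));

    new Registration().Audit(mockLog.Object);
}
```

Swap the two lines in Audit – a pure refactor, since neither caller nor collaborator can tell the difference – and this test fails.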

If I am checking the internal state of my SUT by exposing its private or protected members (we could use Friend in Visual Basic, or escalate the access level to internal and use InternalsVisibleTo in C#; in many OO languages, including C#, a test-specific subclass could be used), then suddenly the internal state of the class matters – you may be refactoring the class as a black box, but white-box tests will fail. Suppose a single field is reused to mean different things (not good practice!) as the SUT changes state – if we split it into two fields, we may need to rewrite the broken tests.

Test-specific subclasses can also be used to test protected methods – which means that a refactor from the point of view of the production code may be a breaking change from the point of view of the test code. Moving a few lines into or out of a protected method may have no effect in production, but break a test.
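A sketch of that coupling (all the types here are invented): the test-specific subclass ties the test to a protected method that production callers never see.

```csharp
public class Order
{
    public decimal Total { get; set; }
}

public class OrderProcessor
{
    public void Process(Order order)
    {
        ApplyDiscount(order);
        // ... persist, notify, etc. ...
    }

    // An implementation detail, invisible to production callers.
    protected void ApplyDiscount(Order order)
    {
        if (order.Total > 100m)
            order.Total -= 10m;
    }
}

// Test-specific subclass: escalates the protected method to public.
public class TestableOrderProcessor : OrderProcessor
{
    public void CallApplyDiscount(Order order) => ApplyDiscount(order);
}

[Test]
public void Large_orders_get_a_discount()
{
    var order = new Order { Total = 150m };
    new TestableOrderProcessor().CallApplyDiscount(order);
    Assert.AreEqual(140m, order.Total);
}
```

Inline ApplyDiscount into Process – no production caller can tell the difference – and this test stops compiling.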

If I use “test hooks” or any other test-specific or conditional compilation code, it can be hard to ensure that tests don’t break because of fragile dependencies on internal logic.

So to prevent tests from becoming coupled to the intimate internal details of the SUT it may help to:

  • Use stubs rather than mocks, where possible. For more info see Fabio Pereira’s blog on tautological tests, and my blog on tautological tests.
  • If using mocks, avoid verifying the order of methods called, unless it is important.
  • Try to avoid verifying internal state of your SUT – use its external API if possible.
  • Try to avoid test-specific logic in production code
  • Try to avoid using test-specific subclasses.

All of the points above are examples of white-box coupling used in tests. So to completely avoid refactoring breaking tests, use black-box testing of the SUT.

Disclaimer: For the purpose of discussing refactoring here, I am using the word a little more broadly to include changing internal implementation without any visible external effects. Some purists may disagree and refer exclusively to Martin Fowler and Kent Beck’s book Refactoring – which describes atomic refactoring operations.

In practice, we tend to take slightly larger non-breaking steps than the atomic operations described there, and in particular, changes that leave the production code behaving identically from the outside may not leave tests passing. But I think it is fair to include “substituting one algorithm for another with identical behaviour” as a practical (rather than formal) refactor, and I think Fowler agrees. Indeed, Fowler himself says that refactoring may break tests:

When you write a mockist test, you are testing the outbound calls of the SUT to ensure it talks properly to its suppliers. A classic test only cares about the final state – not how that state was derived. Mockist tests are thus more coupled to the implementation of a method. Changing the nature of calls to collaborators usually cause a mockist test to break.


Coupling to the implementation also interferes with refactoring, since implementation changes are much more likely to break tests than with classic testing.

Fowler – Mocks aren’t stubs

Want to read it in context, or vote on my answer? Here it is.

What do you do when your client is wrong?

I blogged last week about how Professionalism means telling the customer when they are wrong. Gonzales commented with a really important question – “What if the client is not just wrong, but stubborn too?”

This has happened to me and to you. You feel that there’s got to be a better end result than compromising your professionalism and delivering software that you know doesn’t help your client as much as it could. You want to take pride in your work.

There is a way to stick to your principles and still keep a positive relationship with your client. It all comes down to how you present your case and how you listen to theirs. To make it easy to remember, there are three steps you can follow, based on a technique called “Principled Negotiation”, which really work – and when you pull this trick out, people will know it was your professionalism and skilled negotiation that got everyone to a better result.

Suppose your project has already started and there is already financial commitment from both sides. The details are being worked out in an Agile fashion as the project progresses and so part way through your client contact, Jessica, sends you a friendly email that says

“This iteration, could you store passwords in the database as plain text so we can email them out to our users later. Looking forward to seeing it ASAP.”

It’s easy to feel like taking an absolute “no” stance would jeopardise the project – even though you are doing it to prevent your client from causing harm to herself and her users. You can’t risk those passwords going public.

To you, it’s a very clear cut case that you are right. You may even be liable if you agree to do as she asks and later somebody sues.

Positional Bargaining with your clients is a bad plan. You could waste a lot of time and create bad feeling by arguing “we know technology and you’re wrong – it should be like this,” while Jessica says “we know our users and you’re wrong – it should be like this.”

You need a different method. You need Principled Negotiation.

There’s a whole book on the idea, but stick with me, I’ll break it down for you.

1. Get both parties’ interests on the table

What do you want? Most likely your goals are not contradictory. In this example, you want users’ security protected in the case of email interception and in the case of a database breach. Jessica wants the process to be easy – or maybe she is thinking ahead to another feature she hasn’t mentioned; for example, plain-text passwords have sometimes been stored in order to allow users to authenticate with a customer services operator by phone.

Once your interests are clearer, you may be able to invent other solutions that satisfy both of you – e.g. there are documented solutions using multiple different types of secrets to enable telephone authentication without risking the users’ primary passwords.

You need to separate the people from the problem. Jessica isn’t stupid. She’s an expert on her business goals. But she might not understand or value your goals. Actively listen to her – this will help encourage her to listen to you too.

If your first response to an issue during development is to consult your contract and figure out the minimum work you are contractually obliged to do, then you may permanently damage your relationship with her. That should be a last resort – one that you and she both know was needed because negotiation broke down.

So find the common ground first.

2. Insist on objective criteria for the negotiation

Some of your goals might still actually be in direct opposition. For example, when I interview for a new job, I want the highest possible salary; my employer wants to give me the lowest possible salary. If you can’t solve the issue just by talking through what else each party can bring to the table, then the next strategy is to appeal to an independent source.

This doesn’t necessarily mean you need to bring in a consultant. In the case of you hiring me as a software developer, we could agree that we will start from using market data on average salaries for developers with similar experience. And then we just need to work out the value of your company’s unique benefits and my specialist knowledge, and adjust accordingly.

In the case of the plain-text password feature, we could all agree (preferably at the outset of the project) that if we reach a disagreement like this, we will refer to an unbiased external authority. In this particular case, we could agree that the external authority is Troy Hunt’s blog.

You could also introduce your own custom external authority before the start of the project in the form of a code of ethics.

Your ethical policy might include something like “We believe in protecting users’ private details.”

I’m not a lawyer. But I believe that because there are no promises here, this principle does not introduce additional liability for the company in the case of an accidental breach. You might want to talk to your legal team.

Your ethics are unbiased; they apply equally to all projects, and there is nothing personal – and therefore confrontational – about you appealing to your company’s ethical policy during a dispute.

3. Have a fall back position

It is most often in the interest of both parties to reach agreement. But sometimes it really isn’t. By having thought through the fall back plan, you can make it easier to stick to your principles. If your client attempts to strong-arm you into compliance with threats, you can make your fall back position known.

Having a code of pre-disclosed ethics or a contract with get-out clauses helps with this. For example, working in the veterinary profession, my wife has her clients sign an agreement before their animals undergo surgery – if there is an unforeseen complication and Fluffy’s owner cannot be contacted, then they contractually agree in advance to a fall back plan – that the vet will use their medical expertise to take whatever action is in Fluffy’s own best interests. The principled appeal here is to the idea of what the animal would want, and this is something that other independent vets can agree on.

You may have several fall back plans

  • kill the project (extreme and costly, but always an option)
  • do not develop (and do not charge for) the authenticated features of the project
    – in some projects this may be viable.
  • develop exactly what was in the initial contract (assuming the ethical violation wasn’t something you already agreed to)

Given these plans, you can now more confidently say to Jessica that if you cannot reach an agreement then you may not do the work.

In light of your fall back plan, alternative implementations may be more appealing to her – such as just going with your recommendation, or perhaps agreeing to use a 3rd party provider e.g. OpenID.

It isn’t always that hard

So what about less clear cut cases – maybe you want to encourage a client to use a different UI than the one they propose, because you believe it will be more effective?

These negotiations should actually be easier, because less is at stake. Your fall back position may be “do what the client says”.

In terms of Principled Negotiation, the objective criteria you bring to the table might take the form of references to past projects or published articles on design. You and your client share the same goal – high user engagement. Perhaps you could invent a solution that works for both of you; for example, using analytics and A/B testing to determine which UI users prefer.
Clients may not even be aware of the usage data you could capture – this could be a real value-add for them, helping them know that the application is the best it could possibly be.

I strongly believe that digital agencies regularly sell their own skills short by blindly following their clients’ plans rather than bringing all their expertise to projects and acting as technology experts for hire.

If you’ll indulge me while I make a sweeping generalisation, ALL of the conflicts I’ve seen around features that the client or developers strongly disagree with have stemmed from a lack of communication. All agile methods advocate having the customer as part of the team because constant contact massively improves shared understanding of their business goals and of your technological expertise.

I hear many developers say “my clients are stupid.”

I’ve heard many clients say “my software developers are incompetent.”

Did you ever take the time to listen to each other?

For more about Principled Negotiation, you should buy Getting to Yes by Roger Fisher and William Ury. If you read this book you’ll never think about negotiation the same way again.

Who is the expert?

…in what software the customer wants?
…in how to deliver that to the customer?

Hint: The answer to one of these questions is not “the customer”.

I’ve been given feature requests that include such things as sending passwords by email.

There are 2 major types of response a developer might have to this:

Client: “I want the system to email me my password when I forget it.”

Yes-Man Dev: “Sure thing, boss.” (Compromised security? You asked for it!)

The Professional: “That’s not great for security – let’s talk alternatives.”

I would love to say that it’s been a long time since I’ve seen the yes-man path taken by a developer, but I guess this is a work in progress. I witnessed this conversation about a project that hadn’t even been started yet – and at least one person wanted to implement the inferior solution documented in the agreement rather than contact the client to get the agreement updated, which would have been in everyone’s best interest and wouldn’t have affected the cost.

This sort of thing is far more likely to happen when developers do not have easy and regular contact with stakeholders. If you’re in an organisation that has a long history of waterfall – or chaos – and you’re trying to become Agile, this is one major hurdle to overcome. Sometimes developers cover their asses with a feature list because they’ve been taught that covering asses is what the company does.

But I thought we were here to build great software!

There is a bigger question here. What is your job role?

Are you someone who codes exactly what they are told to code? Are the requirements given to you on a stone tablet from a mountain?

Well, great. If the client never needs technical insight to choose the right solution, if there is nothing you could do better than the author of your specifications, and if the requirements are documented completely, correctly, and never change, then good luck to you.

But that’s not the reality we tend to live in.

When the client is simply starting from the wrong place – “I want to build an app like Facebook, with slightly different features, oh and people should pay me to use it.” – do you advise them that the project is a bust? Or do you charge them for it?

What service would you be proud to say you give?

“I code what my clients tell me to code, after all they’re the experts.”


“I use my deep technical knowledge to help my clients figure out how to achieve their business goals. I can even help them spot opportunities and I help them avoid expensive mistakes.”

Trust me, at the kind of money you are charging your clients, they don’t want you to shoot them in the foot just because they told you to.

You are the expert. Now act like one.

The xUnit team removed my favourite feature… and made me a convert.

So I was helping one of the senior devs here with a code review – let’s call him Pete. Because his name is Pete, so that’ll make things easier.
Anyway Pete asked me for a second opinion on a couple of tests checking for the existence and non-existence of exceptions.

This seemingly ultra-specific topic actually revolved around a few super important general principles – like readability; semantics; and Arrange, Act, Assert.
It’s generally accepted that if you ignore these principles your test code quickly turns claggy and unmaintainable.

I was writing up our discussion when I discovered that the xUnit team had removed support for my-favorite-assertion-you-never-heard-of. I’m annoyed of course… but not because they removed it. I’ve been advocating NUnit at my work for some time; we’ve been converting MSTest projects over to it because it’s more expressive.
All of that hard work may need to be redone – because those clever bastards at xUnit had such a damn good reason for doing it they’ve only gone and made me a convert.

I’ll get around to explaining that, but it doesn’t make sense without comparing the tools each framework provides, so bear with me.

Do you know how to test for exceptions in an elegant and expressive way?
Let’s look at some code and see if your implementation comes up.

public class When_validating_an_email
{
    [Test]
    public void with_an_invalid_email_it_should_throw()
    {
        var sut = new EmailValidator();
        try
        {
            sut.ValidateEmail("");
            Assert.Fail("it should throw");
        }
        catch (EmailInvalidException e)
        {
        }
    }
}
The test is easy to understand and the naming while not my style is clear enough… but is everything about the test optimal or do we have choices here?

Of course we do.

Our SUT is a validator that is supposed to throw an exception when it is given an invalid email.

If the SUT throws the wrong type of exception, it will not be caught and will bubble to the test runner which will fail the test.
If the SUT fails to throw an exception, execution will continue to the “Assert.Fail” line and fail there.
And finally, if the SUT behaves as expected, then the “Assert.Fail” will be skipped, and we’ll end up swallowing the exception in the catch block.

Actually that’s pretty complicated when you think about it. That’s three very plausible execution paths, and our happy case involves skipping code (the fail statement).

Let’s jump straight to the next option, the most common way I’ve seen this solved.

    [Test, ExpectedException(typeof(EmailInvalidException))]
    public void with_an_invalid_email_it_should_throw()
    {
        var sut = new EmailValidator();
        sut.ValidateEmail("");
    }

Well we’ve saved a lot of lines of code.

This implementation is courtesy of MSTest and NUnit’s “ExpectedException” attribute.

Unfortunately, most real-world test implementations will be a little longer than this one, and the one thing this test doesn’t care about is _which line_ threw the exception… So, for example, if the constructor throws (which would indicate a very confusing bug), the test actually passes.
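As a sketch of that failure mode (this buggy version of EmailValidator is invented for illustration):

```csharp
// A hypothetical buggy EmailValidator whose constructor throws.
public class EmailValidator
{
    public EmailValidator()
    {
        // Bug: perhaps a configuration lookup fails during construction.
        throw new EmailInvalidException();
    }

    public void ValidateEmail(string email)
    {
        // Never reached, but the test below still passes.
    }
}

[Test, ExpectedException(typeof(EmailInvalidException))]
public void with_an_invalid_email_it_should_throw()
{
    var sut = new EmailValidator();   // throws here...
    sut.ValidateEmail("");            // ...so this line never runs, yet the test is green
}
```

The attribute only asserts that *some* line in the method threw the expected type – it cannot pin the expectation to a single call.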

A less important, but still valid, concern is that the assertion for the test has floated up and away from the test body to exactly where we don’t expect it to be. I like to understand tests within just a few seconds; this layout gives me eye strain. Actually the xUnit team are quite nicely opinionated on this point, and have explicitly chosen not to implement ExpectedException – (but this was never my favorite-assertion-you-never-heard-of – you’ll have to keep reading for that.)

We can do better.

    [Test]
    public void with_an_invalid_email_it_should_throw()
    {
        var sut = new EmailValidator();
        Assert.Throws<EmailInvalidException>(
            () => sut.ValidateEmail("") );
    }

We have localised the exception to the method we are testing, we have the exact type of exception constrained, and our assertion is sitting at the end of the test where we expect it to be.

I’m not the only one who prefers it. Programmers frustrated with MSTest’s lack of support for this feature have implemented it themselves.

This method is supported by NUnit and xUnit.

As a final note on “Assert.Throws”, take a look at the first test’s ugly brother.

    [Test]
    public void with_an_invalid_email_it_should_report_the_email()
    {
        string result = null;
        var sut = new EmailValidator();
        try
        {
            sut.ValidateEmail("");
        }
        catch (EmailInvalidException e)
        {
            result = e.Email;
        }
        Assert.AreEqual("", result);
    }

Yuck. {Code blocks} in a test method are a smell that something has gone wrong.

How about this.

    [Test]
    public void with_an_invalid_email_it_should_report_the_email()
    {
        var sut = new EmailValidator();
        var e = Assert.Throws<EmailInvalidException>(
            () => sut.ValidateEmail("") );
        Assert.AreEqual("", e.Email);
    }

This is how I currently code this scenario.
xUnit offers an alternative, but we’ll get back to that when we’ve looked at how we can test for no exception.

Every SUT that you test for throwing on invalid input must have a corresponding test for valid input, right?

Let’s see some ways to do that, starting with the one Pete showed me.

    [Test]
    public void with_a_valid_email_it_should_succeed()
    {
        var sut = new EmailValidator();
        try
        {
            sut.ValidateEmail("");
        }
        catch (EmailInvalidException e)
        {
            Assert.Fail("it should not throw");
        }
    }

This is quite interesting, because the author has done something strictly unnecessary in the noble pursuit of having an expressive test.

Compare with this:

    [Test]
    public void with_a_valid_email_it_should_succeed()
    {
        var sut = new EmailValidator();
        sut.ValidateEmail("");
    }

These tests pass and fail identically… but the first test contains code documenting the idea that we anticipate an exception could happen and we don’t want it to. Documenting that an exception could happen is a good idea – but it’s not done very succinctly in the first example, and not done at all in the second.

In fact, if you read the second test, would you even know that the pass condition was “no exception thrown”? There is no assertion in the test, so it looks like the author forgot to write one – perhaps they were interrupted. One could argue the test isn’t doing anything, has no value, and should be removed – accidentally removing an important test case in code that uses exceptions.

Minor improvement:

    [Test]
    public void with_a_valid_email_it_should_succeed()
    {
        var sut = new EmailValidator();
        sut.ValidateEmail("");
        Assert.Pass("No exception thrown.");
    }

Now we can tell that we are testing for an exception. Assert.Pass and Assert.Fail are important to know about – but unless you have exceptions with try-catch (which can cause code to be skipped) or control statements in your test you shouldn’t need them for a finished test. I often use Assert.Fail to stub out a failing test just before I am forced to context switch, to remind me what I had to do next.

Let’s get rid of that Assert.Pass. Again NUnit offers an assertion that takes an Action.

    [Test]
    public void with_a_valid_email_it_should_succeed()
    {
        var sut = new EmailValidator();
        Assert.DoesNotThrow(
            () => sut.ValidateEmail(""));
    }

It’s pretty readable right? This is my favorite-assertion-you-never-heard-of! Like Assert.Throws, it localises the thrown exception to one line – now future readers of the test know the intent and have the idea that this method may throw.

What if an exception is raised outside of this statement? Well, NUnit will give different diagnostic feedback – reporting an unhandled exception rather than a failed assertion.

This finishes the story for NUnit… So far we’ve seen NUnit consistently provide more expressive options for exception testing than MSTest; and xUnit has seemed great too… but as I already hinted, they removed my favourite assertion.

Assert.DoesNotThrow has been deprecated by the xUnit team! It’s not making the cut for version 2.0!

Surely they can’t be advocating the Assert.Pass version?

I was googling for DoesNotThrow to research opinions and usage when I discovered the offending feature request on github.

Remove Assert.DoesNotThrow #188

BradWilson: So. Incredibly. Not. Needed.

BradWilson: Every single line of code is an implicit Assert.DoesNotThrow().

I hope I’ve explained why I disagree with that. Localisation of the exception, readability, and having an assertion in each test are more than reason enough.

Mark Seemann was the first to put the case for the defence. I’m glad someone voiced my views.

Ploeh: I, for one, liked the explicitness of being able to state that this particular line of code shouldn't throw, and that's the test.

If that was it, I would have decided it was a clear cut case – they were just wrong.

But there’s an ideological issue at stake here. See, the test has a slightly twisted structure. We expect

  • Arrange
  • Act
  • Assert

And what we have instead is

  • Arrange
  • Assert(Act)

Note that Assert.Throws has the same problem. This seems to me to be a minor issue… if the test remains readable. But it is interesting that the xUnit team have taken such a strong opinionated stance.

Adam Ralph made a stronger argument for an alternative.

Personally, I believe neither Assert.Throws<T> nor Assert.DoesNotThrow<T> should exist. (One can use Record.Exception instead and preserve the structure of Arrange, Act, Assert.)

I would never have thought of writing a helper like “Record.Exception”, because it violates a different guideline I hold close to heart – “don’t return null” (because you force the caller to check for it). But the end justifies the means in this case – and the end is preserving the integrity of Arrange, Act, Assert.
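Here’s what that looks like in xUnit (a sketch using the EmailValidator from the earlier examples): Record.Exception runs the action, returns whatever was thrown (or null), and the assertion stays in its own step.

```csharp
[Fact]
public void with_an_invalid_email_it_should_throw()
{
    // Arrange
    var sut = new EmailValidator();

    // Act
    var exception = Record.Exception(() => sut.ValidateEmail(""));

    // Assert
    Assert.IsType<EmailInvalidException>(exception);
}

// The no-exception case keeps exactly the same shape
// (validEmail is a placeholder for whatever input should pass):
//     var exception = Record.Exception(() => sut.ValidateEmail(validEmail));
//     Assert.Null(exception);
```

Each test keeps the Arrange, Act, Assert structure, at the cost of a null meaning “nothing was thrown”.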

The discussion on the xUnit project board is inspirational – because people passionate about testing are discussing and discovering best practices and building a framework that is designed around those ideas.

In the end your choice will be a matter of taste. But please remember you have more concise and expressive options than try-catching the exceptions yourself.

Exploring alternatives to tautological tests

What is a Tautological Test? Get some background here.

Not every tautological test uses mocks.

“Have you got 10 mins?”


“I’ve been thinking about tautological tests. I’ve got a mapper that I’m TDDing. The test I want to write seems tautological. Any thoughts?”

public class Mapper
{
    public EntityItem Map(DomainItem domain)
    {
        return new EntityItem() { Id = domain.DomainId, EntityCode = domain.Code };
    }
}

[Test]
public void When_Map_is_called_Then_all_fields_mapped_successfully()
{
    var originalItem = new DomainItem() { DomainId = 1, Code = "ABC" };

    var sut = new Mapper();
    var mappedItem = sut.Map(originalItem);

    Assert.That(mappedItem.Id, Is.EqualTo(originalItem.DomainId));
    Assert.That(mappedItem.EntityCode, Is.EqualTo(originalItem.Code));
}

“It does feel that way. The assertion code is doing the same work as the mapper.  It repeats the mapper’s work. It knows which fields map to which.… but…”

“…but how can you avoid it?!”


We want to write code in small maintainable chunks, but the smaller they get, the harder it is to express the tests in a non-tautological way.

Is this a sign that we should make the units bigger?

So one thought I had about this problem was: why not test the mapper in context? We could test the next class that uses it. While that is a perfectly viable way of testing the next class up, it falls a little short, because we’d like to be able to test edge cases on this mapper in case it ever got more complicated – perhaps with value conversions and so on.

“Of course I’ll still need another test.” Pete fiddled with the code. “Because otherwise I can implement it to return constants and it’ll pass.”

“But we’d need that test anyway.”

“We should focus on the assertion.  It shouldn’t know how the mapping was done.”

We poked the code a bit and thought.

“I saw this nice refactor written up – it’s one I already use but hadn’t known the name of before.”

We visited the page.

“So we can replace the assertions like this and compare to an Expected Object.”

[Test]
public void When_Map_is_called_Then_all_fields_mapped_successfully()
{
    var originalItem = new DomainItem() { DomainId = 1, Code = "ABC" };

    var sut = new Mapper();
    var mappedItem = sut.Map(originalItem);
    var expectedItem = new EntityItem() { Id = 1, EntityCode = "ABC" };
    AssertItemsAreEqual(expectedItem, mappedItem);
}

private void AssertItemsAreEqual(EntityItem expectedItem, EntityItem mappedItem)
{
    Assert.That(mappedItem.Id, Is.EqualTo(expectedItem.Id));
    Assert.That(mappedItem.EntityCode, Is.EqualTo(expectedItem.EntityCode));
}

Because we are supplying an input and an expected output, and making no assumptions about how we get there, this test is now definitely not tautological. But the difference was subtle.

“I like the fact that we’re comparing an EntityItem to an EntityItem, instead of properties on two different objects.”

“I hadn’t thought of that before.  I’ve been doing this because the new assertion method is reusable – and as long as we name it well, it gives us more information about what we are actually asserting.  It’s really important to me that a test is as easy to understand as possible.”

“Hey, now we have just one assertion in the test – as far as the test is concerned, the equality check is one unit of work.  That’s good!”
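Another option worth noting (a sketch, and a design decision with trade-offs – overriding Equals on an entity isn’t always appropriate): give EntityItem value equality, and the custom helper method can disappear entirely.

```csharp
public class EntityItem
{
    public int Id { get; set; }
    public string EntityCode { get; set; }

    // Value equality: two EntityItems are equal when their fields are equal.
    public override bool Equals(object obj) =>
        obj is EntityItem other && Id == other.Id && EntityCode == other.EntityCode;

    public override int GetHashCode() => (Id, EntityCode).GetHashCode();
}

// The test's assertion then compares whole objects in one line,
// and the framework's failure message shows both of them:
//     Assert.That(mappedItem, Is.EqualTo(expectedItem));
```

The named helper method documents *what* is being asserted, while value equality keeps the test to one line – which you prefer is a matter of taste.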

Imagine that we had added more logic to the mapper:

[Test]
public void When_Map_is_called_Then_all_fields_mapped_successfully()
{
    var originalItem = new DomainItem() { DomainId = 1, Code = "ABC" };

    var sut = new Mapper();
    var mappedItem = sut.Map(originalItem);

    Assert.That(mappedItem.Id, Is.EqualTo(originalItem.DomainId));
    Assert.That(mappedItem.EntityCode, Is.EqualTo(originalItem.Code));
    Assert.That(mappedItem.IdCode,
        Is.EqualTo(originalItem.DomainId + "-" + originalItem.Code));
}

Now it’s clear that this test – which has exactly the same structure as the original – knows too much.  And we’re asking the reader to do a lot of thinking in order to check the test makes sense.

The expected object pattern helps significantly, making us write assertions from a point of view that is not tautological – no calculations involved; no assumptions about which field is which.

[Test]
public void When_Map_is_called_Then_all_fields_mapped_successfully()
{
    var originalItem = new DomainItem() { DomainId = 1, Code = "ABC" };

    var sut = new Mapper();
    var mappedItem = sut.Map(originalItem);
    var expectedItem = new EntityItem()
         { Id = 1, EntityCode = "ABC", IdCode = "1-ABC" };
    AssertItemsAreEqual(expectedItem, mappedItem);
}

Because the mapper code could be more complicated in the future, it’s nice that we’ve ended up with a test pattern which cleanly serves our needs now, and is future proof.

How mocking can be a recipe for disaster

Testing is a Mere Trifle

As a software developer, how would you advise someone to test if a dessert was good enough to serve to your friends and family?

Got your answer? OK. No changing your mind.

In a Thanksgiving episode of Friends, Rachel Green decides to make an English trifle. These desserts are traditionally made with custard, sponge cake fingers, and raspberry jelly (jello, if you are American).

The pages of her recipe book are stuck together, and she ends up mixing in beef, sauteed with peas and onions. She doesn’t question this, thinking that the British have odd traditional food, and it’ll probably work out.

The result “tastes like feet”.

Hands up if you answered “I would taste it,” or a variant thereof.

Good. You can invite me over anytime.

Did you answer “I would compare the steps taken to the steps in the recipe?”

Sorry – I’d love to come over but I’m really swamped with twitter posts I need to catch up on.

This is one example of a tautological test. “I compare the steps I planned to take to the steps I ended up taking” gives you no confidence that you had a good plan in the first place.

In general, a test is tautological when it repeats some of the implementation of the system under test.  These tests end up adding very little value.
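A minimal sketch of the difference, in Python (the function and names are invented for illustration):

```python
def total_price(quantity, unit_price):
    return quantity * unit_price

def test_total_price_tautologically():
    # repeats the implementation's recipe – if the formula is wrong,
    # the test is wrong in exactly the same way
    assert total_price(3, 5) == 3 * 5

def test_total_price_by_example():
    # states an independently known expected value instead
    assert total_price(3, 5) == 15

test_total_price_tautologically()
test_total_price_by_example()
```

The first test can never catch a mistake in the multiplication; the second can.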

Unfortunately software developers write this sort of test all the time, and don’t even know they are doing it.

Why do we use TDD?

Here are four commonly expressed reasons:

  1. To increase confidence that the code works
  2. To help us make better design decisions
  3. To make refactoring easy
  4. To help prevent regression bugs

Take a look at this code…

public class CarService
{
    private readonly ISearchQueryFactory _queryFactory;
    private readonly IRepository _repository;

    public CarService(ISearchQueryFactory queryFactory, IRepository repository)
    {
        _queryFactory = queryFactory;
        _repository = repository;
    }

    public List<Car> FindAll()
    {
        var query = _queryFactory.Create<Car>();
        return _repository.Search(query);
    }
}
This example presupposes we have a Repository whose Search method takes a SearchQuery<T> object, to which we may add search filters, and returns matches from the database. Since we want all cars as results, in this case we use the default unfiltered query.

…and the code that tests it.

public void FindAllShouldSearchForAllCars()
{
    var mockFactory = new Mock<ISearchQueryFactory>();
    var mockRepository = new Mock<IRepository>();

    var testee = new CarService(mockFactory.Object, mockRepository.Object);
    testee.FindAll();

    mockFactory.Verify(x => x.Create<Car>(), Times.Once);
    mockRepository.Verify(x => x.Search(It.IsAny<SearchQuery<Car>>()), Times.Once);
}
This is a modified example from an excellent introduction to the tautological test antipattern by Fabio Pereira, which I am reusing to look at alternatives.

When we discussed this example in my office, at least initially, the room divided into two camps – was this test good or not?

Let’s rip it to shreds.

This test gives the coder no confidence that the implementation is correct (1) – because the recipe in the code is repeated in the test.

// production code
var query = _queryFactory.Create<Car>();
return _repository.Search(query);

// test code
mockFactory.Verify(x => x.Create<Car>());
mockRepository.Verify(x => x.Search(It.IsAny<SearchQuery<Car>>()));

From a TDD point of view it does (emptily) perform the function of being a failing test that forces certain code to be written – but unlike black box TDD, this test gives no scope for the problem to be solved in any other way. This test cannot help a good design grow out of the requirements. (2)

A worthwhile question then is – if you knew you would write this code this way anyway, why did you bother with a test?

How about refactoring? Well, if we implement the code another way, the test will fail, even though all the production code works. And if we try to change the test first, we still hit a red light. This is terrible – we should be able to refactor production code while keeping a green light the whole time! For each change we must edit both the code and the associated test(s) – incidentally, this also breaks the single responsibility principle. (3) And we can’t have any confidence that our refactoring is side-effect-free if it requires the tests to be rewritten – so we have no protection against regression bugs. (4)

And yeah, that eliminates all four of our reasons why we work with TDD in the first place.

So we could write tests like this and claim to be performing TDD. But if we do, we get none of the benefits of TDD whatsoever. Rather the tests have negative value – reducing the flexibility of the code; slowing down maintenance.

There must be a way of testing this feature that isn’t tautological, right?


Let’s start with an integration test.

public void FindAllShouldReturnAllCars()
{
    var expectedCars = 
        new List<Car> { new Car() { Id = 1 } };

    var factory = new SearchQueryFactory();
    var repository = new Repository();

    var testee = new CarService(factory, repository);
    var results = testee.FindAll();

    CollectionAssert.AreEqual(expectedCars, results);
}

Well this is fine. This tests that the code works, and there’s no ugly knowledge floating around in the test about the internal implementation.

In terms of being black box, this is ideal, and will have the best possible chance of remaining robust even if the implementation changes.

Supposing that the repository talks to a database, then the big issue with this test is that it does not isolate the system under test from its external dependencies. We can’t know that the test database will always be available; it may have maintenance costs of its own; and external resources may be slow to access, which reduces the effectiveness of our test suite as a fast-feedback system.

By stubbing just the external dependencies, we can resolve these speed and maintenance issues (but we trade away the knowledge that the system works correctly even across the architectural boundary).

public void FindAllShouldReturnAllCars()
{
    var expectedCars = 
        new List<Car> { new Car() { Id = 1 } };

    var factory = new SearchQueryFactory();
    var stubRepo = new Mock<IRepository>();
    stubRepo.Setup(x => x.Search(It.IsAny<SearchQuery<Car>>()))
        .Returns(expectedCars);

    var testee = new CarService(factory, stubRepo.Object);
    var results = testee.FindAll();

    CollectionAssert.AreEqual(expectedCars, results);
}

I call this test a “classicist test” because Martin Fowler refers to Classic TDD in contrast to Mockist TDD in his vital article Mocks Aren’t Stubs.

“The classical TDD style is to use real objects if possible and a double if it’s awkward to use the real thing” – Fowler

When I demoed this example to my colleagues, Steven asked “What about isolation? This test isn’t a proper unit test.”

I probably would have agreed before reading Fowler’s article – but he says that unit testing has not always meant the same thing to all people. As a Classic TDDer, his method is to assume all other systems are working correctly and write the test to test the next unit in the stack based on that assumption.

“[when multiple tests break…] Classicists don’t express this as a source of problems. Usually the culprit is relatively easy to spot by looking at which tests fail and the developers can tell that other failures are derived from the root fault.”  – Fowler

The advantages this test has are speed and black box requirements testing – meaning that it is reasonably robust against implementation change.  However, a little white box knowledge has leaked through now that we are providing the return values from the stub.

We can go further.

public void FindAllShouldSearchForAllCars()
{
    var expectedCars = 
        new List<Car> { new Car() { Id = 1 } };

    var stubFactory = new Mock<ISearchQueryFactory>();
    stubFactory.Setup(x => x.Create<Car>())
        .Returns(new SearchQuery<Car>());

    var stubRepo = new Mock<IRepository>();
    stubRepo.Setup(x => x.Search(It.IsAny<SearchQuery<Car>>()))
        .Returns(expectedCars);

    var testee = new CarService(stubFactory.Object, stubRepo.Object);
    var results = testee.FindAll();

    CollectionAssert.AreEqual(expectedCars, results);
}

Now that all the dependencies have been stubbed, this is a true isolation test. But there’s a trade off – each time a stub is introduced, a little more white box knowledge leaks into the test.

It’s not as bad as mocking though – each time a mock is introduced, considerably more white box knowledge leaks through.

Compare this test to the mockist test – the mockist test has absolutely no advantages. And in my experience, far more often than not, a test with mocks can be re-written to use stubs. (I’ll look at some times when this isn’t true in a future blog.)

Give me one kind of test I can always write that always works.

No can do.

If you need a rule to take away, then habitual mocking is an unnecessary practice that prevents us from achieving the benefits of TDD.  Sometimes there’s no other choice though.

But there is no one best way. Understand the trade-offs of integration, classic, isolation and mockist testing, and choose whatever works best in each circumstance.

Blindly following a dogma is just a way of making sure you never innovate.

Tautological tests don’t need to include mocks – merely assuming what your implementation will be is enough to lead to blunders.

On spam

Some kind visitor left a comment on one of my posts.

It began…

{I have|I’ve} been {surfing|browsing} online more
than {three|3|2|4} hours today, yet I never found any interesting article like
yours. {It’s|It is} pretty worth enough for me.

{In my opinion|Personally|In my view}, if all {webmasters|site owners|website owners|web owners} and bloggers made good content as you did, the {internet|net|web} will be {much more|a lot more} useful than ever before.|
I {couldn’t|could not} {resist|refrain from} commenting.

{Very well|Perfectly|Well|Exceptionally well} written!|

…and continued like this for another 25,000 characters. Overall it included dozens of different comments, and each comment contained multiple wording variations.

Later, another variation explained:

I {{leave|drop|{write|create}} a {comment|leave a response}|drop a
{comment|leave a response}|{comment|leave a response}}
{each time|when|whenever} I {appreciate|like|especially enjoy} a {post|article} on a {site|{blog|website}|site|website} or {I have|if I have} something to {add|contribute|valuable to contribute}
{to the discussion|to the conversation}.

This is a great attitude, and I’m glad the commenter decided to contribute something so interesting to the conversation.

This is obviously the master template for an automated comment spam system.  That much is clear.

It’s also clear that somehow something went wrong this time, and instead of the randomised comment I was supposed to receive, I got the master template.

What can we figure out from the error?

Well the code could look something like this…

public string BuildCommentBodyText()
{
    string bodyText = GetMasterTemplate();
    while (ContainsChoice(bodyText))
        bodyText = MakeChoice(bodyText);
    return bodyText;
}

This way, if everything works fine, I get a randomised comment.

But if the first attempt to evaluate choices fails, I get the entire template text because the entire template is loaded into the output variable before work begins.

There are slightly more complex versions of this code that could exhibit a similar error.  Suppose the template were parsed using the interpreter pattern (described in the GoF Design Patterns book), so that some nodes were simple (plain text with no choices), some nodes were choices between alternate words, and some were compound nodes made up of an ordered sequence of the other kinds.  That could produce the same correct result – so long as the parse tree formed correctly.  But if tree formation failed and the whole template was loaded into the first node as a simple node, then the error would look similar too.

The comment was truncated.  I don’t know whether WordPress hit a maximum character limit, or whether the hacker hit a limit of their own.  But the upshot is that what was submitted to me did not have a matching closing curly brace to close the first opening brace.  This could actually be the reason that their code failed to select a random entry.  When parsing an incomplete template, the ContainsChoice() method would doubtless have returned false.
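The spammer’s real code is of course unknown, but the guessed loop above is easy to sketch. Here is a hypothetical Python version that resolves innermost {a|b|c} groups one at a time – and, like the guessed code, leaks the raw template when the strict parse refuses a truncated template with an unmatched brace:

```python
import random
import re

# an innermost {...|...} group: one with no nested braces inside it
CHOICE = re.compile(r"\{([^{}]*)\}")

def contains_choice(text):
    # guess: a strict parser refuses an unbalanced (truncated) template outright
    if text.count("{") != text.count("}"):
        return False
    return CHOICE.search(text) is not None

def make_choice(text, rng=random):
    # resolve one innermost group by picking a random alternative
    pick = lambda m: rng.choice(m.group(1).split("|"))
    return CHOICE.sub(pick, text, count=1)

def build_comment_body(template, rng=random):
    text = template
    while contains_choice(text):
        text = make_choice(text, rng)
    return text

print(build_comment_body("{Hi|Hello}, {great|nice} {post|article}!"))
# prints one random variant, e.g. "Hello, nice post!"

# a truncated template fails the balance check,
# so the raw master template falls straight through
print(build_comment_body("{Very well|Perfectly} written! {I have|I've} been {surfing"))
```

Nested groups work too, because the innermost group is always resolved first.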

Why send these automated comments?

I’m not a security expert.  And I’ve never defrauded anyone online.  But I don’t think it’s a wild guess to say that if there were no profit in it, these comments wouldn’t be sent.

Well some of the value could be in whether or not the spam comment gets approved for the website.  The spider that submitted the comment could check back periodically.  If a comment has been approved, that could indicate a security weakness on the site, such as no comment filtering, or a gullible webmaster.  Vulnerable sites could then be targeted for human intervention.

It’s like when a hypnotist doing a stage show asks everyone in the audience to do something.  Out of an audience of a thousand, if one or two audience members swear blind that they can’t unclasp their hands when everyone else has, then these two audience members are the best target to use for the rest of the show.

One particularly nice variation said:

{Could|Would} you {list|make a list} {all|every one|the complete urls} of
{your|all your} {social|communal|community|public|shared}
{pages|sites} like your {twitter feed, Facebook page or linkedin profile|linkedin profile, Facebook
page or twitter feed|Facebook page, twitter feed, or
linkedin profile}?

If this comment was approved, published to the site, and followed up with a “sure, these are all my details” reply, I am sure a con would be on its way.  Perhaps something like the story James Bach tells of a poor girl from Ghana who needs help paying for her inherited gold.

Why was the comment randomised?

This one is easy to answer.  WordPress has helpfully pushed a large number of comments into my spam folder.  Each of these comments has been submitted with identical content multiple times.  Clearly, using text that doesn’t vary makes it very easy to spot and filter spam.

Other kinds of comment attack

The thing is, when you allow someone else to put content on your website you are inviting an attack.

The best known attacks in the .NET web development community are SQL injection attacks.  These result from a vulnerability where the program uses user input directly as part of a SQL string executed against the database.  Perhaps something like this:

// user input is concatenated straight into the SQL text
string sql = 
        "INSERT INTO Comments (CommentText, UserName, Date) " +
        "VALUES ('" + inputComment + "', '" + inputUserName + "', GETDATE());";
SqlCommand cmd = new SqlCommand(sql, connection);
cmd.ExecuteNonQuery();

Unfortunately for the owners of websites with this sort of vulnerability, their entire database is compromised.

Suppose an attacker submitted the following comment:

UserName: UBinHacked
CommentText: ' + (SELECT TOP 1 UserName + ': ' + Password FROM Users), '', null); --

Then the executed SQL would be:

INSERT INTO Comments (CommentText, UserName, Date)
VALUES ('' + (SELECT TOP 1 UserName + ': ' + Password FROM Users), '', null); -- ', 'UBinHacked', GETDATE());

Everything after the “--” is a comment and will be ignored. This means the attacker has full control over the second part of the SQL statement.  In this case they just got your admin username and password into the body of a comment.

(To make the attack easier to understand, I assumed your passwords are stored in plain text rather than hashed.  Which would be a massive security vulnerability itself.)

It doesn’t stop there though – once the hacker is able to control even part of the SQL statement, then by using the comment marker they can perform any number of attacks, from reading privileged data and altering data to deleting entire tables from the database like “little Bobby Tables”.

Hashing and salting your passwords isn’t enough to prevent these attacks – if the attacker knows the password to just one account, they can overwrite the admin account’s password hash and salt with the values from that account.
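The standard defence is to keep the SQL text and the data separate with parameterized queries (in ADO.NET, by adding SqlParameters to the SqlCommand instead of concatenating). A minimal sketch in Python with sqlite3 – the table and the payload are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comments (CommentText TEXT, UserName TEXT, Date TEXT)")

def add_comment(conn, comment_text, user_name):
    # '?' placeholders: the values travel separately from the SQL text,
    # so they are stored verbatim and never parsed as SQL
    conn.execute(
        "INSERT INTO Comments (CommentText, UserName, Date) "
        "VALUES (?, ?, datetime('now'))",
        (comment_text, user_name),
    )

payload = "'); DROP TABLE Comments; --"
add_comment(conn, payload, "UBinHacked")

# the attack string is stored as inert text, and the table survives
stored = conn.execute("SELECT CommentText FROM Comments").fetchone()[0]
assert stored == payload
```

The attacker’s quote marks and comment markers end up inside the column, not inside the statement.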

Once you’ve seen and understood SQL attacks, other kinds of attack are easier to understand too.

What if an attacker posted a comment that included extra javascript in the page?  That javascript would run in the browser of every visitor who comes to your site.  The script could redirect the visitor to a malicious copy of your website – or any number of other crazy variations.  This is known as a cross-site scripting (XSS) attack.
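The usual defence is to encode user content before rendering it into HTML. A sketch using Python’s standard library (the payload URL is made up):

```python
import html

comment = '<script>document.location = "http://evil.example/"</script>'
safe = html.escape(comment)

print(safe)
# the angle brackets become entities, so the browser displays the
# comment as text instead of executing it as a script
assert "<script>" not in safe
assert "&lt;script&gt;" in safe
```

Most web frameworks encode by default in their templating layers; the vulnerability usually comes from routes that bypass that encoding.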

If you want to understand the multitude of attacks possible, I highly recommend reading 24 Deadly Sins of Software Security by Michael Howard, David LeBlanc and John Viega.

I’d love to ask the commenter exactly what kind of attack they wanted to launch.

Unfortunately I can’t easily contact them.  You see, the email address they submitted in the comment was actually a link to a malicious website.

The two types of Test Coupling

Imagine with me.

You have a complicated system, with dark impenetrable algorithms – classes that span many pages of scrolling.

Because of the difficulty of understanding the program flow here, to help debugging some wise soul has added a debug log to a problematic class which logs a few key actions – calls to external resources and so on.

Later, when a bug is found – an external service is not being called under specific circumstances – before fixing the bug the developer adds a failing test that represents the bug. They write a few test assertions against the debug log, elevating read access to the log to “public” for this class.

And we get to the present day, where you are sitting in front of the code, tasked with adding a new feature to it. It is still as impenetrable as it was before. Being an excellent fearless programmer, you proclaim: “To refactor is to understand,” and you begin taking apart the code by extracting classes and methods, looking for reusable concepts. You observe the class is covered by tests so the refactoring will be safe to perform. The first few steps make the code a little messier – but you can feel a step coming up that simplifies everything.

You apply the step that transforms and simplifies everything.

It is a cathartic moment – the culmination of an hour of work.

Then tests start to fail. It makes no sense – you thought you understood the purpose of the code. Sure that you’ve missed something easy, you quickly jump into the test file to figure out what could possibly be wrong…

And the test is reading from a DebugLog that sits in the class you just consolidated. The test is failing because it is looking for log entries recording a particular sequence of actions – and now the actions are represented completely differently.

You thought you had understood the code but now you are full of doubt – do the actions HAVE to take place in the order specified? Was it ok to change their representations? Is it important that the log contains that exact text – is something else reading it?

You think that probably the order doesn’t matter… but with a sinking heart you realise you can’t write a new test that does express the intent of the code because you just can’t be certain – and you can’t intelligently modify this one. Suddenly you feel stuck. And a little sick.

You hit revert.

It takes only a few minutes to add the feature you were going to add. You tack it on the end of a series of if statements. And you add an entry to the debug log.

“At least I can use the log to test my feature gets applied,” you say.

Words are power. But you’re in a right mess. You’re going to need some words to describe the mess you’re in before you can get yourself out of it.


We usually talk about code being coupled or a coupling existing between two systems, when one system has dependencies on the other. The word helps us think about the way one system changing will cause or require the other system to change.

I’d like to use this word to describe the conditions that cause a test to fail or not build. Actually this is the same meaning, but in the context of tests. But coupling in a test can be quite precise: a test with the line

Assert.IsTrue(enterpriseClass.ToString().Contains("Tony Todd"));

is coupled to the presence of the substring “Tony Todd” somewhere in the enterpriseClass’s ToString method GIVEN the specific set up of the class in prior test steps. The rest of the string output can vary as much as you like.

Characteristic Coupling

So this is when a test has coupling to a system, and the coupling represents behaviour requirements. Or in more practical terms – a test that breaks because of the coupling implies that there is a system defect.

Incidental Coupling

A sort-of opposite to characteristic coupling. This is when a test is coupled to a non-characteristic behaviour or contract of the system – so if the contract/behaviour changes in a way that breaks the test, this doesn’t necessarily mean there is a defect in the system.

A good example of Incidental coupling is when a test relies on the artefacts of an implementation, such as a log – like in the example, or when a test relies on internal state of a class. You might find this kind of coupling on methods that should be private, but which are public “because they need to be tested”.

If your test is coupled to one specific implementation, and that implementation is only one of many different correctly behaving ways the system could have been written, then the test has incidental coupling.
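The distinction can be made concrete with a tiny invented example in Python:

```python
class TaxCalculator:
    def __init__(self):
        self.debug_log = []                  # an implementation artefact

    def apply_tax(self, amount):
        self.debug_log.append(f"applying 20% to {amount}")
        return amount + amount * 20 // 100   # the required behaviour

def test_characteristic_coupling():
    # coupled only to behaviour: survives any refactor that preserves it
    assert TaxCalculator().apply_tax(100) == 120

def test_incidental_coupling():
    # coupled to the log's exact wording: breaks the moment the message
    # changes, with no defect anywhere in the system
    calc = TaxCalculator()
    calc.apply_tax(100)
    assert calc.debug_log == ["applying 20% to 100"]

test_characteristic_coupling()
test_incidental_coupling()
```

The second test passes today, but it will fail for any rewording of the log – exactly the trap from the story above.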

What went wrong?

We’ve got the words to describe the problem now.

When the first developer added a log to the problematic class, this was internal to the class.
When the next developer exposed the internal log and tested based on the content of that log, this introduced a test with significant incidental coupling.
When the last developer tried to refactor the code, CHANGES THAT DIDN’T INTRODUCE A DEFECT BROKE THE TEST, because of that incidental coupling.
The final step – adding more tests that assert based on the log might help confirm that everything works as the developer expects _at the moment they are written_ but as soon as that code is committed, the tests become worse than useless.

They become a burden.

It would be easy to think “any tests are better than none” – but it’s not true. Tests without any characteristic coupling send frequent false alarms, may pass even when code is defective, give an illusion of security, and waste developers’ time trying to understand them.

Developers may seem rational but they’ll sometimes make bad decisions. A passing test which will fail in a complicated way if you change something incidental is like a tiny devil sitting on your shoulder saying “naww don’t change that, it looks _hard_.” Even if the BEST way to express the code involves that change, you’ll be tempted just to work around the tests, rather than fix them.

What I mean is – whether the coupling is incidental or characteristic, code coupled to tests is resistant to change.

If it is characteristic coupling, then this prevents regression bugs.

If it is incidental coupling… well… you’re lucky if it just makes the code harder to maintain.


So couple your tests wisely. You’ll have to live with the legacy.


Some decisions are hard to change.

Ever built a desktop tower PC from components? Or ever upgraded a tower PC?

What’s the easiest component to change?

Trick question.

You were thinking “RAM” right? Man you can unclip one of those little silicon buddies and snap a beefier one in place in less than a minute. Once the case is open.

No, not RAM.

How about your USB memory stick. That’s pretty easy to change. If you don’t use one, how about your mouse? Or headphones?

Okay we’re thinking outside the box here. What about software? Ehhh not as easy as headphones. Oh how about the user!  Well unless your password is “123456”…

I’m going with headphones.

Right so now you’re ready for the real question. What’s the hardest component to change?

You’ll be looking for trick answers now, this is good.

The desk? Heh, nice outside the box thinking – yeh that’s hard to change. Pretty much have to unhook and move everything to get that thing shifted.

Motherboard? Getting warmer. But come on, that’s obvious.

No. What I’m thinking about is
The case.

The case is an interesting little beast. It does one job and does it well. It keeps things that should be inside the box inside it; things that are outside outside. But while doing that it also provides external ports for headphones, mice, keyboards and monitors; it provides internal fixings for the motherboard, dvd drive, power supply, sound card and so on.

But the really nice thing about the case, is that if I go and sit at your computer with my headphones, I don’t need to give a damn about what is inside your case. I just plug my headphones in, put on your music and I start changing your desktop background.

And the other really nice thing about the case is that if you want to start swapping out components – your hard drive could stand to be a bit bigger – you can just swap that out. And no-one needs to care what is outside your case.

Your case makes things easy to change.

Except the case.

It’s hard to change the case, because it’s busy making everything else easy to change.

I’m not just talking about swapping your case for another case.  Could you actually swap your case for something that wasn’t a tower case but solved the same general problem?

Well sure you could but you might need to redesign the bits that fitted inside it.  Laptop cases have many of the same features.  Server racks solve a similar problem.  Once you’ve decided on a tower PC case, you can swap any of the bits inside and out but changing it into a laptop is a tall order.

The case is a real world example of a design pattern at work.

Software design patterns are common solutions to common problems.

Hey I need to be able to get an instance of an object… but I don’t really care if that object is a derived type or not – I shouldn’t be the one making that decision… and I don’t want to think about how that object is constructed.  Maybe you want a factory pattern?

I need an action performed for me.  I know what kind of inputs I’ll have every time, but I really don’t know what the content or algorithm of the action will be. I just need to know I can invoke it and it’ll behave consistently as far as I’m concerned whatever other stuff is really going on inside.  Sounds like a strategy pattern.
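A minimal sketch of that strategy idea in Python (all names invented): the caller fixes the inputs and the contract, while the algorithm behind it stays swappable – but the shape of the contract itself becomes the hard-to-change part.

```python
from typing import Callable

# Strategy: the caller knows the contract (int -> int), not the algorithm
def checkout_total(prices: list[int], discount: Callable[[int], int]) -> int:
    return discount(sum(prices))

no_discount = lambda subtotal: subtotal
ten_percent_off = lambda subtotal: subtotal * 90 // 100

assert checkout_total([10, 20], no_discount) == 30
assert checkout_total([10, 20], ten_percent_off) == 27
```

Swapping in a new discount strategy is trivial; changing what a discount strategy receives or returns ripples through every caller and every strategy.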

The thing about design patterns in software is that they make /some/ things easy to change, both in terms of implementation behind the pattern, and the consumers in front of the pattern. But once you’ve decided on a design pattern, the pattern itself becomes hard to change.

So every use of a pattern is a trade off. You say “I’m going to sacrifice being able to change this later – because I want to be able to change that instead.”

In the GoF Design Patterns book, each pattern is described alongside what it makes easy to change. The trade off is the pattern itself.

So what am I saying – are patterns bad?

No. Clearly.

They’re not good either. Design patterns are a tool. Effective programmers may be more effective because they use design patterns effectively, in context, and with knowledge of what is being traded.

Ineffective programmers may be ineffective because they don’t know about design patterns and so don’t apply them.

Seriously ineffective programmers – well they might know some patterns and apply them blindly and inappropriately – like a confused plasterer in a new house, with every floor, ceiling and wall tiled painstakingly carefully in mellow blue and green colours. “I worried it might get wet,” they say, “so I copied this thing I saw in a swimming pool once.”

Specification by Example – show, don’t tell.

CLIENT: And then you just take the base policy payment, perform a look up against the user’s postcode and apply a crime risk multiplier. And that gets you the initial monthly premium for the policy.

DEVELOPER: (Writing as he talks) So we’ll need a feature definition that says,
GIVEN I live in Canvey Island,
WHEN I apply for insurance,
THEN the monthly policy premium should be the base premium multiplied by the appropriate crime modifier for my postcode.

CLIENT: Exactly!

TESTER: Gosh that sounds complicated.

DEVELOPER: Oh come on it’s just one sum! There’s nothing else to it.

TESTER: I don’t suppose you have a real life example?

CLIENT: Sure. (Opening a complicated looking spreadsheet.) So, say I live in Canvey Island. I’ll need a specific postcode. We can use SS89YJ. So I put that in this box on the spreadsheet and I get the premium here. And here is the crime multiplier for the postcode, see?

TESTER: That doesn’t seem to add up.

CLIENT: We talked about the floods insurance multiplier already – didn’t we?

TESTER: No problem, let’s figure it out in stages.  I’ll take the numbers we used in the spreadsheet calculation.
GIVEN I live in SS89YJ
AND the postcode crime multiplier for SS89YJ is 0.10
AND the flooding multiplier for SS89YJ is 0.15
AND the base premium is £10.00
WHEN I apply for insurance
THEN the monthly policy premium should be £12.50

DEVELOPER: Gosh that sounds complicated.  Do you suppose we should get some more examples, in case we missed something else?
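The tester’s worked example is just staged arithmetic, and it checks out – a sketch using the numbers straight from the dialogue:

```python
from decimal import Decimal

# figures taken from the tester's example for postcode SS89YJ
base_premium = Decimal("10.00")
crime_multiplier = Decimal("0.10")
flood_multiplier = Decimal("0.15")

monthly_premium = base_premium * (1 + crime_multiplier + flood_multiplier)
assert monthly_premium == Decimal("12.50")   # £12.50, as the tester stated
```

Which is exactly the point of specification by example: the concrete numbers exposed the flooding multiplier that the abstract GIVEN/WHEN/THEN description had silently omitted.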