Backdoor Stubbing

Testing around effects, part 1

Nov 26, 2023

a sign reading "Hippies Use Backdoor: no exceptions" — "Hippies Use Backdoor" by kansasphoto is licensed under CC BY 2.0.

Last week, we looked at some code whose effects made it hard to test. The code in question emailed a report of new user signups to our colleague Eliza. It looked approximately like this:

async function sendSignupReport() {
  const newUsers = await database.run(
    queryFrom("users").where("signup_date", ">", weeksAgo(1)),
  );

  const reportData = breakdownByCountry(newUsers);

  const reportHtml = formatReportAsHtml(reportData);

  await sendEmail({
    from: "noreply@example.com",
    to: "eliza@example.com",
    subject: "Weekly signup report",
    body: reportHtml,
  });
}

(Astute readers will notice that I've changed how we call the database. I realized that if the where method actually ran the query, there would be no way to combine where filters with other operations.)
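To make that parenthetical concrete, here's a rough sketch of a query builder in which where only describes a filter and returns a new query, leaving execution to a separate run step. Everything in it (the Query, Filter, and Operator types, makeQuery, and the date string) is invented for illustration; the real queryFrom isn't shown in this post.

```typescript
// Hypothetical sketch: none of these types come from the real codebase.
type Operator = ">" | "<" | "=";

interface Filter {
  column: string;
  operator: Operator;
  value: unknown;
}

interface Query {
  readonly table: string;
  readonly filters: readonly Filter[];
  where(column: string, operator: Operator, value: unknown): Query;
}

function makeQuery(table: string, filters: readonly Filter[]): Query {
  return {
    table,
    filters,
    // Returns a new description; nothing is sent to a database here.
    where(column: string, operator: Operator, value: unknown): Query {
      return makeQuery(table, [...filters, { column, operator, value }]);
    },
  };
}

function queryFrom(table: string): Query {
  return makeQuery(table, []);
}

// Because where returns a Query rather than results, filters can be chained.
const recentUsSignups = queryFrom("users")
  .where("signup_date", ">", "2023-11-19")
  .where("country", "=", "US");
```

Since a query is just data until something runs it, a single database.run call becomes the only place where the effect happens.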

I think this code does what it's supposed to, but I'd really like to test it to make sure. Unfortunately, testing it is difficult for a few reasons:

database.run accesses the production database, where new users are constantly signing up.
weeksAgo computes a time relative to the current time, which is constantly changing.
sendEmail actually sends an email, which we absolutely do not want to do in our tests, for both social and technical reasons.
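To see why the clock counts as a hard-to-control input, here's a guess at what weeksAgo might look like. The implementation and the MS_PER_WEEK constant are assumptions; the post doesn't show the real function.

```typescript
// Assumed implementation; the real weeksAgo is not shown in this post.
const MS_PER_WEEK = 7 * 24 * 60 * 60 * 1000;

function weeksAgo(weeks: number): Date {
  // Date.now() is a hidden input: the caller does not get to choose it.
  return new Date(Date.now() - weeks * MS_PER_WEEK);
}

const cutoff = weeksAgo(1);

// A test can't hardcode the expected cutoff, because it depends on when the
// test runs. The most we can say is how far it is from "now".
const elapsed = Date.now() - cutoff.getTime();
```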

Two common threads unite these three problems: effects and dependency. Our three problematic functions have effects that are hard to control and/or observe, and the code we'd like to test depends on those functions by calling them. In order to test it, we need to break one or both of those threads.

In this post, I'll outline the most common strategy for testing effectful code, which I call backdoor stubbing. Backdoor stubbing lets us negate the effects of our dependencies by sneakily replacing them with test doubles. A test double is analogous to a stunt double: it’s any value/function/object that our test uses as a stand-in for a production object. Doubles let us control the inputs to the code under test and sense its outputs.

The last several posts have been “telling posts”; this one is going to be more of a “showing post.” I’m just going to show you how I write code, and save the analysis for later.

First, a note on terminology

Over the many years that people have been using test doubles, they've come up with a whole lexicon of terms for them, most notably documented in Gerard Meszaros's book xUnit Test Patterns. More recently, though, the trend has been to call every test double a "mock", which I think is unfortunate. We're missing out on the insights captured in the richer vocabulary. Martin Fowler has a great article on what all the different terms for test doubles mean. To quote from his summary:

Dummy objects are passed around but never actually used.
Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test.
Spies are stubs that also record some information based on how they were called. One form of this might be an email service that records how many messages it was sent.
Mocks are [...] objects pre-programmed with expectations which form a specification of the calls they are expected to receive.
—Martin Fowler, "Mocks Aren't Stubs"

In this post, we’re going to be using stubs and spies.
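As a quick illustration of the vocabulary, here's a minimal sketch of a stub, a spy, and a fake for one made-up function type. The Email and Send types here are stand-ins invented for the sketch, and they're simpler than the ones used in the rest of the post.

```typescript
// Illustrative only: Email and Send are stand-in types for this sketch.
interface Email {
  to: string;
  subject: string;
}
type Send = (email: Email) => number; // returns how many messages are queued

// Stub: returns a canned answer and ignores its input.
const stubSend: Send = () => 1;

// Spy: a stub that also records how it was called.
const sent: Email[] = [];
const spySend: Send = (email) => {
  sent.push(email);
  return 1;
};

// Fake: a working but simplified implementation (an in-memory outbox).
function makeFakeMailer() {
  const outbox: Email[] = [];
  const send: Send = (email) => {
    outbox.push(email);
    return outbox.length;
  };
  return { send, outbox };
}

stubSend({ to: "a@example.com", subject: "ignored" });
spySend({ to: "eliza@example.com", subject: "Weekly signup report" });

const fake = makeFakeMailer();
fake.send({ to: "a@example.com", subject: "first" });
const queuedAfterSecond = fake.send({ to: "b@example.com", subject: "second" });
```

The difference between the spy and the fake is intent: the spy's return value is still canned, while the fake actually computes its answer.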

Sketching our test

To begin, let's do our best to get a test working without any doubles, so we're on the same page about exactly where the problems are.

Here's my attempt:

test("sendSignupReport", {
  async "sends an email to Eliza"() {
    await sendSignupReport();
    expect(elizasInbox, contains, { // TODO: how do we get elizasInbox?
      from: "noreply@example.com",
      to: "eliza@example.com",
      subject: "Weekly signup report",
      body: "I don't know what should go here!", // TODO
    });
  },
});

(If the syntax looks weird to you, it's because I'm using my own test framework, Taste. I happen to think Taste makes for simple, readable tests, but if you don't like it, let me know in the comments.)

When we run this test, it fails because sendSignupReport calls database.run, which throws an error when it tries to talk to the production database.

There are a couple additional problems, marked with TODO comments in the test:

We don't have access to Eliza's inbox.
We don't know what the email body should be, because we don't know how many users actually signed up last week.

First things first, though: let's fix the database issue.

Stubbing the Database

The constant database holds an object with a run method. JavaScript gives us a convenient way to make database more test-friendly: we can just replace the run method with our own implementation.

const newUsers = [{country: "US"}, {country: "US"}];
database.run = () => Promise.resolve(newUsers);

This is our first example of a stub. A stub is a function that simply returns a value hardcoded in the test. After we replace database.run with our stub, any calls to it will return our hardcoded list of users.

This way of creating a stub is a bit crude, though: it leaves the database object in an inoperative state for all future callers. We should really clean up after ourselves by restoring the original implementation of database.run. I'm not going to do that now, though. I'll just add it to our TODO list.
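Here's a small sketch of that leak. The database object below is a stand-in defined just for this example, and firstTest and laterCallerSeesStub are hypothetical names.

```typescript
// Stand-in for the shared database object; the real one is not shown here.
const database = {
  run: (_query: string): Promise<unknown[]> =>
    Promise.reject(new Error("no production database in this sketch")),
};

const originalRun = database.run;

function firstTest(): void {
  // Crude stub: overwrites the shared method and never puts it back.
  database.run = () => Promise.resolve([{ country: "US" }]);
}

function laterCallerSeesStub(): boolean {
  // Any code that runs after firstTest gets the stub, not the original.
  return database.run !== originalRun;
}

firstTest();
const leaked = laterCallerSeesStub();
```

Here leaked ends up true: the stub outlives the test that installed it.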

Our test now looks like this:

test("sendSignupReport", {
  async "sends an email to Eliza"() {
    const newUsers = [{country: "US"}, {country: "US"}];
    database.run = () => Promise.resolve(newUsers);
    await sendSignupReport();
    expect(elizasInbox, contains, { // TODO: how do we get elizasInbox?
      from: "noreply@example.com",
      to: "eliza@example.com",
      subject: "Weekly signup report",
      body: "I don't know what should go here!", // TODO
    });
  },
});

...and it fails with a new error! Now sendEmail is throwing an exception.

Stubbing sendEmail is a bit harder than stubbing the database, because JavaScript doesn't make it easy for one module to overwrite another's exported functions. We can work around this, though, by making sendEmail available as a method on an object, like database.run. This requires a small amount of refactoring surgery. It's important that we keep the refactoring extremely small and straightforward, because we currently don't have working tests to keep us safe.

Here's what the emails.ts module looks like before the refactoring:

export function sendEmail(email: Email): Promise<void> {
  // ...
}

I'll change it to this:

export const Emails = {
  send: sendEmail
}

// TODO: update all callers to use Emails.send, and stop exporting this
export function sendEmail(email: Email): Promise<void> {
  // ...
}

Now, in our production code, we can change sendEmail to Emails.send:

await Emails.send({
  from: "noreply@example.com",
  to: "eliza@example.com",
  subject: "Weekly signup report",
  body: reportHtml,
});

Our test still fails with the same error. Good! That gives me some confidence that our refactoring was correct.
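One detail of this seam is worth spelling out: a replacement is only visible to code that looks up Emails.send at call time. The sketch below uses simplified stand-ins (a sendEmail that takes a string and writes to a log array) to show the difference; viaObject and viaCapturedReference are hypothetical callers.

```typescript
// Self-contained stand-ins; the names mirror the post but the bodies are invented.
const log: string[] = [];

function sendEmail(to: string): void {
  log.push(`real:${to}`);
}

const Emails = { send: sendEmail };

// Looks up Emails.send at call time, so it sees whatever is installed now.
function viaObject(to: string): void {
  Emails.send(to);
}

// Captures the function once, so later replacements are invisible to it.
const captured = Emails.send;
function viaCapturedReference(to: string): void {
  captured(to);
}

// Install a double, the same way a test would.
Emails.send = (to) => {
  log.push(`stub:${to}`);
};

viaObject("eliza@example.com"); // goes to the stub
viaCapturedReference("eliza@example.com"); // still reaches the original
```

This is why the production code has to call Emails.send(...) directly rather than holding on to its own reference to the function.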

Now to fix the error. In our test, we'll write another stub:

Emails.send = () => Promise.resolve();

Our test now looks like this:

test("sendSignupReport", {
  async "sends an email to Eliza"() {
    const newUsers = [{country: "US"}, {country: "US"}];
    database.run = () => Promise.resolve(newUsers);
    Emails.send = () => Promise.resolve();
    
    await sendSignupReport();
    
    expect(elizasInbox, contains, { // TODO: how do we get elizasInbox?
      from: "noreply@example.com",
      to: "eliza@example.com",
      subject: "Weekly signup report",
      body: "I don't know what should go here!", // TODO
    });
  },
});

...and it fails with a new error: elizasInbox is undefined. This is progress! Now we just need some way to find out what email would have ended up in Eliza's inbox if our Emails.send stub weren't there.

Upgrading our Stub to a Spy

Martin Fowler defines spies as "stubs that also record some information based on how they were called." We can easily upgrade our Emails.send stub to a spy by having it record the argument passed to it:

const elizasInbox: Email[] = [];
Emails.send = (email) => {
  elizasInbox.push(email)
  return Promise.resolve();
}

Our test is still failing, but this time, it's a real assertion failure! We got an email, but its body property isn't the one our test was expecting. The failure message tells us the body that we actually got was "<li>US - 2</li>"—meaning two users from the US signed up. Let's just copy-paste that into our test:

test("sendSignupReport", {
  async "sends an email to Eliza"() {
    const newUsers = [{country: "US"}, {country: "US"}];
    database.run = () => Promise.resolve(newUsers);
    
    const elizasInbox: Email[] = [];
    Emails.send = (email) => {
      elizasInbox.push(email)
      return Promise.resolve();
    }
    
    await sendSignupReport();
    
    expect(elizasInbox, contains, {
      from: "noreply@example.com",
      to: "eliza@example.com",
      subject: "Weekly signup report",
      body: "<li>US - 2</li>",
    });
  },
});

And just like that, our tests are green! Success!
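If you're wondering how two stubbed users turn into that body, here's one plausible shape for the two pure functions in the middle of sendSignupReport. These implementations are guesses, since the post never shows breakdownByCountry or formatReportAsHtml; the User type is likewise an assumption.

```typescript
// Guessed implementations: the post never shows these two functions.
interface User {
  country: string;
}

function breakdownByCountry(users: User[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const user of users) {
    counts[user.country] = (counts[user.country] ?? 0) + 1;
  }
  return counts;
}

function formatReportAsHtml(counts: Record<string, number>): string {
  return Object.keys(counts)
    .map((country) => `<li>${country} - ${counts[country]}</li>`)
    .join("");
}

// The same two users our database stub returns.
const reportBody = formatReportAsHtml(
  breakdownByCountry([{ country: "US" }, { country: "US" }]),
);
```

Under these assumptions, reportBody is "<li>US - 2</li>", which matches what the spy recorded.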

Cleaning Up

We're not quite out of the woods yet, though. Remember back when I pointed out that our stubbing shenanigans were going to mess up the database and Emails code for future callers? Now's the time to fix that. We need to follow the Scouts' rule and leave our campsite better than we found it.

We've got to restore the original implementations of database.run and Emails.send. In order to do that, though, our test needs to keep track of what the original implementations were. Here's one way of doing it:

test("sendSignupReport", {
  async "sends an email to Eliza"() {
    const oldDatabaseRun = database.run;
    const oldEmailsSend = Emails.send;
    
    try {
      const newUsers = [{country: "US"}, {country: "US"}];
      database.run = () => Promise.resolve(newUsers);
      
      const elizasInbox: Email[] = [];
      Emails.send = (email) => {
        elizasInbox.push(email)
        return Promise.resolve();
      }

      await sendSignupReport();
    
      expect(elizasInbox, contains, {
        from: "noreply@example.com",
        to: "eliza@example.com",
        subject: "Weekly signup report",
        body: "<li>US - 2</li>",
      });
    } finally {
      database.run = oldDatabaseRun;
      Emails.send = oldEmailsSend;
    }
  },
});

We need the try...finally block so that if our test throws an exception (which it will, if an assertion fails), we still clean up after ourselves. As you can see, this makes the test a bit untidy. Multiple concerns are now interleaved; it’s hard for a casual reader to see exactly why the try/finally is needed, or what its relationship to the surrounding code is.
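To check that claim in isolation, here's a stripped-down sketch in which a simulated assertion failure still leaves the original method in place. The mailer object and failingTest function are made up for the demonstration.

```typescript
// Stand-in object for this sketch; not the real Emails module.
const mailer = { send: (): string => "real" };
const originalSend = mailer.send;

function failingTest(): void {
  const oldSend = mailer.send;
  try {
    mailer.send = () => "stubbed";
    // Simulates a failed assertion partway through the test.
    throw new Error("assertion failed");
  } finally {
    // Runs whether the body returned normally or threw.
    mailer.send = oldSend;
  }
}

let caught = "";
try {
  failingTest();
} catch (error) {
  // The failure still propagates to the test runner after cleanup.
  caught = error instanceof Error ? error.message : String(error);
}

const restored = mailer.send === originalSend;
```

The error still reaches the caller, so the test is reported as failing, but by then the stub is already gone.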

In order to fix that, we're going to write a tiny test double library. If you'll allow me a moment of uncommentated coding:

const replacements: [any, any, any][] = [];

export const Doubles = {
  replace<O extends object, P extends keyof O>(
    object: O,
    property: P,
    double: O[P],
  ) {
    replacements.push([object, property, object[property]]);
    object[property] = double;
  },

  reset() {
    // Undo newest-first so a property replaced twice ends at its original value.
    replacements.slice().reverse().forEach(([object, property, oldValue]) => {
      object[property] = oldValue;
    });
    replacements.length = 0; // empty the array
  },
};

The idea is that we can use Doubles.replace in our tests to replace an object property like Emails.send with a test double. Here it is in action:

test("sendSignupReport", {
  async "sends an email to Eliza"() {
    const newUsers = [{country: "US"}, {country: "US"}];
    Doubles.replace(database, "run", () => Promise.resolve(newUsers));
    
    const elizasInbox: Email[] = [];
    Doubles.replace(Emails, "send", (email) => {
      elizasInbox.push(email);
      return Promise.resolve();
    });

    await sendSignupReport();

    expect(elizasInbox, contains, {
      from: "noreply@example.com",
      to: "eliza@example.com",
      subject: "Weekly signup report",
      body: "<li>US - 2</li>",
    });
  },
});

There, much nicer! We still need to call Doubles.reset() to clean up. Instead of doing that within the test itself, we'll tell the test framework to call it after each test. That way, we never have to remember to call reset ourselves. In many JavaScript test frameworks, this would look something like:

afterEach(() => {
  Doubles.reset();
});

In Taste, it's a bit more complicated—and since you probably don't use Taste, you probably don't care—so I'll spare you the details.
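For a feel of the whole replace-then-reset cycle outside any test framework, here's a compact, self-contained version of the same idea. It walks the saved values newest-first, so that a property replaced twice in one test still ends up back at its original value. The settings object and the Replacement type are invented for the demo.

```typescript
// Compact copy of the Doubles idea; settings is a made-up object for the demo.
type Replacement = { object: any; property: PropertyKey; oldValue: unknown };
const replacements: Replacement[] = [];

const Doubles = {
  replace<O extends object, P extends keyof O>(
    object: O,
    property: P,
    double: O[P],
  ): void {
    // Remember what was there before, then install the double.
    replacements.push({ object, property, oldValue: object[property] });
    object[property] = double;
  },

  reset(): void {
    // Undo newest-first so a property replaced twice ends at its original value.
    while (replacements.length > 0) {
      const { object, property, oldValue } = replacements.pop()!;
      object[property] = oldValue;
    }
  },
};

const settings = { greet: (): string => "real" };

Doubles.replace(settings, "greet", () => "first double");
Doubles.replace(settings, "greet", () => "second double");
const duringTest = settings.greet();

Doubles.reset(); // what an afterEach hook would do
const afterReset = settings.greet();
```

During the "test", settings.greet is the most recent double; after reset, it's the original function again.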

Wrap-up

Backdoor stubbing is just the first of my five strategies for testing around effects. I’ll do a full pros/cons analysis in a future episode, but for now I’ll just note that this isn’t my preferred testing approach. It might rank third or fourth. The best is yet to come.

My hope is that you take away a few insights from this post:

In JavaScript, we can use global objects with function properties to create “seams” where we can insert test doubles.
There’s nothing magical about test doubles, and they don’t have to be manufactured by a library. A stub can be a plain old function. A spy can be as simple as a function that appends to an array.
You should always clean up after your test doubles, but this doesn’t have to be difficult. You can do it yourself in less than 20 lines.

Dani Sandoval

Nov 28, 2023

This definitely takes the mystery out of doubles and spies, so thanks for the explanation! I'm nervous, though, that the way these tests are written makes them feel like they're testing "too much of the implementation". Is this a normal feeling? Or am I just too wrapped up in end-to-end testing to see the value in this level of testing?


Ben’s Guide to Software Development
