A Taxonomy of Testing

Unit vs. system, manual vs. automated

May 05, 2025

Hello! You’re reading Ben’s Guide to Software Development, and I’m your host, Ben Christel.

My last few posts have been about the patterns and techniques I use when setting up a new software project.

Continuing the theme, I want to write about how to start testing a project. In short, I tend to start with manual system testing of a walking skeleton. As the software begins to develop real functionality, I’ll scaffold each new class and function with unit tests. Once the project has hit a self-sustaining rhythm, I’ll automate some of the system testing, but put almost all my weight on unit tests.

To make clear what all of that means, I’ll need to define some terms. That’s what this post is all about!

What is a unit test?

“Unit test” has got to be one of the most maligned, misunderstood, and variously defined terms in our industry. Matt K. Parker has written a bit about the term’s origins in the Smalltalk test framework SUnit:

One of the primary goals of SUnit was that each test could run in isolation from all of the other tests. In other words, a “unit test” is a test that’s isolated from all of the other tests in the test suite. The “unit” is the test itself!

“Isolated” means a test’s setup should not depend on other tests having been run before it. Each test should start from a clean slate and do its own setup. Nor should a test interfere with other tests that might run afterward. Tests should be good citizens and clean up after themselves (or not make a mess to begin with).
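To make isolation concrete, here’s a minimal TypeScript sketch. `ShoppingCart`, `runTest`, and `assertEqual` are hypothetical stand-ins. A real project would use a framework such as Jest or Vitest, whose per-test setup hooks serve the same purpose. In the first pair of tests, each test builds its own fresh cart. In the second pair, both tests share one cart.

```typescript
// Hypothetical production class, inlined so the sketch is self-contained.
class ShoppingCart {
  private items: string[] = [];

  add(item: string): void {
    this.items.push(item);
  }

  count(): number {
    return this.items.length;
  }
}

// A tiny hypothetical test harness that records pass/fail for each test.
const results: { name: string; passed: boolean }[] = [];

function runTest(name: string, body: () => void): void {
  try {
    body();
    results.push({ name, passed: true });
  } catch {
    results.push({ name, passed: false });
  }
}

function assertEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Isolated: each test does its own setup, so run order doesn't matter.
const isolatedTests: Array<[string, () => void]> = [
  ["a new cart is empty", () => {
    const cart = new ShoppingCart();
    assertEqual(cart.count(), 0);
  }],
  ["adding an item increments the count", () => {
    const cart = new ShoppingCart();
    cart.add("apple");
    assertEqual(cart.count(), 1);
  }],
];
for (const [name, body] of isolatedTests) runTest(name, body);
for (const [name, body] of [...isolatedTests].reverse()) runTest(name, body);

// Not isolated: both tests share one cart, so the second test sees the
// item the first test left behind.
const sharedCart = new ShoppingCart();
runTest("shared: adding an item", () => {
  sharedCart.add("pear");
  assertEqual(sharedCart.count(), 1);
});
runTest("shared: a new cart is empty", () => {
  assertEqual(sharedCart.count(), 0);
});
```

The isolated pair passes in either order. As written, the shared “empty” test fails, because the earlier test left an item in the cart. Run the two shared tests in the opposite order and both pass. That order dependence is exactly what isolation rules out.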

I think test isolation is a great idea! But I have a problem with equating “unit test” with “isolated test.” Really, all tests benefit from isolation. The world has moved on since SUnit, and these days, test isolation goes without saying. In a modern software dev shop, the original definition of “unit test” is so all-encompassing as to be useless.

Moreover, while I think originalist definitions like this one can be illuminating, they’re not the end of the story. “Unit test” has a colloquial meaning, which has diverged considerably from the original definition. That’s the definition I want to talk about today.[1]

Here it is:

A unit test runs in the same OS process as the code being tested, and calls it directly.

One thing I like about this definition is how simple and unambiguous it is. Is your test file importing production functions and classes and doing mad science on them? If so, it’s a unit test.
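Under this definition, something as small as the following counts as a unit test. It’s a TypeScript sketch with a hypothetical `slugify` function inlined to keep it self-contained. In a real project, the test file would import the function from a production module. The test calls the function directly, in the same process, with no UI, network, or installed build in between.

```typescript
// Production code (hypothetical). In a real project this would live in its
// own module, and the test file would import it.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, ""); // drop leading and trailing dashes
}

// The unit test: same OS process, direct function calls.
function assertEqual(actual: string, expected: string): void {
  if (actual !== expected) {
    throw new Error(`expected "${expected}", got "${actual}"`);
  }
}

assertEqual(slugify("A Taxonomy of Testing"), "a-taxonomy-of-testing");
assertEqual(slugify("  Unit vs. System!  "), "unit-vs-system");
```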

I know some people are going to disagree with me on this one. In particular, you’ll notice that my definition fails to distinguish between integrated tests and unit tests. To save you time, here are my counterarguments to the other definitions of “unit test” I’ve seen. Please direct your annoyance toward the comments section!

System tests

If your test file doesn’t import production code, it’s not a unit test. So what is it? A system test.

System tests, as the name suggests, interact with a whole (installed or deployed) software system. They simulate actions a user might take: clicking on things and entering data (in the case of a website or GUI app), running commands (in the case of a CLI), or making API calls (in the case of a web service). Other names for system tests include “end-to-end tests” and “integration (or integrated) tests,” though the latter term is ambiguous.[2]
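By contrast with a unit test, here’s a minimal TypeScript sketch of a system test for a command-line program, assuming Node.js. The test imports nothing from the program under test. It starts the program as a separate OS process and checks only what a user could see: its output and its exit status. The `greet.js` script and the `runGreetCli` helper are hypothetical. The script is written to a temp directory only so the example runs on its own; a real system test would run your installed build instead.

```typescript
import { spawnSync } from "node:child_process";
import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function runGreetCli(args: string[]): { stdout: string; status: number | null } {
  const dir = mkdtempSync(join(tmpdir(), "greet-cli-"));
  try {
    // Stand-in for "install the build": write a tiny hypothetical CLI.
    const cliPath = join(dir, "greet.js");
    writeFileSync(cliPath, 'console.log("Hello, " + (process.argv[2] || "world") + "!");\n');

    // The actual system test step: launch the program as its own OS process
    // and capture what a user would observe.
    const result = spawnSync(process.execPath, [cliPath, ...args], { encoding: "utf8" });
    return { stdout: result.stdout, status: result.status };
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

const greeting = runGreetCli(["Ada"]);
```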

One neat thing about system tests is that, because they interact with the system through an externally-visible interface, they don’t have to be written in the same language as the production code. There are many system testing frameworks and domain-specific languages available, especially for GUI testing. One I’ve used recently is Cypress; I’m also interested in Playwright. For system testing CLIs, Samir Talwar has created Smoke.

What about system-testing a library? At first glance, it might seem like libraries only need unit tests. After all, their clients are going to be calling them in-process, so in-process unit tests should be perfectly adequate. However, there is also the question of testing the library’s installer, packaging, or configuration. For that, you’ll want some system tests that create a new project (e.g. via npm init in NodeJS), install the library in it, and confirm that the library can be used.
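A packaging smoke test along those lines might look like the sketch below. It is TypeScript on Node.js and assumes a POSIX system. `smokeTestSteps` and `runSmokeTest` are hypothetical names. `tarballPath` is assumed to be an absolute path to a file produced by `npm pack`. The last step assumes the package can be loaded with `require`; an ESM-only package would need a dynamic `import()` there instead.

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface Step {
  command: string;
  args: string[];
}

// A fresh project, the packed library installed into it, and a one-line
// script that proves the library can actually be loaded.
function smokeTestSteps(tarballPath: string, packageName: string): Step[] {
  return [
    { command: "npm", args: ["init", "-y"] },
    { command: "npm", args: ["install", tarballPath] },
    { command: "node", args: ["-e", `require(${JSON.stringify(packageName)})`] },
  ];
}

// Runs the steps in a throwaway directory. execFileSync throws if any step
// exits with a nonzero status, which fails the test.
function runSmokeTest(tarballPath: string, packageName: string): void {
  const project = mkdtempSync(join(tmpdir(), "smoke-"));
  try {
    for (const step of smokeTestSteps(tarballPath, packageName)) {
      execFileSync(step.command, step.args, { cwd: project, stdio: "inherit" });
    }
  } finally {
    rmSync(project, { recursive: true, force: true });
  }
}
```

Keeping the list of steps separate from the code that runs them makes it easy to log or dry-run the plan before paying for a real `npm install`.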

System testing tends to involve more complex infrastructure than unit testing. One of the first problems you’ll run into is that you need an automated way to compile a complete, usable build of your software (though not necessarily a production build) and put it somewhere another program can get at it. Often, this means installing it in some kind of sandboxed local environment specific to the test run. “Sandboxing” means the installation shares nothing with other installations. That way, different runs of the tests don’t interfere with each other (see the discussion on test isolation above), and you know exactly what code you’re testing.
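One simple way to get a fresh, unshared install location for each run is to create a uniquely named temporary directory and delete it afterward. The sketch below does that with Node’s `fs.mkdtempSync` and `fs.rmSync`. `createSandbox` is a hypothetical helper. How you actually install a build into the directory depends on your toolchain.

```typescript
import { existsSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface Sandbox {
  root: string;
  dispose: () => void;
}

// Creates a new, uniquely named directory for one test run. Nothing in it is
// shared with other runs, so repeated or concurrent runs can't interfere.
function createSandbox(prefix = "myapp-sandbox-"): Sandbox {
  const root = mkdtempSync(join(tmpdir(), prefix));
  return {
    root,
    dispose: () => rmSync(root, { recursive: true, force: true }),
  };
}

const sandbox = createSandbox();
try {
  // Install the build under sandbox.root here (for example, by pointing your
  // installer's prefix at it), then run the system tests against that copy.
} finally {
  sandbox.dispose();
}
console.log(existsSync(sandbox.root)); // prints false: nothing is left behind
```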

Setting up the infrastructure to do this is an investment that will pay dividends. The ability to create sandboxed installations of your software is valuable even if you’re not doing automated system testing. For example, you’ll often want to have a dev build of your software installed alongside the “real” version that you use day-to-day (you do use your own software, right?) and sandboxed installations make that easy. Setting up a project with automated system tests thus makes manual testing easier too.

Speaking of which…

Manual system testing

So far, I’ve been writing as if all tests are automated. But manual testing exists too, and in many cases, its benefits outweigh its costs.

One advantage of manual testing is that it lacks the main disadvantages of automated system tests, which tend to be flaky and brittle. “Flaky” means they fail intermittently, in ways that appear nondeterministic. “Brittle” means they’re prone to failing when you make some innocuous change to the code, like swapping the order of two buttons. There are ways to write system tests that minimize these problems, but you can’t completely solve them. By contrast, manual testing is more resilient to change and disturbance in the system.
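Here’s a toy illustration of brittleness in plain TypeScript, with no test framework. The `Button` type is a hypothetical, simplified model of what a UI test sees on screen. A test that finds a button by its position breaks when two buttons swap places. A test that finds it by its visible label, the way a user would, survives the swap. Playwright’s `getByRole` locators encourage this second style.

```typescript
// A hypothetical, simplified model of the buttons a UI test can see.
interface Button {
  label: string;
  onClick: () => string;
}

// Brittle: depends on where the button happens to sit in the layout.
function clickSecondButton(buttons: Button[]): string {
  return buttons[1].onClick();
}

// More robust: finds the button by what the user sees.
function clickButtonLabeled(buttons: Button[], label: string): string {
  const button = buttons.find((b) => b.label === label);
  if (!button) throw new Error(`no button labeled "${label}"`);
  return button.onClick();
}

const originalLayout: Button[] = [
  { label: "Cancel", onClick: () => "cancelled" },
  { label: "Save", onClick: () => "saved" },
];
// An innocuous change: the two buttons swap places.
const swappedLayout: Button[] = [originalLayout[1], originalLayout[0]];
```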

A second downside of automated system tests is that, while they can catch functional problems, they can’t alert you when your UI fails to be understandable, usable, or attractive. The presence or absence of these hard-to-quantify attributes can only be discovered by a human who is actually trying to use the software.

For all these reasons, I prefer to do most of my GUI testing manually. In future posts, I’ll talk about the techniques I use to make this feasible. For now, here’s the gist:

I write UI code in an MVC (model-view-controller) style — separating the logic and state of the UI (the model) from how it’s presented to users (the view).
Because the model knows nothing about the DOM or the UI framework I’m using, it’s easy to unit test. That’s good, because the model contains all the complicated, test-worthy logic.
The view code is extremely simple — “too simple to break.” It’s usually little more than an HTML template with slots for data, and doesn’t need unit tests.
I manually test the UI whenever I make changes to the view.
I sometimes use image-based snapshot testing to catch unintended visual changes. Chromatic is a service that will do this for you.
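A minimal TypeScript sketch of that split follows, using a hypothetical counter widget capped at a maximum. The controller is left out for brevity. The model holds all the state and rules, so plain function calls are enough to unit test it. The view only fills the model’s data into an HTML template.

```typescript
// Model: the UI's state and logic. It knows nothing about the DOM.
class CounterModel {
  private count = 0;
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  canIncrement(): boolean {
    return this.count < this.max;
  }

  increment(): void {
    if (this.canIncrement()) this.count += 1; // ignored once at the maximum
  }

  reset(): void {
    this.count = 0;
  }

  get label(): string {
    return `${this.count} of ${this.max}`;
  }
}

// View: little more than a template with slots for the model's data.
function renderCounter(model: CounterModel): string {
  const disabled = model.canIncrement() ? "" : " disabled";
  return `<p>${model.label}</p><button${disabled}>+1</button>`;
}
```

All the behavior worth testing (the cap, the reset, the label) lives in `CounterModel`. `renderCounter` is simple enough that a quick look in the browser is the main check it needs.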

Manual unit testing?

So far, we’ve covered automated unit tests, automated system tests, and manual system tests. Are there manual unit tests? How would that even work?

Perhaps REPL-driven development or “live programming” could be viewed as a type of manual unit testing. In this style of development, you write code in an interactive prompt (a REPL or read-eval-print loop), calling each new function to check that it’s doing what you expect.

I don’t have much experience with REPL-driven development. It appears to be most popular among Lisp and Clojure programmers. If you know something about it, I’d love to hear your recommendations for where I could learn more!

Wrap-up

We’ve talked about two dimensions along which tests vary:

Manual vs. automated
System vs. unit

A unit test is any test that runs in the same OS process as the code it’s testing, and calls that code directly. A system test interacts with a whole software system the way a user or client would.

When I’m starting a new project, I begin with manual system testing. The software’s tiny and cute, so it starts up fast, and there’s not much to test. As the software grows, I add unit tests for much of the new code. Referring back to my posts on Diamond Design, I will unit test all code in these facets:

domain/
platform/
lib/

I often skip unit testing in the fourth facet, app/. Instead, I keep app/ as simple as possible, preferring to push complexity into the other facets and under the microscope lens of unit tests. When I make changes to app/, I manually test them through the UI.
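As a rough illustration of keeping app/ thin, here’s a hypothetical TypeScript example. The facet names come from the Diamond Design posts, but the functions were invented for this sketch and aren’t taken from those posts. The discount and formatting logic is domain code with unit tests. The app-level function only passes input to the domain and returns the result.

```typescript
// domain/ (hypothetical): the logic worth unit testing.
function applyDiscount(subtotalCents: number, percentOff: number): number {
  if (percentOff < 0 || percentOff > 100) {
    throw new RangeError("percentOff must be between 0 and 100");
  }
  return Math.round(subtotalCents * (1 - percentOff / 100));
}

function formatCents(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// app/ (hypothetical): wiring with no decisions of its own, so a quick manual
// check through the UI is enough to confirm it's hooked up correctly.
function checkoutSummary(input: { subtotalCents: number; percentOff: number }): string {
  const total = applyDiscount(input.subtotalCents, input.percentOff);
  return `Total: ${formatCents(total)}`;
}
```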

Sooner or later, all that manual testing starts to feel repetitive. At that point, I automate the drudgery with system tests. However, I’ll continue manual testing throughout the project, looking for visual bugs and user experience issues.

Overall, I have found that this approach to testing is easy to start with, scales well to large projects (more than a million lines), and, in conjunction with the other techniques I describe in this newsletter, is capable of producing nearly bug-free software. If you’d like to hear more about how I write code, feel free to subscribe!

[1] This definition of “unit test” is designed to be inclusive of all the ways the term is used colloquially. I came up with the definition by applying the principle that if my friends and colleagues refer to something as a unit test, it had better be included. I think I did an okay job at this, but feel free to let me know if you disagree!

[2] An “integration test” aims to verify that the major architectural components of the software (e.g. GUI, backend, and database) work together, and doesn’t try to test functionality in detail. The term “integrated test,” on the other hand, means different things depending on who you talk to. Some people say that a unit test that invokes multiple production classes is an integrated test (yes, it’s almost always classes, not functions. Functional programmers don’t worry about this kind of thing). Others reserve the term for system tests. It’s very confusing, so I avoid the term “integrated test” entirely.
