James Shore

Testing Without Mocks: A 🧵.

So a few days ago I released this massive update to my article, "Testing Without Mocks: A Pattern Language." It's 40 pages long if you print it. (Which you absolutely should. I have a fantastic print stylesheet.) I promised a thread explaining what it's all about.

This is the thread. If you're not interested in TDD or programmer tests, you might want to mute me for a bit.

Here's the article I'm talking about: jamesshore.com/v2/projects/tes

www.jamesshore.com - James Shore: Testing Without Mocks: A Pattern Language

2/ First, why bother? Why write 40 pages about testing, with or without mocks?

Because testing is a big deal. People who don't have automated tests waste a huge amount of time manually checking their code, and they have a ton of bugs, too.

The problem is, people who DO have automated tests ALSO waste a huge amount of time. Most test suites are flaky and SLOOOOW. That's because the easy, obvious way to write tests is to make end-to-end tests that are automated versions of manual tests.

3/ Folks in the know use mocks and spies (I'll say "mocks" for short) to write isolated unit tests. Now their tests are fast! And reliable! And that's great!

Except that now their tests have lots of detail about the interactions in the code. Structural refactorings become really hard. Sometimes, you look at a test, and realize: all it's testing... is itself.

Not to mention that the popular way to use mocks is to use a mocking framework and... wow. Have you seen what those tests look like?

4/ So we don't want end-to-end tests, we don't want mocks. What do we do?

The people really REALLY in the know say "bad tests are a sign of bad design." They're right! They come up with things like Hexagonal Architecture and (my favorite) Gary Bernhardt's Functional Core, Imperative Shell. It separates logic from infrastructure so logic can be tested cleanly.

Totally fixes the problem.

For logic.

Anything with infrastructure dependencies… well… um… hey look, a squirrel! (runs for hills)

5/ Not to mention that (checks notes) approximately none of us are working in codebases with good separation of logic and infrastructure, and (checks notes again) approximately none of us have permission to throw away our code and start over with a completely new architecture.

(And even if we did have permission, throwing away code and starting over is a Famously Poor Business Decision with Far-Reaching Consequences.)

6/ So we don't want end-to-end tests, we don't want mocks, we can't start over from scratch... are we screwed? That's it, the end, life sucks?

No.

That's why I wrote 40 pages. Because I've figured out another way. A way that doesn't use end-to-end tests, doesn't use mocks, doesn't ignore infrastructure, doesn't require a rewrite. It's something you can start doing today, and it gives you the speed, reliability, and maintainability of unit tests with the power of end-to-end tests.

7/ I call it (for now, anyway, jury's out, send me your article naming ideas) "Testing With Nullables."

It's a set of patterns for combining narrow, sociable, state-based tests with a novel infrastructure technique called "Nullables."

At first glance, Nullables look like test doubles, but they're actually production code with an "off" switch.
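
Roughly, the shape looks something like this. (A minimal sketch with invented names; the article has the real patterns and code.)

```typescript
// Sketch of a "Nullable": the production class talks to the real network, and
// createNull() returns the same class wired to an embedded stub instead.

type HttpResponse = { status: number; body: string };
type FetchFn = (url: string) => Promise<HttpResponse>;

class HttpClient {
  // Normal factory: real infrastructure.
  static create(): HttpClient {
    return new HttpClient(async (url) => {
      const response = await fetch(url);
      return { status: response.status, body: await response.text() };
    });
  }

  // The "off switch": the same production code, with the lowest-level call stubbed out.
  static createNull(response: HttpResponse = { status: 200, body: "" }): HttpClient {
    return new HttpClient(async () => response);
  }

  private constructor(private readonly fetchFn: FetchFn) {}

  async get(url: string): Promise<HttpResponse> {
    return this.fetchFn(url); // everything above the stubbed call still runs for real
  }
}
```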

8/ This is as good a point as any to remind everyone that nothing is perfect. End-to-end tests have tradeoffs, mocks have tradeoffs, FCIS has tradeoffs... and Nullables have tradeoffs. All engineering is tradeoffs.

The trick is to find the combination of good + bad that is best for your situation.

9/ Nullables have a pretty substantial tradeoff. Whether it's a big deal or not is up to you. Having worked with these ideas for many years now, I think the tradeoffs are worth it. But you have to make that decision for yourself.

Here's the tradeoff: Nullables are production code with an off switch.

Production code.

Even though the off switch may not be used in production.

11/ The fundamental idea is that we're going to test everything—everything!—with narrow, sociable, state-based tests.

Narrow tests are like unit tests: they focus on a particular class, method, or concept.

Sociable tests are tests that don't isolate dependencies. The tests run everything in dependencies, although they don't test them.

And state-based tests look at return values and state changes, not interactions.

(There's a ton of code examples in the article, btw, if you want them.)
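
To make the three terms concrete, here's a small made-up sketch; the names are invented and the real examples are in the article.

```typescript
import assert from "node:assert";
import { test } from "node:test";

// All of these classes are invented for illustration.
class TaxCalculator {
  rateFor(region: string): number {
    return region === "EU" ? 0.25 : 0.1;
  }
}

class Invoice {
  private lines: number[] = [];

  // Sociable: the real TaxCalculator is used unless a caller overrides it.
  constructor(private readonly taxes = new TaxCalculator()) {}

  addLine(amount: number): void {
    this.lines.push(amount);
  }

  total(region: string): number {
    const subtotal = this.lines.reduce((sum, line) => sum + line, 0);
    return subtotal * (1 + this.taxes.rateFor(region));
  }
}

// Narrow: the test is about Invoice. Sociable: the real TaxCalculator runs.
// State-based: we check the return value, not whether rateFor() was called.
test("total() includes tax for the region", () => {
  const invoice = new Invoice();
  invoice.addLine(100);
  invoice.addLine(50);
  assert.equal(invoice.total("EU"), 187.5);
});
```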

12/ This does raise some questions about how to manage dependencies. Another core idea is "Parameterless Instantiation." Everything can be instantiated with a constructor, or factory method, that takes NO arguments.

Instead, classes do the unthinkable: they instantiate their own dependencies. GASP!

Encapsulation, baby.

(You can still take the dependencies as an optional parameter.)
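
A minimal sketch of how that can look (invented names, illustration only):

```typescript
// Parameterless Instantiation: the class creates its own dependencies by
// default, and still accepts overrides as optional constructor parameters.

class EmailClient {
  static create(): EmailClient { return new EmailClient(); }
  static createNull(): EmailClient { return new EmailClient(); } // stubbed variant in real code
  async send(_to: string, _body: string): Promise<void> {}
}

class SignupService {
  // `new SignupService()` just works; no DI framework required...
  constructor(private readonly email: EmailClient = EmailClient.create()) {}

  async signUp(address: string): Promise<void> {
    await this.email.send(address, "Welcome!");
  }
}

// ...and a test can still hand in a Nulled dependency when it wants to:
const service = new SignupService(EmailClient.createNull());
```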

13/ People ask: "but if we don't use dependency injection frameworks..."

I interrupt: "your code is simpler and easier to understand?" I'm kind of a dick.

They continue, glaring: "...doesn't that mean our code is tightly coupled?"

And the answer is no, of course not. Your code was already tightly coupled! An interface with one production implementation is not "decoupled." It's just wordy. Verbose. Excessively file-system'd.

(The other answer is, sure, use your DI framework too. If you must.)

14/ Anyway, that's the fundamentals. Narrow, sociable, state-based tests that instantiate their own dependencies.

Next up: A-Frame Architecture! This is optional, but people really like it. It's basically a formalized version of Functional Core, Imperative Shell. I'm gonna skip on ahead, but feel free to check out the article for details. Here's the direct link to the Architecture section: jamesshore.com/v2/projects/tes

www.jamesshore.com - James Shore: Testing With Nullables: A Pattern Language

15/ Speaking of architecture, the big flaw with FCIS, as far as I've seen, is that it basically ignores infrastructure, and things that depend on infrastructure.

"I test it manually," Gary Bernhardt says in his very much worth watching video: destroyallsoftware.com/screenc

That's a choice. I'm going to show you how to make a different one.

(Not trying to dunk on FCIS here. I like it. A-Frame Architecture has a lot in common with FCIS, but has more to say about infrastructure.)

www.destroyallsoftware.com - Functional Core, Imperative Shell

16/ So right, Infrastructure!

Code these days has a LOT of infrastructure. And sometimes very little logic. I see a lot of code that is really nothing more than a web page controller that turns around and hands off to a bunch of back-end services, and maybe has a bit of logic to glue it all together. Very hard to test with the "just separate your logic out" philosophy. And so it often doesn't get tested at all. We can do better.

17/ There are two basic kinds of infrastructure code:

1) Code that interfaces directly with the outside world. Your HTTP clients, database wrappers, etc. I call this "low-level infrastructure".

2) Code that *depends* on low-level infrastructure. Your Auth0 and Stripe clients, your controllers and application logic. I call this "high-level infrastructure" and "Application/UI code".

18/ Low-level infrastructure should be wrapped up in a dedicated class. I call these things "Infrastructure Wrappers," 'cause I'm boring and like obvious names, but they're also called "Gateways" and "Adapters."

Because it talks to the outside world, this code needs to be tested for real, against actual outside world stuff. Otherwise, how do you know it works? For that, you can use Narrow Integration Tests. They're like unit tests, except they talk to a test server. Hopefully a dedicated one.
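
A sketch of what a Narrow Integration Test can look like, assuming a hypothetical low-level HTTP wrapper and a throwaway local server (none of this is the article's code):

```typescript
import assert from "node:assert";
import { test } from "node:test";
import { createServer } from "node:http";
import type { AddressInfo } from "node:net";

// A minimal low-level infrastructure wrapper, invented for illustration.
class HttpGet {
  async get(url: string): Promise<string> {
    const response = await fetch(url);
    return response.text();
  }
}

// Narrow Integration Test: exercise the wrapper against a real, locally
// controlled server so real sockets and real parsing get tested.
test("HttpGet reads the body from a real server", async () => {
  const server = createServer((_req, res) => res.end("hello"));
  await new Promise<void>((resolve) => server.listen(0, () => resolve()));
  const { port } = server.address() as AddressInfo;

  try {
    assert.equal(await new HttpGet().get(`http://localhost:${port}/`), "hello");
  } finally {
    server.close();
  }
});
```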

19/ High-level infrastructure should also be wrapped up in an Infrastructure Wrapper, but it can just delegate to the low-level code. So it doesn't need to be tested against a real service—you can just check that it sends the correct JSON or whatever, and that it parses the return JSON correctly.

And parses garbage correctly. And error values. And failed connections. And timeouts.

*fratboy impression* Woo! Microservices rock!
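
A sketch of how a high-level wrapper might be tested this way, using a hypothetical Nulled low-level client (all names invented for illustration):

```typescript
import assert from "node:assert";
import { test } from "node:test";

type HttpResponse = { status: number; body: string };

// Minimal Nullable low-level client (same shape as the earlier sketch).
class HttpClient {
  static createNull(response: HttpResponse = { status: 200, body: "" }): HttpClient {
    return new HttpClient(async () => response);
  }
  private constructor(private readonly fetchFn: () => Promise<HttpResponse>) {}
  async get(_path: string): Promise<HttpResponse> { return this.fetchFn(); }
}

// High-level wrapper: delegates to the low-level client and parses the result.
class PaymentsClient {
  constructor(private readonly http: HttpClient) {}

  async charge(cents: number): Promise<"ok" | "failed"> {
    const { status, body } = await this.http.get(`/charge?amount=${cents}`);
    if (status !== 200) return "failed"; // error responses...
    try {
      return JSON.parse(body).ok ? "ok" : "failed";
    } catch {
      return "failed"; // ...and garbage responses both fail safe
    }
  }
}

test("charge() fails safely on a garbage response", async () => {
  const payments = new PaymentsClient(HttpClient.createNull({ status: 200, body: "not json" }));
  assert.equal(await payments.charge(500), "failed");
});
```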

20/ At this point, people ask,

"But what if the service changes its API? Don't you need to test against a real service to know your code still works?"

To which, I respond: "What, you think the service is going to wait for you to *run your tests* before changing its API?"

(Yeah, still kind of a dick.)

You need to have runtime telemetry and write your code to fail safe (and not just fall over) when it receives unexpected values. I call this "Paranoic Telemetry."
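
A sketch of the fail-safe-plus-telemetry idea, with invented names and a deliberately simple fallback:

```typescript
// When the external service hands back something unexpected, record telemetry
// and fall back to a safe default instead of falling over.

interface Telemetry {
  warn(message: string): void;
}

function parseSubscriptionStatus(body: string, telemetry: Telemetry): "active" | "inactive" {
  try {
    const data = JSON.parse(body);
    if (data.status === "active" || data.status === "inactive") return data.status;
    telemetry.warn(`Unexpected subscription status: ${JSON.stringify(data.status)}`);
  } catch {
    telemetry.warn("Subscription response was not valid JSON");
  }
  return "inactive"; // fail safe: assume not subscribed rather than crashing the login flow
}
```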

21/ Sure, when you first write the high-level wrapper, you'll make sure you understand the API so you can test it properly, maybe do some manual test runs to confirm what the docs say.

But then you gotta have Paranoic Telemetry. They ARE out to get you.

True story: I was at a conference once and somebody—I think it was Recurly, but maybe it was Auth0—changed their API in a way that utterly borked my login process.

My code had telemetry and failsafes, though, and handled it fine. Paranoia FTW.

@jamesshore using a DI framework doesn't reduce coupling in any meaningful way, as you say.

It *hides* coupling and encourages excessive coupling.

@jamesshore
I love this concept for FP as well as OOP. I implement a very similar concept when testing purely functional code. I take advantage of ad-hoc polymorphism and default function parameters in #Kotlin wakingrufus.neocities.org/adho

wakingrufus.neocities.org - Using Ad-hoc Polymorphism to Test Functional Kotlin

@jamesshore

note, from someone who isn't into the formal definitions:

I thought interaction-based tests would test return values (interactions make me think of event sourcing and reducers) and that state-based tests would check internal state of mocked dependencies.

I still think you should use the formal language as long as it can present a mental model that anchors the meaning of the definitions.

@marcusradell Think of it in terms of what the code does. Interaction-based tests check how the code *interacts* with dependencies, and state-based tests check the *state* of the code under test.

@jamesshore when seen from the perspective of the unit/code under test, it instinctively clicks for me. (Or the curse of knowledge gives me that sensation, but I think not.)

@jamesshore So here I sit at the airport, sipping fresh (overpriced) orange juice, whilst waiting for some arrivals, and I start to read this.

Thanks for spending the time writing this. It is fn awesome content.

@jamesshore I'm definitely going to follow up with your 40 pages but in the meantime, what do you mean by, 'sociable' here?

@rob_fulwell Tests that run code in dependencies. There’s a section with more detail in the article.

@rob_fulwell @jamesshore

In other words than James':
Tests that don't micro-test the smallest units (class/method/function) in isolation (all collaborators mocked) but e.g. exercise a whole use case.
The line to integration tests is blurry, but the terms are overloaded anyways.

If it’s only business logic (no persistence, etc.) then many people (like me) still call this unit tests.
But the discussion about that quickly gets religious.

@jamesshore Don't mute this - it's actual non-surface-level non-basic shit and even just reading it will expand your horizons. Bonus points if you integrate some of it into your workflow.

@jamesshore I’d really like to see you test drive some persistence code, ideally sql but anything non trivial really. Interested to see where the seam is between real infra and nullable infra in that case. Most of the videos you’ve done in the past have just been in memory, which is why I’m asking.

@djones I have a lot of material on calling web services, which isn’t in memory, but nothing public on databases. @jitterted and I are likely to get into it in our stream together if it lasts long enough.

@jamesshore so since this is neither London nor Chicago style, what should we call it?

@jamesshore @mlevison There are no “styles”, there’s only red/green/refactor. There’s only exploring the solution space guided by examples from the problem space.

@keithb_b @jamesshore I was just making a slightly silly comment about the styles that have existed over the years?

Detroit, London and now Portland?

Curiously, when I do code I lean closer to James' style

@jamesshore

That was a very interesting article to read, which I really should read a few more times, but… a question that keeps popping up in the back of my mind is “what value does a Nullable deliver to the consumer of the code?”. When I was a *very* young dev the `NullRenderer` confused me 😅

So I mostly wonder about those unfamiliar with the pattern.

I also wonder if comparing is fair due to the many bad mocks/stubs. But that is another discussion.

Thanks for the article.

@jamesshore

To be clear I mean the bad mocks/stubs in the wild, not in the article. 😅

@DevWouter re: value to consumer, was that rhetorical? I don’t use them in production very often, but they’re occasionally useful when you want to trigger some behavior without doing it “for real.” I recently updated the article with an example of doing this for cache warming. (The example is in the Nullables pattern.)

@jamesshore

It was an honest question and your answer pointed directly to what I missed. Thanks 😊

@jamesshore To aid in my own understanding, I tried to fit this into my own schema of testing approaches, and figure out the major decisions this approach makes. This might be useful feedback, or it might just be repeating the original idea in my own (less coherent) words. Or I might just have misunderstood it all. Let's find out!

@jamesshore Question #1: where do you switch away from using production (whether that be code, data, infrastructure, or something else)? Or, to put it another way, what do we fake? The typical case with mocks is to fake the immediate dependencies with mocks. In the opposite case where we run the full system, it's the config that's fake: the stack of code is completely real but we (for instance) change a config file to point the system at a different database (again, running for real).

@jamesshore Nullables says that we should run as much of our own code as possible, but avoid using real infrastructure. So, we fake out those dependencies on infrastructure at the lowest level e.g. faking out `stdout.write()`.

@jamesshore One possible advantage: if you're mocking immediate dependencies, then you end up mocking an awful lot of your interfaces at one point or another. Faking out dependencies at the lowest level might well mean a much smaller, and therefore more manageable, set of interfaces that you have to fake.

@jamesshore Question #2: how do we make the fakes? Rather than using a mocking framework, or making the real thing with different config, we create embedded stubs. One way to think about them is as in-memory implementations of the interface. For instance, instead of writing to stdout, we just stash the value in a field. Instead of checking what was written to stdout, we check the stashed value. Instead of reading from stdin, we get the value to return from a field.

@jamesshore Question #3: who's responsible for constructing the fakes? Or, to put it another way, where does the code that sets up dependencies for tests live? The embedded stub pattern means that all of the code that fakes out an interface is in one place, rather than (for instance) each test mocking an interface and having to correctly fake how it works.

@jamesshore By putting this code in the same file as the production code, it means the knowledge of how the interface is supposed to work is in one place, reducing the risk of inconsistency and improving quality through repeated use.

@jamesshore Similarly, higher level interfaces have functions to create nullable instances in the same file as the functions that create the production instances. So, again, the knowledge of how to create a test instance of X is in one place, which is the same place as X itself, rather than scattered across multiple tests.

@jamesshore Now, I reckon you could pick and choose your answers to these questions. For instance, suppose your default is faking immediate dependencies in the test case using a mocking framework. You could keep using a mocking framework (different answer to #2), but choose to mock the lowest level of infrastructure (same answer to #1) and put all of the code that sets up the mocks (directly or indirectly) in one place (same answer to #3).

@jamesshore Or you could throw away the mocking framework and use in-memory implementations (same answer to #2), but still fake immediate dependencies (different answer to #1) and write a separate implementation in each test case/suite (different answer to #3).

@jamesshore These different combinations come with all sorts of different trade-offs, and some will be more useful than others. Personally, I've gotten a lot of mileage out of making fakes without a mocking framework, and putting the code to set up testable instances of X in the same place as X itself, but varying exactly at what level we do the faking -- sometimes immediate dependencies, sometimes the lowest level of infrastructure, sometimes somewhere in the middle.

@jamesshore I often find that "somewhere in the middle" is where the simplest and most stable interface to fake (and therefore the one that leads to less brittle tests with clearer intent) can be found. It's entirely possible that this is an artefact of poor design choices on my part though!

@jamesshore Anyway, hopefully I haven't entirely misunderstood the idea, and thanks for sharing it!

@zwobble Thanks for engaging so deeply! It’s a real gift to an author.

@zwobble This is a great summary, thanks. Anyone else reading this who’s interested in my Testing Without Mocks stuff, this might help you wrap your brain around it.

@zwobble Not exactly right. They’re stubs, not fakes, which means they don’t try to provide an in-memory implementation. Instead, they just return pre-configured values, no matter what is written. It’s technically possible to write a fake, rather than a stub, but stubs are much easier and have been sufficient in my experience. With one exception: my Clock infrastructure wrapper uses a fake, not a stub.
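
A sketch of what a fake-backed Nullable Clock could look like (details invented here; the article's Clock wrapper is the real reference):

```typescript
// The Nulled instance still behaves like a clock, in that time can move
// forward, but it never reads the system clock.

class Clock {
  static create(): Clock {
    return new Clock(() => Date.now());
  }

  // The Nulled clock is a tiny *fake*, not a stub: it has working behavior.
  static createNull(start = 0): Clock {
    let current = start;
    return new Clock(() => current, (ms) => { current += ms; });
  }

  private constructor(
    private readonly nowFn: () => number,
    private readonly advanceFn: (ms: number) => void = () => {
      throw new Error("advance() only works on a Nulled Clock");
    },
  ) {}

  now(): number { return this.nowFn(); }
  advance(ms: number): void { this.advanceFn(ms); }
}

// In a test, time moves only when the test says so:
const clock = Clock.createNull(1_000);
clock.advance(500);
// clock.now() === 1_500
```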

@zwobble Also, the output tracking is unrelated to the stubs, and works the same regardless of whether you’re using a Nulled instance or a normal one.
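
A sketch of the output-tracking idea, simplified and with invented names (loosely echoing the article's CommandLine example, but not its actual code):

```typescript
import { EventEmitter } from "node:events";

// The wrapper emits an event for every write, and trackOutput() collects those
// events. It works the same whether the instance came from create() or createNull().
class CommandLine {
  private readonly emitter = new EventEmitter();

  static create(): CommandLine { return new CommandLine((line) => console.log(line)); }
  static createNull(): CommandLine { return new CommandLine(() => {}); } // "off switch"

  private constructor(private readonly writeFn: (line: string) => void) {}

  trackOutput(): string[] {
    const output: string[] = [];
    this.emitter.on("write", (line: string) => output.push(line));
    return output;
  }

  writeOutput(line: string): void {
    this.writeFn(line);               // real stdout or the Nulled no-op
    this.emitter.emit("write", line); // tracking happens either way
  }
}

// Same test code works against either instance:
const commandLine = CommandLine.createNull();
const output = commandLine.trackOutput();
commandLine.writeOutput("hello");
// output is now ["hello"]
```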

@jamesshore To check I've understood: "works the same" in that you can always create an output tracker using the exact same mechanism, both in production and tests, even if you don't actually do so in practice in production?

One of the other nice things about colocating code in that way is that it makes it more natural to add those sorts of affordances that make testing easier.