Integration testing: the layer most teams skip
Quick answer: integration tests cover the layer between your code and external systems — APIs, webhooks, queues, databases. Most teams have unit tests on internal functions and (sometimes) end-to-end tests on the UI, but skip the integration layer where most production failures actually happen. The patterns that work: contract tests against live sandboxes, recorded fixtures for replay, and a small ground-truth smoke suite that runs against production daily.
There’s a familiar testing pyramid: unit tests at the bottom, integration tests in the middle, end-to-end tests at the top. Most software organisations are reasonably good at the bottom and the top. Almost none are good at the middle.
That gap is where a meaningful share of production incidents lives — not because integration testing is hard, but because nobody owns it.
Why integration testing gets skipped
Integration testing sits in an awkward place. Unit testing is fast, cheap, and well-supported by every framework. End-to-end testing is visible — it produces screenshots and videos and demo-able outputs. Integration testing produces neither.
The specific reasons we see teams skip it:
- It’s slow. Hitting a real (or sandboxed) external API is orders of magnitude slower than calling a local function.
- It’s flaky. Network issues, rate limits, sandbox availability — tests fail for reasons that aren’t about your code. Teams give up.
- Sandboxes are inconsistent. Some platforms have great sandboxes (Stripe). Some have terrible ones (most). Some have none.
- It’s nobody’s job. Front-end engineers test their UI. Back-end engineers test their functions. The integration layer between back-end and external systems doesn’t have a clear owner.
- The cost of failure isn’t felt at commit time. A broken integration manifests as a customer-facing incident weeks later, by which time the original change is forgotten.
The result: the layer where most integrations actually break has the weakest test coverage of any layer in most codebases.
What “integration testing” actually means here
For clarity: in this article, “integration test” means a test that exercises the boundary between your code and an external system. This includes:
- Tests that hit an external API (live or sandboxed)
- Tests that send/receive webhooks
- Tests that publish to / consume from a message queue
- Tests that read/write a database (especially when transactions span multiple statements)
- Tests that exercise the contract between your service and another internal service
What it doesn’t mean: tests that exercise multiple internal modules together (those are usually called “component tests” or “multi-unit tests”).
Pattern 1: contract tests against sandboxes
For any API integration you depend on, the first layer of integration testing is a small set of tests that hit the vendor’s sandbox environment.
The tests don’t need to cover every code path. They cover the contract:
- Authentication still works
- The endpoints we use still exist and return the shape we expect
- The fields we read still exist
- The error responses still match the documented format
- Pagination still behaves as expected
These tests run in CI, ideally on every change to the integration code, and on a daily schedule even when no changes are happening. The point isn’t to catch regressions in your code — it’s to catch the day the vendor changed theirs.
A well-built sandbox-backed contract test suite is 30–100 tests covering the surface area of your integration. It runs in a few minutes. It catches API contract drift before it reaches production.
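A single test in that suite can be very small. A minimal sketch in Python, covering the checklist above (the endpoint, field names, and SANDBOX_API_KEY environment variable are placeholders; substitute your vendor's sandbox):

```python
import os

import requests

SANDBOX_BASE = "https://sandbox.vendor.example/v1"  # placeholder sandbox URL
API_KEY = os.environ["SANDBOX_API_KEY"]             # placeholder credential


def test_customers_endpoint_contract():
    """The endpoint still exists and returns the shape we depend on."""
    resp = requests.get(
        f"{SANDBOX_BASE}/customers",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"limit": 1},
        timeout=10,
    )
    assert resp.status_code == 200

    body = resp.json()
    # Only assert the fields our integration actually reads.
    assert "data" in body
    for record in body["data"]:
        assert "id" in record
        assert "email" in record

    # Pagination contract: the cursor field we page on must still exist.
    assert "has_more" in body
```

Note that the test asserts only the fields your code reads, not the whole response. Asserting the full payload makes the suite fail on every harmless additive change the vendor ships.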
Pattern 2: recorded fixtures for replay
For tests that need to run fast, frequently, or in environments without network access (CI for forks, local development without sandbox credentials), the pattern is recorded fixtures.
A library like VCR (Ruby), VCR.py (Python), or Polly.js (Node) records real API interactions to disk. Tests replay the recordings.
The pattern:
- Run the test against the live sandbox once, with recording enabled. The interaction is saved.
- Subsequent test runs replay from the recording. Fast, deterministic, works offline.
- Periodically, re-record (a daily or weekly job that runs against the sandbox and updates the fixtures).
This gives you the speed of unit tests with the realism of integration tests, while still catching contract drift through the periodic re-record.
The trap is letting fixtures go stale. A two-year-old fixture is testing a two-year-old API. The re-record cadence is what keeps the suite honest.
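In Python, the recording setup is a few lines of VCR.py configuration. A minimal sketch (the cassette directory and test endpoint are illustrative):

```python
import requests
import vcr

# record_mode="once" records the live sandbox interaction on the first run
# and replays from the cassette on every run after that. A scheduled
# re-record job can run the same tests with record_mode="all".
my_vcr = vcr.VCR(
    cassette_library_dir="tests/fixtures/cassettes",
    record_mode="once",
    filter_headers=["authorization"],  # keep credentials out of fixtures
)


def test_list_invoices_replays_from_cassette():
    with my_vcr.use_cassette("list_invoices.yaml"):
        resp = requests.get("https://sandbox.vendor.example/v1/invoices")
        assert resp.status_code == 200
        assert "data" in resp.json()
```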
Pattern 3: a small ground-truth smoke suite against production
For business-critical integrations, the strongest signal is: does this integration actually work in production right now?
The pattern: a tiny suite (5–20 tests) that runs against production daily, doing harmless reads against real data and verifying the response shape. Any failure pages someone.
What to test in production:
- Authentication: can we still authenticate?
- Read paths: can we still read a known record successfully?
- Webhook health: have we received an expected webhook in the last N hours?
- Reconciliation health: did the last reconciliation job find no unexpected diffs?
What not to test in production:
- Anything that creates, modifies, or deletes data (other than dedicated test records)
- Anything that has a meaningful financial cost per call
- Anything that triggers customer-visible side effects
This pattern catches the failure modes that contract tests can't: production credentials being rotated, network paths between your infrastructure and the vendor changing, quotas being adjusted at the organisation level.
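A sketch of two such checks, covering the first and third items on the list above. The record ID, URLs, and the `last_webhook_received_at` fixture are placeholders; the fixture stands in for whatever timestamp your webhook handler records on each delivery:

```python
import os
from datetime import datetime, timedelta, timezone

import requests

PROD_BASE = "https://api.vendor.example/v1"  # placeholder production API
API_KEY = os.environ["PROD_SMOKE_API_KEY"]   # read-only smoke credential
KNOWN_RECORD_ID = "cust_smoke_0001"          # dedicated test record


def test_can_authenticate_and_read_known_record():
    """Harmless read against a dedicated test record; any failure pages."""
    resp = requests.get(
        f"{PROD_BASE}/customers/{KNOWN_RECORD_ID}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    assert resp.status_code == 200
    assert resp.json()["id"] == KNOWN_RECORD_ID


def test_webhook_received_recently(last_webhook_received_at):
    """Webhook health: at least one delivery in the last 6 hours.

    `last_webhook_received_at` is a pytest fixture that reads whatever
    timestamp your webhook handler stores on each delivery.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=6)
    assert last_webhook_received_at > cutoff
```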
Pattern 4: webhook testing without webhook noise
Testing webhook handlers is awkward because you don’t want to fire test webhooks at production endpoints, and pointing a sandbox at your dev environment requires public URLs.
The patterns that work:
- Webhook payload fixtures. Record a real production webhook payload (with PII redacted) and use it as a fixture. Test the handler by passing the fixture directly, bypassing the network entirely. Covers most of the handler logic.
- Sandbox + tunnel for end-to-end. When you need to test the full flow including signature verification and retry behaviour, use ngrok or a similar tunnel to expose your dev environment, point a sandbox webhook at it, and exercise the full path. Useful occasionally; too slow for routine testing.
- Replay tooling. Most platforms (Stripe, GitHub, Shopify) let you replay past webhooks. Useful for debugging specific historical events.
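A sketch of the first of these, payload fixtures, using a generic HMAC-SHA256 signature scheme. The handler import, its call signature, and the fixture path are hypothetical; match them to your framework and your vendor's signing scheme:

```python
import hashlib
import hmac
import json

from myapp.webhooks import handle_invoice_paid  # hypothetical handler

WEBHOOK_SECRET = b"test-secret"


def sign(raw_body: bytes) -> str:
    """Recreate a vendor-style HMAC-SHA256 signature over the raw payload."""
    return hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()


def test_invoice_paid_handler_with_recorded_payload():
    # A real production payload, recorded once, PII-redacted, and checked
    # into the repo as a fixture.
    with open("tests/fixtures/webhooks/invoice_paid.json", "rb") as f:
        raw_body = f.read()

    result = handle_invoice_paid(
        raw_body=raw_body,
        signature=sign(raw_body),
        secret=WEBHOOK_SECRET,
    )

    payload = json.loads(raw_body)
    assert result.invoice_id == payload["data"]["id"]
```

Because the test signs the raw bytes itself, it exercises signature verification without any network involvement, which is where most handler bugs hide.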
Pattern 5: testing the failure modes
Most integration test suites cover the happy path. The high-value tests cover the failure modes:
- What happens when the vendor returns a 500?
- What happens when a request times out?
- What happens when the response is missing a field your code depends on?
- What happens when the response includes a field your code doesn’t recognise?
- What happens on retry after a partial success?
- What happens when the rate limit is hit?
- What happens when authentication is rejected?
Each is testable with a mocked or stubbed response, and each is a failure mode that catches real production systems by surprise. Writing these tests once is dramatically cheaper than discovering each one in production.
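With the `responses` library in Python, each of these stubs is a few lines. A sketch covering three of the failure modes above, with a toy client function standing in for your real integration code (the URL is a placeholder):

```python
import pytest
import requests
import responses

CUSTOMER_URL = "https://api.vendor.example/v1/customers/cust_123"  # placeholder


def fetch_customer_email(url: str) -> str:
    """Toy client under test: reads one field and fails loudly if absent."""
    resp = requests.get(url, timeout=5)
    resp.raise_for_status()
    body = resp.json()
    if "email" not in body:
        raise ValueError("vendor response missing 'email' field")
    return body["email"]


@responses.activate
def test_vendor_500_raises():
    responses.add(responses.GET, CUSTOMER_URL, status=500)
    with pytest.raises(requests.HTTPError):
        fetch_customer_email(CUSTOMER_URL)


@responses.activate
def test_timeout_raises():
    # Passing an exception as the body makes responses raise it on request.
    responses.add(responses.GET, CUSTOMER_URL,
                  body=requests.exceptions.ConnectTimeout())
    with pytest.raises(requests.exceptions.ConnectTimeout):
        fetch_customer_email(CUSTOMER_URL)


@responses.activate
def test_missing_field_fails_loudly():
    # A 200 response that silently dropped the field we depend on.
    responses.add(responses.GET, CUSTOMER_URL,
                  json={"id": "cust_123"}, status=200)
    with pytest.raises(ValueError):
        fetch_customer_email(CUSTOMER_URL)
```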
What test coverage looks like in practice
For a typical business-critical integration we ship, the test profile is roughly:
- 60–80% unit tests (transformation logic, validation, business rules) — fast, run on every commit
- 15–25% recorded-fixture integration tests (the contract surface) — medium speed, run on every commit
- 5–15% live-sandbox tests (full path including auth, pagination, errors) — slow, run nightly
- A small smoke suite running daily against production — tiny, alerts on failure
This profile catches the vast majority of regressions in CI, catches contract drift within 24 hours, and catches production-only failures (credentials, networking, organisational changes) within 24 hours of the daily smoke run.
It’s not exotic. It’s also not what most teams have.
Common questions
What is integration testing? Testing the boundary between your code and external systems — APIs, webhooks, message queues, databases. Distinct from unit tests (internal functions) and end-to-end tests (full user flows). The layer where most production failures actually happen.
Why is integration testing hard? External systems are slow, sometimes flaky, and not always testable in sandbox environments. Tests fail for reasons that aren’t about your code, which trains teams to ignore them. The fix is patterns — recorded fixtures, contract tests, production smoke suites — that decouple the cost of testing from the cost of running real systems.
How often should integration tests run? Recorded-fixture tests on every commit. Live-sandbox contract tests at least daily. Production smoke tests at least daily. The cadence depends on how quickly you need to detect contract drift — for a critical integration, even an hour of undetected drift is too long.
Should I test against the live API or a sandbox? Both, ideally. Sandboxes are good for routine testing without affecting real data. Live-API smoke tests against production catch the failure modes (real credentials, real network, real quotas) that sandboxes can’t reproduce.
What’s the difference between integration and end-to-end testing? Integration tests cover individual boundaries (your code ↔ one external system). End-to-end tests cover full user flows that may exercise many integrations together. Both are valuable. Integration tests are typically faster, more focused, and more diagnostic when they fail.
If your integrations are running without integration tests behind them and you’ve been getting away with it, start a project and we’ll do an audit. The cost of building proper test coverage is usually measured in days; the cost of not having it is measured in incidents.
More reading
What AI actually costs to run in production
AI demos are cheap. Production is not. Where the money actually goes when you ship an AI feature, and how to size the engineering investment around the model.
Integrations: Why integrations break in production (and what to design for)
Every integration that "just calls an API" eventually breaks. The five places they fail first, and the design patterns that keep them running unattended.
Strategy: The hidden costs of SaaS once your business is established
The per-seat licence is the visible cost. Integration tax, lock-in, configuration drift, and the seat tax at scale are the SaaS costs no one quotes up front.