Integration testing: the layer most teams skip
Quick answer: integration tests cover the layer between your code and external systems — APIs, webhooks, queues, databases. Most teams have unit tests on internal functions and (sometimes) end-to-end tests on the UI, but skip the integration layer where most production failures actually happen. The patterns that work: contract tests against live sandboxes, recorded fixtures for replay, and a small ground-truth smoke suite that runs against production daily.
There’s a familiar testing pyramid: unit tests at the bottom, integration tests in the middle, end-to-end tests at the top. Most software organisations are reasonably good at the bottom and the top. Almost none are good at the middle.
That gap is where a meaningful share of production incidents lives — not because integration testing is hard, but because nobody owns it.
Why integration testing gets skipped
Integration testing sits in an awkward place. Unit testing is fast, cheap, and well-supported by every framework. End-to-end testing is visible — it produces screenshots and videos and demo-able outputs. Integration testing produces neither.
The specific reasons we see teams skip it:
- It’s slow. Hitting a real (or sandboxed) external API is orders of magnitude slower than calling a local function.
- It’s flaky. Network issues, rate limits, sandbox availability — tests fail for reasons that aren’t about your code. Teams give up.
- Sandboxes are inconsistent. Some platforms have great sandboxes (Stripe). Some have terrible ones (most). Some have none.
- It’s nobody’s job. Front-end engineers test their UI. Back-end engineers test their functions. The integration layer between back-end and external systems doesn’t have a clear owner.
- The cost of failure isn’t felt at commit time. A broken integration manifests as a customer-facing incident weeks later, by which time the original change is forgotten.
The result: the layer where most integrations actually break has the weakest test coverage of any layer in most codebases.
What “integration testing” actually means here
For clarity: in this article, “integration test” means a test that exercises the boundary between your code and an external system. This includes:
- Tests that hit an external API (live or sandboxed)
- Tests that send/receive webhooks
- Tests that publish to / consume from a message queue
- Tests that read/write a database (especially when transactions span multiple statements)
- Tests that exercise the contract between your service and another internal service
What it doesn’t mean: tests that exercise multiple internal modules together (those are usually called “component tests” or “multi-unit tests”).
Pattern 1: contract tests against sandboxes
For any API integration you depend on, the first layer of integration testing is a small set of tests that hit the vendor’s sandbox environment.
The tests don’t need to cover every code path. They cover the contract:
- Authentication still works
- The endpoints we use still exist and return the shape we expect
- The fields we read still exist
- The error responses still match the documented format
- Pagination still behaves as expected
These tests run in CI, ideally on every change to the integration code, and on a daily schedule even when no changes are happening. The point isn’t to catch regressions in your code — it’s to catch the day the vendor changed theirs.
A well-built sandbox-backed contract test suite is 30–100 tests covering the surface area of your integration. It runs in a few minutes. It catches API contract drift before it reaches production.
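A single test in that suite can be very small. A minimal sketch in Python, covering the checklist above (the endpoint, field names, and SANDBOX_API_KEY environment variable are placeholders; substitute your vendor's sandbox):

```python
import os

import requests

SANDBOX_BASE = "https://sandbox.vendor.example/v1"  # placeholder sandbox URL
API_KEY = os.environ["SANDBOX_API_KEY"]             # placeholder credential


def test_customers_endpoint_contract():
    """The endpoint still exists and returns the shape we depend on."""
    resp = requests.get(
        f"{SANDBOX_BASE}/customers",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"limit": 1},
        timeout=10,
    )
    assert resp.status_code == 200

    body = resp.json()
    # Only assert the fields our integration actually reads.
    assert "data" in body
    for record in body["data"]:
        assert "id" in record
        assert "email" in record

    # Pagination contract: the cursor field we page on must still exist.
    assert "has_more" in body
```

Note that the test asserts only the fields your code reads, not the whole response. Asserting the full payload makes the suite fail on every harmless additive change the vendor ships.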
Pattern 2: recorded fixtures for replay
For tests that need to run fast, frequently, or in environments without network access (CI for forks, local development without sandbox credentials), the pattern is recorded fixtures.
A library like VCR (Ruby), VCR.py (Python), or Polly.js (Node) records real API interactions to disk. Tests replay the recordings.
The pattern:
- Run the test against the live sandbox once, with recording enabled. The interaction is saved.
- Subsequent test runs replay from the recording. Fast, deterministic, works offline.
- Periodically, re-record (a daily or weekly job that runs against the sandbox and updates the fixtures).
This gives you the speed of unit tests with the realism of integration tests, while still catching contract drift through the periodic re-record.
The trap is letting fixtures go stale. A two-year-old fixture is testing a two-year-old API. The re-record cadence is what keeps the suite honest.
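In Python, the recording setup is a few lines of VCR.py configuration. A minimal sketch (the cassette directory and test endpoint are illustrative):

```python
import requests
import vcr

# record_mode="once" records the live sandbox interaction on the first run
# and replays from the cassette on every run after that. A scheduled
# re-record job can run the same tests with record_mode="all".
my_vcr = vcr.VCR(
    cassette_library_dir="tests/fixtures/cassettes",
    record_mode="once",
    filter_headers=["authorization"],  # keep credentials out of fixtures
)


def test_list_invoices_replays_from_cassette():
    with my_vcr.use_cassette("list_invoices.yaml"):
        resp = requests.get("https://sandbox.vendor.example/v1/invoices")
        assert resp.status_code == 200
        assert "data" in resp.json()
```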
Pattern 3: a small ground-truth smoke suite against production
For business-critical integrations, the strongest signal is: does this integration actually work in production right now?
The pattern: a tiny suite (5–20 tests) that runs against production daily, doing harmless reads against real data and verifying the response shape. Any failure pages someone.
What to test in production:
- Authentication: can we still authenticate?
- Read paths: can we still read a known record successfully?
- Webhook health: have we received an expected webhook in the last N hours?
- Reconciliation health: did the last reconciliation job find no unexpected diffs?
What not to test in production:
- Anything that creates, modifies, or deletes data (other than dedicated test records)
- Anything that has a meaningful financial cost per call
- Anything that triggers customer-visible side effects
This pattern catches the failure modes that contract tests can't: production credentials being rotated, network paths between your infrastructure and the vendor changing, quotas being adjusted at the organisation level.
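A sketch of two such checks, covering the first and third items on the list above. The record ID, URLs, and the `last_webhook_received_at` fixture are placeholders; the fixture stands in for whatever timestamp your webhook handler records on each delivery:

```python
import os
from datetime import datetime, timedelta, timezone

import requests

PROD_BASE = "https://api.vendor.example/v1"  # placeholder production API
API_KEY = os.environ["PROD_SMOKE_API_KEY"]   # read-only smoke credential
KNOWN_RECORD_ID = "cust_smoke_0001"          # dedicated test record


def test_can_authenticate_and_read_known_record():
    """Harmless read against a dedicated test record; any failure pages."""
    resp = requests.get(
        f"{PROD_BASE}/customers/{KNOWN_RECORD_ID}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    assert resp.status_code == 200
    assert resp.json()["id"] == KNOWN_RECORD_ID


def test_webhook_received_recently(last_webhook_received_at):
    """Webhook health: at least one delivery in the last 6 hours.

    `last_webhook_received_at` is a pytest fixture that reads whatever
    timestamp your webhook handler stores on each delivery.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=6)
    assert last_webhook_received_at > cutoff
```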
Pattern 4: webhook testing without webhook noise
Testing webhook handlers is awkward because you don’t want to fire test webhooks at production endpoints, and pointing a sandbox at your dev environment requires public URLs.
The patterns that work:
- Webhook payload fixtures. Record a real production webhook payload (with PII redacted) and use it as a fixture. Test the handler by passing the fixture directly, bypassing the network entirely. Covers most of the handler logic.
- Sandbox + tunnel for end-to-end. When you need to test the full flow including signature verification and retry behaviour, use ngrok or a similar tunnel to expose your dev environment, point a sandbox webhook at it, and exercise the full path. Useful occasionally; too slow for routine testing.
- Replay tooling. Most platforms (Stripe, GitHub, Shopify) let you replay past webhooks. Useful for debugging specific historical events.
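A sketch of the first of these, payload fixtures, using a generic HMAC-SHA256 signature scheme. The handler import, its call signature, and the fixture path are hypothetical; match them to your framework and your vendor's signing scheme:

```python
import hashlib
import hmac
import json

from myapp.webhooks import handle_invoice_paid  # hypothetical handler

WEBHOOK_SECRET = b"test-secret"


def sign(raw_body: bytes) -> str:
    """Recreate a vendor-style HMAC-SHA256 signature over the raw payload."""
    return hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()


def test_invoice_paid_handler_with_recorded_payload():
    # A real production payload, recorded once, PII-redacted, and checked
    # into the repo as a fixture.
    with open("tests/fixtures/webhooks/invoice_paid.json", "rb") as f:
        raw_body = f.read()

    result = handle_invoice_paid(
        raw_body=raw_body,
        signature=sign(raw_body),
        secret=WEBHOOK_SECRET,
    )

    payload = json.loads(raw_body)
    assert result.invoice_id == payload["data"]["id"]
```

Because the test signs the raw bytes itself, it exercises signature verification without any network involvement, which is where most handler bugs hide.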
Pattern 5: testing the failure modes
Most integration test suites cover the happy path. The high-value tests cover the failure modes:
- What happens when the vendor returns a 500?
- What happens when a request times out?
- What happens when the response is missing a field your code depends on?
- What happens when the response includes a field your code doesn’t recognise?
- What happens on retry after a partial success?
- What happens when the rate limit is hit?
- What happens when authentication is rejected?
Each is testable with a mocked or stubbed response, and each is a failure mode that catches real production systems by surprise. Writing these tests once is dramatically cheaper than discovering each one in production.
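With the `responses` library in Python, each of these stubs is a few lines. A sketch covering three of the failure modes above, with a toy client function standing in for your real integration code (the URL is a placeholder):

```python
import pytest
import requests
import responses

CUSTOMER_URL = "https://api.vendor.example/v1/customers/cust_123"  # placeholder


def fetch_customer_email(url: str) -> str:
    """Toy client under test: reads one field and fails loudly if absent."""
    resp = requests.get(url, timeout=5)
    resp.raise_for_status()
    body = resp.json()
    if "email" not in body:
        raise ValueError("vendor response missing 'email' field")
    return body["email"]


@responses.activate
def test_vendor_500_raises():
    responses.add(responses.GET, CUSTOMER_URL, status=500)
    with pytest.raises(requests.HTTPError):
        fetch_customer_email(CUSTOMER_URL)


@responses.activate
def test_timeout_raises():
    # Passing an exception as the body makes responses raise it on request.
    responses.add(responses.GET, CUSTOMER_URL,
                  body=requests.exceptions.ConnectTimeout())
    with pytest.raises(requests.exceptions.ConnectTimeout):
        fetch_customer_email(CUSTOMER_URL)


@responses.activate
def test_missing_field_fails_loudly():
    # A 200 response that silently dropped the field we depend on.
    responses.add(responses.GET, CUSTOMER_URL,
                  json={"id": "cust_123"}, status=200)
    with pytest.raises(ValueError):
        fetch_customer_email(CUSTOMER_URL)
```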
What test coverage looks like in practice
For a typical business-critical integration we ship, the test profile is roughly:
- 60–80% unit tests (transformation logic, validation, business rules) — fast, run on every commit
- 15–25% recorded-fixture integration tests (the contract surface) — medium speed, run on every commit
- 5–15% live-sandbox tests (full path including auth, pagination, errors) — slow, run nightly
- A small smoke suite running daily against production — tiny, alerts on failure
This profile catches the vast majority of regressions in CI, catches contract drift within 24 hours, and catches production-only failures (credentials, networking, organisational changes) within 24 hours of the daily smoke run.
It’s not exotic. It’s also not what most teams have.
Common questions
What is integration testing? Testing the boundary between your code and external systems — APIs, webhooks, message queues, databases. Distinct from unit tests (internal functions) and end-to-end tests (full user flows). The layer where most production failures actually happen.
Why is integration testing hard? External systems are slow, sometimes flaky, and not always testable in sandbox environments. Tests fail for reasons that aren’t about your code, which trains teams to ignore them. The fix is patterns — recorded fixtures, contract tests, production smoke suites — that decouple the cost of testing from the cost of running real systems.
How often should integration tests run? Recorded-fixture tests on every commit. Live-sandbox contract tests at least daily. Production smoke tests at least daily. The cadence depends on how quickly you need to detect contract drift — for a critical integration, even an hour of undetected drift is too long.
Should I test against the live API or a sandbox? Both, ideally. Sandboxes are good for routine testing without affecting real data. Live-API smoke tests against production catch the failure modes (real credentials, real network, real quotas) that sandboxes can’t reproduce.
What’s the difference between integration and end-to-end testing? Integration tests cover individual boundaries (your code ↔ one external system). End-to-end tests cover full user flows that may exercise many integrations together. Both are valuable. Integration tests are typically faster, more focused, and more diagnostic when they fail.
If your integrations are running without integration tests behind them and you’ve been getting away with it, start a project and we’ll do an audit. The cost of building proper test coverage is usually measured in days; the cost of not having it is measured in incidents.
More reading
What AI actually costs to run in production
AI demos are cheap. Production is not. Where the money actually goes when you ship an AI feature, and how to size the engineering investment around the model.
Integrations: Why integrations break in production (and what to design for)
Every integration that "just calls an API" eventually breaks. The five places they fail first, and the design patterns that keep them running unattended.
Strategy: The hidden costs of SaaS once your business is established
The per-seat licence is the visible cost. Integration tax, lock-in, configuration drift, and the seat tax at scale are the SaaS costs no one quotes up front.