A Practitioner-Oriented Articulation
Author: William Christopher Anderson
Date: March 2026
Version: 1.0
Executive Summary
Testing difficulty is architectural evidence. When a component cannot be exercised in isolation, when a unit test requires a running database, when a change in one module breaks tests across a dozen others — the problem is not the tests. It is the structure that the tests are attempting to exercise. The component has no coherent boundary. Its responsibilities are distributed incorrectly. Its dependencies are hidden. Tests fail not because the code is wrong, but because the code was not organized to be observable or controllable in the first place.
Boundary-Driven Testing is the observation that the test spiral — unit, integration, end-to-end, system, user acceptance — is not a testing methodology. It is an architectural map. Each ring of the spiral corresponds to a level of architectural scope, and that scope is determined entirely by where boundaries have been placed. Get the boundaries right and the spiral populates itself: each tier has clear targets, predictable scope, and low maintenance overhead. Get them wrong and the spiral collapses — unit tests become integration tests in disguise, E2E tests become the only reliable safety net, and the entire suite grows expensive while providing diminishing confidence.
The structural models defined in Volatility-Based Decomposition and Experience-Based Decomposition produce boundaries that localize both change and test scope simultaneously. The same line that prevents coupling prevents test contamination. The same role taxonomy that makes components replaceable makes them mockable. The same core use cases that validate structural boundaries generate test scenarios across the full spiral. This is not a coincidence. It is the intended consequence of decomposing correctly.
Abstract
The relationship between architectural structure and testability is direct and bidirectional: clear boundaries produce testable components, and testing difficulty reveals boundary problems. Boundary-Driven Testing (BDT) articulates this relationship by mapping the test spiral onto the component role taxonomy defined in Volatility-Based Decomposition (VBD) and Experience-Based Decomposition (EBD). Each role — Manager, Engine, Resource Accessor, Utility at the system level; Experience, Flow, Interaction, Utility at the UX level — has a natural test profile determined by its responsibilities, permitted dependencies, and communication rules. Mock placement, test scope, and assertion strategy follow from structural position. The spiral is a structural mirror. Difficulty at any level of the spiral points to a specific class of boundary problem — and to the structural fix.
1. Introduction
The conventional framing of testing as a discipline separate from design produces a particular kind of pain. Teams adopt frameworks, mandate coverage minimums, and write guidelines about what to test. The tests improve. The pain persists. A change to a business rule breaks seventeen tests, most of which are not about business rules. An integration test requires spinning up four services to assert one value. An end-to-end test passes in isolation and fails in CI for reasons nobody can reproduce. More coverage, more pain.
The reframe is simple but consequential: testing is not separate from design. The structure of the system determines what can be tested and how. A component designed around a coherent responsibility, with explicit inputs, explicit outputs, and dependencies passed rather than acquired, is inherently testable. No additional effort is required to make it so. The same structural choices that allow the component to change without cascading effects allow it to be tested without elaborate setup.
The inverse is equally true. A component that cannot be unit-tested without mocking half the system is not badly tested — it is badly structured. The test difficulty is diagnostic. It reveals that the component has absorbed responsibilities it should not have, or that its dependencies are implicit rather than declared, or that the boundary between it and its collaborators has been drawn in the wrong place. Fix the structure; the tests follow. Paper over the structure with more elaborate test scaffolding; the problem remains and compounds.
Boundary-Driven Testing takes the diagnostic function of tests seriously. It maps the test spiral to the structural models defined in VBD and EBD, making explicit which components belong at which level of the spiral and why. It treats mock placement as architectural evidence — you mock at boundaries, and if you are mocking everywhere, you have too many boundaries or they are in the wrong places. And it articulates the consequence of correct decomposition: a test suite that is fast at the base, targeted in the middle, and confident at the top, maintained by the natural structure of the system rather than by constant manual curation.
2. The Spiral Is a Structural Map
The test spiral describes a progression from the narrowest to the broadest scope:
- Unit — one component, all dependencies replaced. Fast, numerous, fine-grained.
- Integration — component collaboration across one seam, with dependencies mocked at the outer boundary. Verifies that orchestration logic and contracts are wired correctly.
- End-to-End — a complete user flow, full stack and browser. Verifies that the system behaves correctly from the outside.
- System — the integrated system under realistic conditions: load, failure injection, configuration variation. Verifies non-functional qualities.
- User Acceptance — real users or proxies confirm that what was built matches what was intended.
The common reading of the spiral is proportional: many unit tests, fewer integration tests, fewer still E2E, and so on. This is a useful heuristic, but it obscures the more important principle. The spiral is not primarily about proportion — it is about scope. Unit scope is a single component. Integration scope is a collaboration across one boundary. E2E scope is a complete journey. System scope is the whole.
Scope is determined by architecture. Where boundaries are placed determines what constitutes a “unit,” what constitutes an “integration,” and what constitutes a “journey.” In a system with no meaningful component boundaries, unit scope and system scope are the same thing — there is nothing below the full system that can be isolated. The spiral collapses into E2E by default, because E2E is the only level at which you can exercise anything coherent.
Figure 1a shows the five levels of the spiral from narrowest to broadest scope. Figure 1b shows how each architectural tier attracts tests at a specific level.
Figure 1a — The Test Spiral
Figure 1b — Architectural Tier → Test Level
3. Boundaries Determine Test Profiles
Each component role in VBD and EBD has a characteristic test profile — not assigned arbitrarily, but derived from the role’s structural position, responsibilities, and communication rules.
3.1 Engines — The Unit Test Core
Engines are the most logic-dense tier and the natural home of the unit test suite. An Engine encapsulates business rules: given inputs, apply policy, produce a result. It has no workflow awareness, no sibling dependencies, and no reason to reach outward unless it needs data from a Resource Accessor — which it receives through an explicit, mockable interface.
This is what makes Engines straightforwardly testable. Mock the Accessor, supply controlled inputs, assert on the output. The Engine’s communication constraints — no peer Engine calls, no direct infrastructure access — ensure there is nothing else to mock. The test scope is exactly the Engine and nothing more.
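The shape of such a test can be sketched in a few lines. The Engine, the Accessor interface, and the discount rule below are all hypothetical, invented for illustration; the point is the structure: one injected dependency, one mock, assertions only on the Engine's output.

```python
from unittest.mock import Mock

# Hypothetical Engine: applies a discount policy using rates fetched
# through an injected Resource Accessor interface.
class PricingEngine:
    def __init__(self, rate_accessor):
        self.rate_accessor = rate_accessor  # dependency passed, not acquired

    def price(self, item_id, quantity):
        rate = self.rate_accessor.get_rate(item_id)
        subtotal = rate * quantity
        # Illustrative business rule: 10% volume discount at 10+ units.
        return subtotal * 0.9 if quantity >= 10 else subtotal

# Unit test: mock the Accessor, supply controlled inputs, assert on output.
def test_volume_discount_applied():
    accessor = Mock()
    accessor.get_rate.return_value = 5.0
    engine = PricingEngine(accessor)
    assert engine.price("sku-1", 10) == 45.0  # 5.0 * 10, discounted
    assert engine.price("sku-1", 2) == 10.0   # below threshold, no discount
    accessor.get_rate.assert_called_with("sku-1")
```

Nothing beyond the single Accessor interface is mocked; the Engine's communication constraints guarantee there is nothing else to replace.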
For Flows in EBD the same principle applies. A Flow receives shared state from the Experience, steps through Interactions, makes a backend call, and emits a completion event. Mock the backend call, simulate Interaction events through a test harness, and assert on what the Flow emits. The Flow’s rule against calling sibling Flows means the unit scope stays tight.
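A minimal Flow test harness follows the same pattern. The Flow, its events, and the backend call below are assumed names for illustration: the backend is mocked, Interaction events are fed in synthetically, and the assertion targets what the Flow emits.

```python
from unittest.mock import Mock

# Hypothetical Flow: receives shared state from the Experience, reacts to
# an Interaction event, calls the backend through an injected API accessor,
# and emits a completion event.
class SignupFlow:
    def __init__(self, api, on_complete):
        self.api = api                  # mock point: the backend boundary
        self.on_complete = on_complete  # completion event sink
        self.shared_state = {}

    def start(self, shared_state):
        self.shared_state = dict(shared_state)  # handed in by the Experience

    def handle_interaction_event(self, event):
        # Simulated Interaction event: the email form was submitted.
        if event["type"] == "email_submitted":
            result = self.api.register(event["email"], self.shared_state["plan"])
            self.on_complete({"status": "done", "user_id": result["id"]})

# Unit test: mock the backend, drive the Flow with synthetic events,
# assert on the emitted completion event.
def test_flow_emits_completion():
    api = Mock()
    api.register.return_value = {"id": "u-42"}
    emitted = []
    flow = SignupFlow(api, emitted.append)
    flow.start({"plan": "pro"})
    flow.handle_interaction_event({"type": "email_submitted", "email": "a@b.c"})
    assert emitted == [{"status": "done", "user_id": "u-42"}]
    api.register.assert_called_once_with("a@b.c", "pro")
```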
3.2 Resource Accessors — Thin Boundary, Minimal Unit Surface
Accessors sit at the system’s external boundary, and that position defines what they are responsible for testing — which is less than it might appear. An Accessor’s job is translation: convert a domain request into an external call, convert the response back. Whether the external system is reachable, whether it is correctly provisioned, whether it performs within acceptable bounds — none of these are Accessor concerns. They are infrastructure concerns, and they belong to system testing and deployment verification.
If an Accessor contains meaningful translation or mapping logic, that logic can be unit tested by controlling the inputs and outputs through the Accessor’s own interface. But the Accessor has no business connecting to a real database in a unit or integration test. It either connects or it doesn’t — and that is a system-level fact, not a test target. The Accessor’s correctness is about the translation. The infrastructure’s correctness is about the infrastructure.
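A sketch of that translation test, with a hypothetical Accessor and driver interface: the driver is the mock point, and the assertions cover only the mapping between the external row shape and the domain shape.

```python
from unittest.mock import Mock

# Hypothetical Accessor: translates between the domain vocabulary and a
# raw database driver. The driver is mocked; no real database is involved.
class CustomerAccessor:
    def __init__(self, driver):
        self.driver = driver

    def find_customer(self, customer_id):
        row = self.driver.query_one(
            "SELECT * FROM customers WHERE id = ?", customer_id)
        if row is None:
            return {"state": "not_found"}
        # Translation logic: external row shape -> domain shape.
        return {"state": "found",
                "customer": {"id": row["id"],
                             "name": f'{row["first"]} {row["last"]}'}}

def test_accessor_translates_row_to_domain():
    driver = Mock()
    driver.query_one.return_value = {"id": 7, "first": "Ada", "last": "Lovelace"}
    result = CustomerAccessor(driver).find_customer(7)
    assert result == {"state": "found",
                      "customer": {"id": 7, "name": "Ada Lovelace"}}

def test_accessor_translates_absence_to_not_found():
    driver = Mock()
    driver.query_one.return_value = None
    assert CustomerAccessor(driver).find_customer(99) == {"state": "not_found"}
```

Whether `query_one` can actually reach a database is a system-level fact, outside this test's scope.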
3.3 Integration Tests — The Three Seams That Matter
Integration tests in a VBD system are not about any single component in isolation. They are about the seams between roles — verifying that the contracts components depend on are honored, and that orchestration logic matches the design. Everything is still mocked at the external boundary. Real external systems do not enter the picture until E2E.
The distinction from unit tests is scope, not realism. A unit test exercises one component against mocked dependencies. An integration test exercises the collaboration between two components against mocked dependencies at the outer edge. You care whether the wiring is correct — not whether the database is running.
There are three seams worth testing at this level:
Manager → Engine. Does the Manager invoke the Engine with the correct inputs? Does it handle every state the Engine’s contract can emit — success, domain failure, unexpected error — and route accordingly? The Engine is mocked. Feed it controlled responses representing each state it might return. Verify that the Manager’s orchestration logic handles all of them correctly. A mock Engine returning a validation failure is just as useful as a real one — what you are testing is the Manager’s response, not the Engine’s behavior.
Engine → Resource Accessor. Does the Engine correctly use the Accessor’s contract? Does it correctly interpret the states the Accessor can return? The Accessor is mocked. No database is involved. You are testing whether the Engine handles the Accessor’s interface correctly — not whether the Accessor connects to anything.
Manager → Resource Accessor. Managers sometimes interact with Accessors directly — for reads that inform orchestration decisions, or for state persistence the Manager owns. Test these paths the same way: mock the Accessor, exercise the Manager’s handling of every response state the Accessor’s contract defines.
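The Manager → Engine seam can be sketched as follows. The Manager, Engines, and contract states below are illustrative, not taken from a real system; what the sketch shows is that every state the Engine contract can emit is fed in through a mock, and only the Manager's routing is asserted.

```python
from unittest.mock import Mock

# Hypothetical Manager: sequences validation then pricing. It contains no
# business logic, only routing on the contract states its Engines emit.
class OrderManager:
    def __init__(self, validation_engine, pricing_engine):
        self.validation = validation_engine
        self.pricing = pricing_engine

    def process(self, order):
        outcome = self.validation.validate(order)
        if outcome["state"] == "invalid":
            return {"status": "rejected", "reasons": outcome["reasons"]}
        if outcome["state"] == "error":
            return {"status": "retry_later"}
        return {"status": "accepted", "total": self.pricing.price(order)}

# Seam tests: the Engines are mocked and return controlled contract states.
def test_manager_routes_validation_failure():
    validation = Mock()
    validation.validate.return_value = {"state": "invalid",
                                        "reasons": ["missing sku"]}
    manager = OrderManager(validation, Mock())
    assert manager.process({}) == {"status": "rejected",
                                   "reasons": ["missing sku"]}

def test_manager_routes_success_through_pricing():
    validation = Mock(validate=Mock(return_value={"state": "valid"}))
    pricing = Mock(price=Mock(return_value=42.0))
    result = OrderManager(validation, pricing).process({"sku": "x"})
    assert result == {"status": "accepted", "total": 42.0}
```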
In EBD, the equivalent seam is Experience → Flow: does the Experience pass correct shared state, handle Flow completion and skip signals, and advance the journey correctly? The backend is mocked. You are verifying journey composition logic — not backend behavior.
3.4 Interactions and Utilities — Narrow and Fast
Interactions are atomic. They render, receive user input, and emit events. They carry no flow logic, make no API calls, and have no awareness of adjacent components. Component tests — render in a harness, simulate the input event, assert what was emitted — cover them completely and quickly. No mocks are typically needed; props and callbacks are the entire interface.
Utilities are simpler still: inputs in, outputs out, no side effects. Given input X, assert output Y. The only exception is a Utility wrapping an external sink (a log transport, a telemetry exporter), where the sink gets mocked. Everything else is pure function territory.
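A Utility test is correspondingly minimal. The function below is a made-up example of the pure-function shape: given input X, assert output Y, no mocks anywhere.

```python
# Hypothetical Utility: pure function, no side effects, no dependencies.
def slugify(title):
    return "-".join(title.lower().split())

def test_slugify():
    assert slugify("Boundary Driven Testing") == "boundary-driven-testing"
    assert slugify("  extra   spaces ") == "extra-spaces"
```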
4. Mock Placement Is Architectural Evidence
Where you place mocks tells you where your boundaries are. Where you are forced to place mocks tells you where your boundaries should be.
The rule is simple: mock at the role boundary, not inside the role. Each component role has one natural mock point — the interface at which it hands off to the next tier. Mock that interface and nothing else.
Figure 2 shows where mocks belong at each test level. Each diagram is independent — together they cover the full VBD test surface.
Figure 2a — Engine unit test: mock the Resource Accessor, keep the Engine real.
Figure 2b — Accessor unit test: mock the data source, keep the Accessor real.
Figure 2c — Integration tests: the three seams. Each seam tests one collaboration. Dependencies at the outer edge are mocked — no real external systems.
When a unit test requires mocking more than the single boundary below the component under test, something is wrong. Either the component has absorbed responsibilities that belong at a different tier, or its dependencies are implicit rather than injected, or an Accessor is missing and the component is reaching directly into infrastructure it should not see. Mock proliferation is always a structural signal — not a testing problem, and not a problem that better mocking frameworks solve.
The inverse is equally worth examining. An Engine or Flow unit test that requires no mocks is either genuinely pure-computation (rare and fine) or is only exercising the easy path through logic that silently delegates to collaborators the test never reaches. Coverage numbers tell you how many lines ran. They do not tell you whether the logic that matters was actually exercised.
5. The Same Scenarios Validate Architecture and Tests
VBD and EBD both use core scenarios as architectural validation mechanisms. A core use case in VBD — Process an order, Evaluate eligibility, Onboard a new customer — should be traceable through the component hierarchy without bypassing communication rules. A core user journey in EBD — Complete developer onboarding, Publish a prism, Discover a workspace — should trace through Experience → Flow → Interaction without boundary leakage.
These scenarios are also the test scenarios that matter most. Not because coverage demands it, but because scenarios that validate structural boundaries naturally exercise the most load-bearing code paths, the most significant collaborations, and the most complete representations of what the system is actually for.
Figures 3a, 3b, and 3c show the same order-processing scenario at three levels of the spiral. Each level asks a different question. Each has a different scope.
Figure 3a — Unit: each Engine in isolation
Figure 3b — Integration: Manager orchestration with mocked dependencies
Figure 3c — E2E: full stack, real systems
The same scenario, three questions:
- Unit — Does ValidationEngine correctly reject a missing field? Does PricingEngine apply the right discount? One Engine, all its rule branches, mocks for anything it depends on.
- Integration — Does OrderManager route correctly when validation fails? Does it call the right dependencies in the right order when it succeeds? Engines and Accessor are mocked — you are testing the Manager’s orchestration logic against every contract state its dependencies can emit.
- End-to-End — Does an order submitted through the real API surface in the system correctly? No mocks. Real behavior, real infrastructure, real assertion.
- User Acceptance — Does a stakeholder placing an order through the application experience the outcome they expected?
Each level asks a different question about the same scenario. Each question corresponds to a structural scope. The test suite is not organized around coverage targets — it is organized around the architecture.
When a scenario cannot be cleanly decomposed this way — when the unit tests would require mocking the Manager, or the integration tests have no obvious boundary to stop at — the scenario is exposing an architectural gap. The fix is structural. Tests that are uncomfortable to write at a given level signal that the corresponding structural tier is missing or muddled.
6. UAT Validates What Architecture Validated First
User acceptance testing is often treated as a separate world from the structural concerns of the preceding spiral levels. Stakeholders exercise the system. They are not concerned with Managers, Engines, or Flows. They care whether the product works as intended.
But the core user journeys that structure EBD — and the core use cases that structure VBD — are precisely the scenarios that UAT exercises. The developer onboarding journey that validates EBD structural boundaries is the same journey that a UAT participant walks through. The order processing scenario that exercises VBD communication rules is the same scenario a business stakeholder confirms in acceptance.
This alignment is not accidental. Both architectural validation and UAT begin from the same question: does the system fulfill its core purpose correctly? Architectural validation asks it structurally — can this scenario be traced without boundary violations? UAT asks it experientially — does this scenario produce the correct outcome for a real user?
When the architecture is sound, the answers converge. Structural boundaries support real journeys without friction. UAT scenarios map cleanly onto the E2E scenarios that confirm those journeys in the automated suite. The test spiral closes: the same scenarios that entered at the unit level — as isolated assertions on Engine logic — emerge at the UAT level as confirmed product behavior.
When they diverge — when UAT surfaces scenarios that have no corresponding structural representation, or when E2E tests cover journeys that no stakeholder actually cares about — both the tests and the architecture need reexamination.
7. Diagnostic Signals
Testing difficulty is a signal. The nature of the difficulty points to the specific structural problem.
Mock proliferation — a unit test mocking more than one or two dependencies is usually exercising something that spans too many concerns. The component should be decomposed, or its dependencies should be consolidated behind a single interface.
Slow unit tests — unit tests that require real I/O (network, filesystem, database) are not unit tests. Something that should be an Accessor is embedded in an Engine or Flow. Extract it.
Brittle E2E tests — E2E tests that break for reasons unrelated to user-visible behavior are coupled to implementation detail. Either the test is asserting on internal component state (stop), or the flow being tested has no stable boundary (fix the architecture).
Inverted pyramid — when the E2E suite is larger than the unit suite because unit tests cannot cover meaningful scenarios, the unit-testable tiers (Engines, Flows) contain less logic than they should. Business logic has migrated into Managers or Accessors.
UAT surprises — when UAT surfaces behaviors that no automated test predicted, either the core scenario set is incomplete or the structural model does not reflect how the product is actually used. Both are architectural discoveries, not testing failures.
7A. Practitioner Observations
The following observations emerge from applying Boundary-Driven Testing across multiple systems of varying scale and domain. They are not prescriptive rules but recurring patterns — structural phenomena that practitioners encounter once the spiral is treated as an architectural map rather than a coverage checklist.
The Test Migration Pattern
When architecture improves — when an Engine is extracted from a Manager, or an Accessor is separated from inline infrastructure calls — tests naturally migrate from integration scope to unit scope. Logic that previously could only be exercised through the Manager’s orchestration can now be tested directly against the extracted Engine. The test count at each spiral level is therefore a leading indicator of structural health. A system with a growing unit test count and a shrinking integration test count is decomposing correctly: logic is moving into testable tiers. A system where unit tests plateau while integration tests multiply is accumulating orchestration logic in the wrong places. Tracking this ratio over time reveals architectural trajectory more reliably than any static metric.
The Mock Boundary Audit
Periodically reviewing where mocks are placed across the test suite reveals architectural drift before it becomes visible in production behavior. When a unit test that once required a single mock — the Accessor interface below the Engine — now requires three mocks, a boundary has leaked. The Engine has acquired a dependency it should not have, or a new collaborator has been introduced without going through the established interface. The audit is mechanical: list every mock in every unit test, group by component under test, and compare to the expected mock count for that component’s role. Engines should mock Accessors. Managers should mock Engines and Accessors. Utilities should mock nothing or at most one external sink. Deviations from this pattern are not judgment calls — they are structural findings that point to specific refactoring targets.
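Because the audit is mechanical, it can be automated. The sketch below assumes the team can produce an inventory of mocks per test (by convention, by fixture inspection, or by static analysis); the role thresholds are illustrative, following the expectations stated above.

```python
# Minimal mock boundary audit. Assumed inventory format:
#   {"test": ..., "component": ..., "role": ..., "mocks": [...]}
# Thresholds are illustrative: Engines mock one Accessor, Managers mock
# their Engines and Accessors, Utilities mock at most one external sink.
EXPECTED_MAX_MOCKS = {"engine": 1, "manager": 3, "accessor": 1, "utility": 1}

def audit(test_inventory):
    findings = []
    for entry in test_inventory:
        allowed = EXPECTED_MAX_MOCKS.get(entry["role"], 0)
        if len(entry["mocks"]) > allowed:
            findings.append(
                f'{entry["test"]}: {entry["component"]} ({entry["role"]}) '
                f'mocks {len(entry["mocks"])} dependencies, expected <= {allowed}')
    return findings

inventory = [
    {"test": "test_pricing", "component": "PricingEngine", "role": "engine",
     "mocks": ["RateAccessor"]},                            # within bounds
    {"test": "test_tax", "component": "TaxEngine", "role": "engine",
     "mocks": ["RateAccessor", "ConfigService", "Clock"]},  # boundary leak
]
print(audit(inventory))  # flags TaxEngine as exceeding its role's mock budget
```

Each finding is a refactoring target, not a judgment call: the component named has acquired a dependency outside its role's established interface.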
The E2E Stability Correlation
E2E test stability correlates directly with Experience and Manager stability. When E2E tests become flaky — passing on one run, failing on the next, sensitive to timing or environment — the instability almost always traces to the orchestration tier rather than to infrastructure. The Manager or Experience is making decisions that depend on transient state, or it is sequencing operations in a way that is sensitive to timing that unit and integration tests never exercise. Stable Managers produce stable E2E tests. When E2E flakiness spikes, the first diagnostic step is not to add retries or increase timeouts — it is to examine what changed in the orchestration layer. The E2E suite is a Manager health monitor.
The Coverage Paradox
Teams with high line coverage but poor boundary coverage consistently have worse defect rates than teams with moderate line coverage and strong boundary coverage. The explanation is structural: line coverage rewards exercising code paths, but many code paths are internal to a component and exercise only the easy branches. Boundary coverage — ensuring that every contract state a dependency can emit is handled by the caller — exercises the load-bearing logic: error handling, fallback paths, state transitions that only occur when a collaborator returns something unexpected. A team at 90% line coverage that never tests what happens when the Accessor returns a timeout has covered the lines but missed the boundary. A team at 65% line coverage that tests every Accessor contract state — success, not-found, timeout, malformed response — has covered less code but exercised far more of what matters. What matters is exercising contract states, not line counts.
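Boundary coverage can be made concrete by enumerating the dependency's contract states and asserting the caller's handling of each one. The Engine, Accessor, and states below are invented for illustration; the pattern is what matters: one controlled mock response per contract state, including the unhappy ones.

```python
from unittest.mock import Mock

# Contract states a hypothetical Accessor can emit across its boundary.
ACCESSOR_STATES = {
    "success":   {"state": "success", "balance": 100},
    "not_found": {"state": "not_found"},
    "timeout":   {"state": "timeout"},
    "malformed": {"state": "malformed"},
}

# Hypothetical Engine that must handle every state, not just the happy path.
class BalanceEngine:
    def __init__(self, accessor):
        self.accessor = accessor

    def available_credit(self, customer_id):
        result = self.accessor.fetch_balance(customer_id)
        if result["state"] == "success":
            return {"state": "ok", "credit": result["balance"] * 2}
        if result["state"] == "not_found":
            return {"state": "unknown_customer"}
        return {"state": "unavailable"}  # timeout, malformed, anything else

# Boundary coverage: one assertion per contract state the dependency defines.
def test_every_contract_state_is_handled():
    expected = {
        "success":   {"state": "ok", "credit": 200},
        "not_found": {"state": "unknown_customer"},
        "timeout":   {"state": "unavailable"},
        "malformed": {"state": "unavailable"},
    }
    for name, response in ACCESSOR_STATES.items():
        accessor = Mock()
        accessor.fetch_balance.return_value = response
        assert BalanceEngine(accessor).available_credit("c-1") == expected[name], name
```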
Accelerated Test Review
BDT makes test review faster and more consistent because reviewers have a structural question to ask rather than a subjective one. Instead of evaluating whether “enough” is tested or whether the test “looks right,” a reviewer checks that the test targets the correct spiral level for the component’s role. Is this an Engine? Then the test should be a unit test with mocked Accessors. Is this a Manager? Then the test should be an integration test exercising orchestration against mocked Engines. Is this a new E2E scenario? Then it should trace a complete user journey through the real stack. The review question shifts from “is this a good test?” to “is this test at the right level?” — a question with a definitive answer derived from the component’s structural position. Review time drops because the evaluation criteria are objective.
Natural CI Pipeline Mapping
BDT interacts with CI pipelines by providing a principled mapping between spiral levels and pipeline stages. Unit tests run on every commit — they are fast, numerous, and catch logic regressions immediately. Integration tests run on pull request — they verify that the collaboration contracts between components are intact before code enters the shared branch. E2E tests run on merge to main — they confirm that the full system behaves correctly before deployment. System tests run in staging — they exercise non-functional qualities under realistic conditions. The spiral maps to the pipeline stages naturally because each stage has a different latency tolerance and a different confidence target. Teams that adopt BDT often find that their pipeline stage definitions, which previously felt arbitrary, now have a structural justification: each stage corresponds to a scope, and each scope corresponds to a tier.
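The mapping above can be expressed directly in pipeline configuration. The fragment below is a generic sketch, not the schema of any particular CI product; stage names, triggers, and commands are all illustrative.

```yaml
# Illustrative spiral-to-pipeline mapping (generic shape, hypothetical paths).
stages:
  - name: unit            # every commit: fast, numerous, logic regressions
    trigger: push
    run: pytest tests/unit
  - name: integration     # pull request: collaboration contracts intact
    trigger: pull_request
    run: pytest tests/integration
  - name: e2e             # merge to main: full system before deployment
    trigger: merge_to_main
    run: pytest tests/e2e
  - name: system          # staging: non-functional qualities under load
    trigger: deploy_staging
    run: pytest tests/system
```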
The Diagnostic Cascade
A failing E2E test should be reproducible as a failing integration test at one specific seam. If it is, the defect is localized: the collaboration between two components at that seam is broken, and the integration test pinpoints which contract state is mishandled. If it is not — if the E2E test fails but all integration tests pass — then one of two things is true. Either the E2E test is coupled to implementation detail that no integration test targets (the test is wrong), or the integration test suite is missing a contract state that the real system exercises (the suite is incomplete). The diagnostic cascade — E2E failure, then integration reproduction, then unit isolation — is how BDT converts a symptom into a structural finding. Each level of the cascade narrows the scope. A defect that survives to E2E without appearing at integration is always an architectural discovery about a missing seam or an untested contract state.
The Refactor Safety Observation
When a system is structured according to VBD and tested according to BDT, internal refactoring within a component — changing how an Engine computes a result, optimizing an Accessor’s query strategy, restructuring a Manager’s orchestration sequence — is protected by the correct test level automatically. Refactoring an Engine’s internals is covered by its unit tests. Changing an Accessor’s implementation is covered by its unit tests against the mocked data source. Altering a Manager’s orchestration sequence is covered by its integration tests against mocked Engines. The key observation is that tests at the boundary do not care about internal restructuring — they care about the contract. This means that well-placed boundary tests provide refactoring confidence without requiring test updates for internal changes. When a refactor requires updating tests at multiple spiral levels, it is not a refactor — it is a contract change, and the test updates are the correct response.
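A small sketch of that observation, with two invented implementations of the same Engine contract: the boundary test asserts only contract behavior, so it passes unchanged across the internal rewrite.

```python
# Two internal implementations of one hypothetical Engine contract.
class TotalEngineV1:
    def total(self, amounts):
        result = 0.0
        for a in amounts:        # naive accumulation loop
            result += a
        return result

class TotalEngineV2:
    def total(self, amounts):    # refactored internals, same contract
        return float(sum(amounts))

# The boundary test: it knows the contract, not the implementation.
def contract_test(engine):
    assert engine.total([1.0, 2.0, 3.5]) == 6.5
    assert engine.total([]) == 0.0

# The same test protects both versions; the refactor required no test change.
contract_test(TotalEngineV1())
contract_test(TotalEngineV2())
```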
8. Conclusion
The spiral is not a burden. It is a reflection.
A correctly decomposed system — one where Engines contain logic and Accessors contain I/O and Managers contain orchestration and Utilities contain nothing domain-specific — produces a test suite that follows naturally from its structure. Unit tests are fast because Engines and Flows have no hidden dependencies. Integration tests are targeted because Accessors have narrow, stable interfaces. E2E tests are confident because Experiences and Managers represent complete, semantically meaningful journeys. UAT aligns because those journeys were designed to represent actual human purpose.
The work of creating a good test suite is mostly the work of creating a good architecture. The spiral does not impose additional design constraints — it reads off the constraints already imposed by correct decomposition. When those constraints are satisfied, testing is an expression of the structure. When they are violated, testing is a fight against it.
Boundary-Driven Testing names that relationship precisely. Boundaries determine test scope. Boundaries define mock points. Boundaries generate test profiles. Fix the boundaries and the spiral follows.
Appendix A: Glossary
Architecturally Significant Boundary — A seam between components with distinct roles, such as Engine-Accessor or Flow-API Accessor. The correct and stable placement for test doubles.
Boundary-Driven Testing — A test strategy in which architectural boundaries between component roles determine test scope, mock placement, and assertion targets. Derived from VBD/EBD decomposition rather than imposed as an external testing framework.
Contract State — The set of possible return types a component can emit across its boundary, including success values, domain errors, and infrastructure failures. Tests must exercise every contract state to ensure the caller handles all outcomes.
Controllability — The ability to place a component into a known state without invoking the full system. Required for isolation.
End-to-End Test — A test that exercises a full user-visible journey through the real stack with no mocks. In BDT, E2E tests target Manager and Experience components to verify that assembled paths produce correct outcomes against live dependencies.
Integration Test — A test that verifies coordination between components at the Manager or Experience level, with dependencies mocked at the outer architectural boundary. In BDT, integration tests confirm that orchestration wiring matches the intended design.
Inverted Pyramid — A test suite dominated by E2E tests because unit and integration tests cannot exercise meaningful behavior. A structural indicator of misplaced boundaries, not a testing indicator.
Mock — A test double placed at an architecturally significant boundary to isolate the component under test from its dependencies. In BDT, mocks are positioned exclusively at role boundaries, not at arbitrary internal call sites.
Mock Proliferation — A test requiring many simultaneous mocks. A signal that the component crosses too many boundaries or has absorbed too many responsibilities.
Observability — The ability to determine what a component did given controlled inputs. Required for meaningful assertion.
Refactoring Confidence — The assurance that internal changes to a component are safe so long as its boundary tests continue to pass. BDT provides this by anchoring tests to stable architectural seams rather than internal implementation details.
Seam — The point between two component roles where a mock is placed during testing. Seams exist at architecturally significant boundaries and represent the natural isolation surface for test doubles.
Structural Signal — Testing difficulty — such as excessive mocking, brittle assertions, or unclear scope — that reveals architectural misalignment rather than a testing tooling problem. BDT treats these signals as prompts to fix boundaries, not tests.
System Test — A test that targets non-functional qualities such as resilience, performance, and deployment correctness. System tests use failure injection, load generation, and configuration variation against the full assembled system.
Test Profile — The characteristic test shape of a component role, specifying what is tested, what is mocked, and what is asserted. Each VBD/EBD role (Engine, Accessor, Manager, Utility, etc.) has a distinct test profile determined by its boundary relationships.
Test Spiral — A progression of test scope from unit through integration, end-to-end, system, and user acceptance. Each level corresponds to a level of architectural scope, and the spiral shape emerges naturally from correct boundary placement.
Unit Test — A test that exercises the internal logic of a single component — typically an Engine, Flow, or Utility — with all dependencies mocked at their architectural boundaries. In BDT, unit tests verify logic correctness and full contract-state coverage.
User Acceptance Test — A test performed by real users against the live system to verify that core journeys fulfill business intent. UATs use no mocks and validate that the system delivers actual value, not just technical correctness.
Appendix B: BDT at a Glance
| Spiral Level | Primary Structural Target | Mock Strategy | Assertion Scope |
|---|---|---|---|
| Unit | Engine · Flow · Utility · Interaction | Mock all dependencies through their contracts | Logic correctness; all contract states handled |
| Integration | Manager · Experience coordination | Mock at outer boundary; dependencies return controlled contract states | Collaboration wiring; orchestration matches design |
| End-to-End | Manager · Experience | No mocks; full stack; real external systems | User-visible outcome; journey completion |
| System | Full system | Failure injection; load; configuration variation | Non-functional qualities; resilience; deployment correctness |
| UAT | Core use case / Core user journey | None (real users) | Business intent fulfilled |
Appendix C: Case Study — E-Commerce Order Processing
This appendix presents a fictional but structurally realistic example of BDT applied to an e-commerce order processing system. The architecture follows VBD role assignments. The test suite follows the spiral. The example demonstrates how boundary placement determines test scope, mock strategy, and maintenance cost — and how adding a new capability requires minimal test changes when the boundaries are correct.
C.1 System Architecture
The order processing system is decomposed into the following components:
- OrderManager — orchestrates the order lifecycle: validates, prices, checks inventory, processes payment, persists, and notifies. Contains no business logic. Sequences Engine calls and handles cross-cutting routing.
- ValidationEngine — applies order validation rules: required fields, item availability constraints, customer eligibility, promotion expiration.
- PricingEngine — calculates order totals: base pricing, volume discounts, promotional codes, tax computation, currency handling.
- InventoryEngine — determines fulfillment feasibility: stock checks, reservation logic, partial fulfillment decisions, backorder policy.
- OrderRepositoryAccessor — translates domain order objects to and from the persistent store. Handles serialization, query construction, and connection management.
- PaymentGatewayAccessor — translates payment requests into gateway-specific API calls. Handles authentication, request formatting, response parsing, and error translation.
- NotificationAccessor — translates notification requests into delivery channel calls (email, SMS, push). Handles template selection and delivery confirmation.
- LoggingUtility — structured log emission. Pure sink: accepts log entries, formats them, writes to the configured transport. No domain awareness.
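One way to see why these role boundaries double as mock seams: if the wiring is explicit (constructor injection), every dependency listed above is replaceable in a test. The following composition sketch is hypothetical — all class and parameter names are stand-ins for whatever the real codebase defines — but it shows the dependency direction the roles imply:

```python
# Hypothetical composition root for the C.1 decomposition. The point:
# every dependency is injected through a constructor, so each boundary
# in the role diagram is also a seam where a test double can be placed.
class LoggingUtility:                 # Utility: pure sink, no domain awareness
    def __init__(self): self.entries = []
    def log(self, entry): self.entries.append(entry)

class OrderRepositoryAccessor: pass   # Accessors: translation to external systems
class PaymentGatewayAccessor: pass
class NotificationAccessor: pass

class ValidationEngine:               # Engines: business logic; Accessors below
    def __init__(self, repo): self.repo = repo
class PricingEngine:
    def __init__(self, repo): self.repo = repo
class InventoryEngine:
    def __init__(self, repo): self.repo = repo

class OrderManager:                   # Manager: sequences Engine calls only
    def __init__(self, validation, pricing, inventory, payments, notify, log):
        self.validation, self.pricing, self.inventory = validation, pricing, inventory
        self.payments, self.notify, self.log = payments, notify, log

repo = OrderRepositoryAccessor()
manager = OrderManager(
    validation=ValidationEngine(repo),
    pricing=PricingEngine(repo),
    inventory=InventoryEngine(repo),
    payments=PaymentGatewayAccessor(),
    notify=NotificationAccessor(),
    log=LoggingUtility(),
)
```

In a unit test, any of these constructor arguments can be swapped for a mock without touching the others; that substitution point is the seam.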
C.2 Unit Test Suite
Each Engine is tested in isolation. Accessors below each Engine are mocked. The tests exercise every branch of the Engine’s logic by controlling the inputs and the mock responses.
ValidationEngine Unit Tests
| Test Scenario | Input | Mock Setup | Expected Result |
|---|---|---|---|
| Missing required field (customer email) | Order with null email | None needed | ValidationFailure("missing_required_field", "email") |
| Empty line items | Order with zero items | None needed | ValidationFailure("empty_order", null) |
| Expired promotional code | Order with promo code “SUMMER2025” | OrderRepositoryAccessor returns promo with expires: 2025-09-01 | ValidationFailure("expired_promotion", "SUMMER2025") |
| Out-of-stock item (validation-level check) | Order with item SKU-9912 | OrderRepositoryAccessor returns stock_count: 0 for SKU-9912 | ValidationFailure("item_unavailable", "SKU-9912") |
| Customer account suspended | Order with customer ID 4401 | OrderRepositoryAccessor returns customer with status: suspended | ValidationFailure("customer_ineligible", "account_suspended") |
| Valid order, all checks pass | Complete order, all fields present | OrderRepositoryAccessor returns valid promo, positive stock, active customer | ValidationPass(order) |
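The expired-promotion row above can be sketched as a concrete unit test. Everything here — the class names, the `get_promotion` call, the field names — is a hypothetical rendering of the table's contract, not the paper's actual implementation; a reference date is passed in explicitly so the rule is deterministic:

```python
from dataclasses import dataclass
from datetime import date
from unittest.mock import Mock

@dataclass
class ValidationFailure:
    code: str
    detail: str

class ValidationEngine:
    """Hypothetical Engine: promotion-expiry rule only."""
    def __init__(self, repo):
        self.repo = repo  # OrderRepositoryAccessor, mocked in unit tests

    def validate(self, order, today):
        promo = self.repo.get_promotion(order["promo_code"])
        if promo["expires"] < today:
            return ValidationFailure("expired_promotion", order["promo_code"])
        return "pass"

# Unit test: the Accessor is mocked at its contract boundary; the Engine is real.
repo = Mock()
repo.get_promotion.return_value = {"expires": date(2025, 9, 1)}
engine = ValidationEngine(repo)
result = engine.validate({"promo_code": "SUMMER2025"}, today=date(2026, 3, 1))
assert result == ValidationFailure("expired_promotion", "SUMMER2025")
```

The mock never touches a database; it returns the contract state the scenario requires, which is what keeps this a unit test rather than an integration test in disguise.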
PricingEngine Unit Tests
| Test Scenario | Input | Mock Setup | Expected Result |
|---|---|---|---|
| Volume discount threshold (100+ units) | Order with 150 units of SKU-1001 | OrderRepositoryAccessor returns tier pricing: 100+ at 15% discount | PricedOrder(discount_applied: "volume_15pct") |
| Promotional code (percentage) | Order with promo “SAVE20” | OrderRepositoryAccessor returns promo: 20% off, no exclusions | PricedOrder(promo_applied: "SAVE20", discount: 20%) |
| Promotional code with exclusion | Order with promo “SAVE20”, item in exclusion list | OrderRepositoryAccessor returns promo with exclusion list containing item SKU | PricedOrder(promo_applied: null, exclusion_reason: "item_excluded") |
| Tax calculation (multi-jurisdiction) | Order shipping to CA | OrderRepositoryAccessor returns CA tax rate 8.25% | PricedOrder(tax_rate: 8.25%, tax_amount: computed) |
| Combined volume + promo (stacking rules) | 150 units with promo “SAVE20” | OrderRepositoryAccessor returns stacking policy: “best_single” | PricedOrder(discount_applied: "volume_15pct", stacking: "best_single_applied") |
| Zero-cost order (full discount) | Order fully covered by store credit | None needed | PricedOrder(total: 0.00, payment_required: false) |
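The stacking row is the most logic-dense case, so a sketch may help. All names are assumptions, and "best_single" is interpreted here as "apply only the largest single discount" — one plausible reading of the table's policy, not a definitive one:

```python
from unittest.mock import Mock

class PricingEngine:
    """Hypothetical Engine: base price, volume/promo discounts, stacking policy."""
    def __init__(self, repo):
        self.repo = repo  # OrderRepositoryAccessor, mocked in unit tests

    def price(self, units, unit_price, promo_code):
        base = units * unit_price
        volume_pct = self.repo.get_volume_discount(units)     # e.g. 0.15 for 100+
        promo_pct = self.repo.get_promo_discount(promo_code)  # e.g. 0.20
        if self.repo.get_stacking_policy() == "best_single":
            applied = max(volume_pct, promo_pct)   # only the best discount applies
        else:
            applied = volume_pct + promo_pct       # naive stacking
        return round(base * (1 - applied), 2)

repo = Mock()
repo.get_volume_discount.return_value = 0.15
repo.get_promo_discount.return_value = 0.20
repo.get_stacking_policy.return_value = "best_single"
engine = PricingEngine(repo)
assert engine.price(150, 1.00, "SAVE20") == 120.00  # 20% applied, not 35%
```

Because the stacking policy arrives through the mocked Accessor, the same Engine test can exercise every policy variant without any fixture data changing hands.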
InventoryEngine Unit Tests
| Test Scenario | Input | Mock Setup | Expected Result |
|---|---|---|---|
| Full stock available | Order for 10 units, 50 in stock | OrderRepositoryAccessor returns available: 50 | InventoryReserved(full, reservation_id) |
| Partial stock, backorder allowed | Order for 10 units, 3 in stock | OrderRepositoryAccessor returns available: 3, backorder policy: allowed | InventoryPartial(available: 3, backordered: 7, reservation_id) |
| Partial stock, backorder disallowed | Order for 10 units, 3 in stock | OrderRepositoryAccessor returns available: 3, backorder policy: disallowed | InventoryInsufficient(available: 3, required: 10) |
| Multi-item reservation (atomic) | Order for 3 SKUs | OrderRepositoryAccessor returns stock for all 3 | InventoryReserved(full, reservation_ids: [r1, r2, r3]) |
| Reservation timeout recovery | Order for 10 units | OrderRepositoryAccessor returns reservation then timeout on confirm | InventoryError("reservation_timeout", reservation_id) |
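The two backorder rows make a compact pair: same stock state, different policy, different result type. A minimal sketch, with all names and signatures assumed:

```python
from dataclasses import dataclass
from unittest.mock import Mock

@dataclass
class InventoryPartial:
    available: int
    backordered: int

@dataclass
class InventoryInsufficient:
    available: int
    required: int

class InventoryEngine:
    """Hypothetical Engine: fulfillment feasibility decisions."""
    def __init__(self, repo):
        self.repo = repo  # OrderRepositoryAccessor, mocked in unit tests

    def reserve(self, sku, qty):
        stock = self.repo.get_available(sku)
        if stock >= qty:
            return "full"
        if self.repo.backorder_allowed(sku):
            return InventoryPartial(available=stock, backordered=qty - stock)
        return InventoryInsufficient(available=stock, required=qty)

repo = Mock()
repo.get_available.return_value = 3
repo.backorder_allowed.return_value = True
engine = InventoryEngine(repo)
assert engine.reserve("SKU-1", 10) == InventoryPartial(available=3, backordered=7)

repo.backorder_allowed.return_value = False
assert engine.reserve("SKU-1", 10) == InventoryInsufficient(available=3, required=10)
```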
Mock placement for all Engine unit tests: the Engine under test is real; the OrderRepositoryAccessor beneath it is mocked at its contract, returning whatever state the scenario requires.
C.3 Integration Test Suite
Integration tests exercise the seams between tiers. Each seam is tested with the caller real and the callee mocked, returning controlled contract states. The question is not whether the callee works — that is answered by its unit tests — but whether the caller handles every state the callee can emit.
Seam: OrderManager to ValidationEngine
| Test Scenario | Mock Response from ValidationEngine | Expected Manager Behavior |
|---|---|---|
| Validation passes | ValidationPass(order) | Manager proceeds to PricingEngine |
| Validation fails (missing field) | ValidationFailure("missing_required_field", "email") | Manager returns rejection, no downstream calls made |
| Validation fails mid-order (concurrent modification) | ValidationFailure("order_modified_concurrently", order_id) | Manager returns conflict error, logs warning, no payment attempted |
| ValidationEngine throws unexpected error | RuntimeException("database_unavailable") | Manager returns system error, logs critical, no downstream calls |
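The second row of this seam table, as a sketch: the Engine is mocked so the test can force a failure state, then the assertions cover both the Manager's response and the negative claim that nothing downstream was invoked. All names here are illustrative:

```python
from unittest.mock import Mock

class OrderManager:
    """Hypothetical Manager fragment: validate, then price. No business logic."""
    def __init__(self, validation, pricing):
        self.validation = validation
        self.pricing = pricing

    def place_order(self, order):
        result = self.validation.validate(order)
        if result != "pass":
            return {"status": "rejected", "reason": result}
        return {"status": "priced", "total": self.pricing.price(order)}

# Seam test: the caller is real; the callee is mocked with a controlled state.
validation, pricing = Mock(), Mock()
validation.validate.return_value = {"code": "missing_required_field", "field": "email"}
manager = OrderManager(validation, pricing)
response = manager.place_order({"email": None})
assert response["status"] == "rejected"
pricing.price.assert_not_called()  # a validation failure must short-circuit
```

The `assert_not_called` check is the characteristic integration-tier assertion: it verifies routing, not logic.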
Seam: OrderManager to PricingEngine
| Test Scenario | Mock Response from PricingEngine | Expected Manager Behavior |
|---|---|---|
| Pricing succeeds | PricedOrder(total: 142.50) | Manager proceeds to InventoryEngine |
| Pricing returns zero total | PricedOrder(total: 0.00, payment_required: false) | Manager skips PaymentGatewayAccessor, proceeds to persist |
| PricingEngine returns error | PricingError("tax_service_unavailable") | Manager returns pricing failure, no inventory reservation attempted |
Seam: OrderManager to InventoryEngine
| Test Scenario | Mock Response from InventoryEngine | Expected Manager Behavior |
|---|---|---|
| Full inventory reserved | InventoryReserved(full, reservation_id) | Manager proceeds to payment |
| Partial inventory, backorder | InventoryPartial(available: 3, backordered: 7) | Manager proceeds to payment with adjusted total, notifies customer of partial fulfillment |
| Inventory insufficient | InventoryInsufficient(available: 3, required: 10) | Manager returns inventory failure, no payment attempted |
Seam: OrderManager to PaymentGatewayAccessor
| Test Scenario | Mock Response from PaymentGatewayAccessor | Expected Manager Behavior |
|---|---|---|
| Payment authorized | PaymentAuthorized(transaction_id) | Manager persists order, sends confirmation notification |
| Payment declined | PaymentDeclined("insufficient_funds") | Manager releases inventory reservation, returns payment failure |
| Payment gateway timeout | PaymentTimeout(retry_after: 30) | Manager holds reservation, returns retry-eligible error |
| Payment gateway returns unknown error | PaymentError("unknown_gateway_error") | Manager releases reservation, logs critical, returns system error |
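The decline row is worth a sketch because it exercises a compensating action: the Manager must undo the inventory reservation it made earlier. Class names, response shapes, and the `release` signature are assumptions:

```python
from unittest.mock import Mock

class OrderManager:
    """Hypothetical Manager fragment: payment step only."""
    def __init__(self, payments, inventory):
        self.payments = payments
        self.inventory = inventory

    def pay(self, order, reservation_id):
        outcome = self.payments.authorize(order)
        if outcome["state"] == "declined":
            self.inventory.release(reservation_id)  # compensating action
            return {"status": "payment_failed", "reason": outcome["reason"]}
        return {"status": "confirmed", "txn": outcome["transaction_id"]}

# Seam test: force a decline through the mocked Accessor, then assert
# both the returned failure and the compensating release call.
payments, inventory = Mock(), Mock()
payments.authorize.return_value = {"state": "declined", "reason": "insufficient_funds"}
manager = OrderManager(payments, inventory)
response = manager.pay({"id": 1}, reservation_id="r-42")
assert response == {"status": "payment_failed", "reason": "insufficient_funds"}
inventory.release.assert_called_once_with("r-42")  # reservation released on decline
```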
Mock placement for integration tests: the OrderManager is real; the Engine or Accessor at the seam under test is mocked, returning controlled contract states.
C.4 End-to-End Test Scenarios
E2E tests exercise the full stack with no mocks. Real databases, real payment gateway (sandbox mode), real notification delivery (test channel). The assertions target user-visible outcomes.
Scenario 1: Happy Path Order
- Submit a valid order through the API with two line items and a promotional code.
- Assert: API returns `201 Created` with an order confirmation containing the order ID, applied discount, and estimated delivery.
- Assert: Order is retrievable via `GET /orders/{id}` with status `confirmed`.
- Assert: Inventory counts for both items are decremented.
- Assert: Payment transaction appears in gateway sandbox with correct amount.
- Assert: Confirmation notification delivered to test channel.
Scenario 2: Order with Payment Failure
- Submit a valid order through the API using a test card number that triggers decline.
- Assert: API returns `402 Payment Required` with decline reason.
- Assert: Order is retrievable via `GET /orders/{id}` with status `payment_failed`.
- Assert: Inventory reservations are released (stock counts restored).
- Assert: No confirmation notification sent.
Scenario 3: Partial Inventory Fulfillment
- Submit a valid order for 10 units where only 4 are in stock and backorder is allowed.
- Assert: API returns `201 Created` with order confirmation indicating partial fulfillment.
- Assert: Order contains two fulfillment groups: 4 units immediate, 6 units backordered.
- Assert: Payment charged for full amount (backorder policy: charge upfront).
- Assert: Customer receives notification indicating partial shipment with backorder ETA.
C.5 Adding a New Payment Method
This section demonstrates the maintenance cost of adding a new capability — a cryptocurrency payment option — to the system. Because boundaries are correctly placed, the change is contained.
What changes:
- `PaymentGatewayAccessor` — Add a new translation path for cryptocurrency gateway API calls. The Accessor already defines the contract: `authorize(payment_request) -> PaymentAuthorized | PaymentDeclined | PaymentTimeout | PaymentError`. The new payment method is a new implementation path within the Accessor, not a new contract.
- One new Accessor unit test — Test that the Accessor correctly translates a cryptocurrency payment request into the gateway’s API format and correctly parses the response. The mock target is the cryptocurrency gateway’s HTTP client. The contract states are identical: authorized, declined, timeout, error.
- One updated integration test at the Manager-to-Accessor seam — Add a test case confirming that the OrderManager correctly handles a cryptocurrency payment authorization. The mock PaymentGatewayAccessor returns `PaymentAuthorized(transaction_id, method: "crypto")`. Assert that the Manager persists the order with the correct payment method recorded.
- One new E2E scenario — Submit an order through the API with `payment_method: "crypto"`, using the cryptocurrency gateway sandbox. Assert the same outcomes as the happy path: order confirmed, inventory decremented, notification sent, payment recorded with the correct method.
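The first two bullets can be sketched together: the new method becomes a dispatch path inside the Accessor, and its unit test mocks the gateway's HTTP client while asserting the same contract states as the card path. All class names, response fields, and the `post` signature are assumptions for illustration:

```python
from unittest.mock import Mock

class PaymentGatewayAccessor:
    """Hypothetical Accessor: translates one stable contract into
    gateway-specific calls. A new payment method is a new internal
    path, not a new contract."""
    def __init__(self, card_client, crypto_client):
        self._clients = {"card": card_client, "crypto": crypto_client}

    def authorize(self, request):
        client = self._clients[request["method"]]
        raw = client.post("/authorize", request)  # gateway-specific wire call
        # Translate the gateway response into the shared contract states.
        if raw["ok"]:
            return {"state": "authorized", "transaction_id": raw["txn"],
                    "method": request["method"]}
        return {"state": "declined", "reason": raw["error"]}

# New Accessor unit test: mock the crypto gateway's HTTP client only.
crypto_client = Mock()
crypto_client.post.return_value = {"ok": True, "txn": "c-991"}
accessor = PaymentGatewayAccessor(card_client=Mock(), crypto_client=crypto_client)
result = accessor.authorize({"method": "crypto", "amount": 50})
assert result == {"state": "authorized", "transaction_id": "c-991", "method": "crypto"}
```

Callers never see the dispatch: the Manager still invokes `authorize` and still receives one of the same contract states, which is why no Manager or Engine test changes.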
What does not change:
- ValidationEngine unit tests — validation rules are payment-method-agnostic.
- PricingEngine unit tests — pricing is independent of payment method.
- InventoryEngine unit tests — inventory logic is independent of payment method.
- All existing integration tests — the Manager’s orchestration logic is unchanged; it calls `PaymentGatewayAccessor.authorize()` regardless of method.
- All existing E2E scenarios — the happy path, payment failure, and partial fulfillment scenarios are unaffected.
Change impact: the PaymentGatewayAccessor gains an internal path; one new unit test, one added integration case, and one new E2E scenario; every Engine, the Manager’s orchestration, and all other existing tests are untouched.
This is the structural payoff of correct boundary placement. The new payment method is volatile — it is a new external integration with its own protocol, authentication, and error semantics. But the volatility is contained entirely within the Accessor, which is the role designed to absorb external integration change. The Engines do not know about payment methods. The Manager does not know about gateway protocols. The test suite reflects this containment: three new or added test cases, zero existing tests broken, full confidence.
References and Influences
William Christopher Anderson
Anderson, William Christopher. Volatility-Based Decomposition in Software Architecture. February 2026. vbd.md
Anderson, William Christopher. Experience-Based Decomposition. March 2026. ebd.md
VBD and EBD define the structural models and role taxonomies that BDT maps to the test spiral. The component roles (Manager, Engine, Resource Accessor, Utility; Experience, Flow, Interaction, Utility), communication rules, and core scenario validation mechanisms are taken from these sources. BDT establishes the testability consequences of those structures: where to test each role, what to mock, and how to read testing difficulty as structural signal.
David L. Parnas
Parnas, David L. “On the Criteria To Be Used in Decomposing Systems into Modules.” Communications of the ACM, 1972.
Parnas argued that modules should hide design decisions likely to change behind stable interfaces. That same abstraction is what makes components testable: the stable interface is what callers depend on, and what tests can target without coupling to implementation. BDT extends this from the module level to the architectural level, applying the same principle across the full component role hierarchy.
Robert C. Martin
Martin, Robert C. Clean Architecture. Pearson, 2017.
Martin’s work on dependency inversion, boundary placement, and the Dependency Rule establishes the structural conditions under which testing is tractable. His principle — mock across architecturally significant boundaries, but not within them — is adopted directly in Section 4. His observation that testability is a primary benefit of correct dependency management is the foundational premise of BDT.
Martin Fowler
Fowler, Martin. “The Practical Test Pyramid.” martinfowler.com, 2018.
Fowler’s test pyramid established the proportional framing — many unit tests, fewer integration tests, fewer still E2E — and the practical consequences of inverting it. BDT builds on this by grounding the pyramid’s levels in structural tiers rather than convention, explaining why the proportions emerge from correct decomposition rather than treating them as a design rule to follow.
Gregor Hohpe and Bobby Woolf
Hohpe, Gregor; Woolf, Bobby. Enterprise Integration Patterns. Addison-Wesley, 2003.
Hohpe and Woolf’s treatment of integration points as volatility points reinforces the argument that Accessors are the natural integration test target. Their accessor and adapter patterns define the narrow, stable interfaces that integration tests verify — and that unit tests mock. The patterns also describe the specific behaviors (retries, error translation, protocol handling) that integration tests must exercise and unit tests must exclude.
Juval Löwy
Löwy, Juval. Righting Software. Addison-Wesley, 2019.
Löwy’s IDesign methodology is the source of the Manager-Engine-Accessor taxonomy. The communication rules and role constraints that prevent runtime coupling also prevent test complexity — an observation that BDT makes explicit. The discipline that keeps Engines from coordinating workflows keeps unit tests focused. The discipline that keeps Accessors from applying business rules keeps integration tests targeted.
Author’s Note
Boundary-Driven Testing does not introduce new testing techniques. Every test type described here — unit, integration, E2E, system, UAT — predates this paper by decades. What BDT contributes is a structural account of why these levels exist, what they correspond to in a correctly decomposed system, and how to read testing difficulty as a diagnostic signal about structural health.
The intent is a reference that makes explicit the relationship between decomposition discipline and testing tractability — suitable for engineering onboarding, test strategy discussions, and architectural review in organizations building products intended to last.
Distribution Note
This document is provided for informational and educational purposes. It may be shared internally within organizations, used as a reference in testing and architecture discussions, or adapted for non-commercial educational use with appropriate attribution.