Process-Driven Agent Execution with Unbounded Local Memory
Author: William Christopher Anderson
Date: March 2026
Version: 1.0
Executive Summary
Large language models are stateless. Every call begins from nothing. The entire burden of continuity — what happened before, what matters now, what the system has learned — falls on whatever context is stuffed into the prompt window. Today’s agent systems respond to this constraint with brute force: they pack as much raw text as possible into every call, hope the model attends to the right parts, and accept that the model forgets everything between sessions.
This approach is simultaneously expensive and unreliable. It is expensive because every token sent to the model incurs cost, and most of those tokens are irrelevant to the current task. It is unreliable because the model has no mechanism to distinguish signal from noise in a bloated context window — the important instruction on line 400 competes for attention with the boilerplate on line 12.
The Compiled Context Runtime (CCR) is an architectural model that eliminates both problems. It introduces three structural innovations:
- Process definitions — Agent workflows codified as versioned, executable YAML specifications. Each process declares its steps, gates, knowledge requirements, and trigger conditions. The agent’s creativity goes into executing the steps, not remembering them.
- Compiled context injection — A compilation pipeline that retrieves relevant knowledge, compresses it into a lossless format (CTX), and injects only what is needed for the current process step. The context window receives precision-compiled packages, not raw text dumps.
- Memory and context chains — Persistent, linked data structures in a local database that capture the full history of agent interactions, decisions, corrections, and execution outcomes. Chains compile into CTX packages on demand, giving the model access to effectively unlimited historical depth while staying within the token window.
The consequence is a system where the context window is no longer a hard limit. It becomes a viewport — a precision-scoped lens into a local store of potentially millions of memories, thousands of execution records, and hundreds of thousands of embeddings. The model sees exactly what it needs for the current step. Nothing more. Nothing less.
The economic implications are significant. By reducing input tokens per task by approximately 88% and eliminating exploratory calls through deterministic process execution, the CCR model cuts LLM API costs by an order of magnitude. At enterprise scale, this represents millions of dollars in annual savings per organization. At global scale — across the hundreds of millions of knowledge workers, analysts, researchers, writers, and developers adopting LLM-assisted workflows — the aggregate savings exceed billions of dollars annually.
This paper describes the architectural model, the compilation pipeline, the memory system, the learning loop that makes processes and context progressively more efficient, and the economic analysis that quantifies the impact.
Abstract
Current approaches to LLM-based agent systems treat the context window as a fixed-size container into which raw text is packed before each inference call. This produces three systemic failures: excessive token cost from irrelevant context, unreliable model behavior from attention dilution, and complete memory loss between sessions. The Compiled Context Runtime addresses these failures through process-driven execution (codified workflows that eliminate prompt-dependent behavior), compiled context injection (a pipeline that retrieves, compresses, and scopes knowledge to the current step), and persistent memory chains (linked data structures that give the model access to unbounded historical depth through precision compilation). This paper presents the architectural model, the compilation format, the memory and context chain data structures, the process discovery and refinement loop, and a quantitative analysis of token economics at individual, enterprise, and global scale. The system is local-first by design: all data — process definitions, execution history, knowledge embeddings, compiled context packages — resides on the user’s machine. No workflow data crosses a network boundary except the compiled context injected into the LLM inference call itself.
1. Introduction
1.1 The Statelessness Problem
Large language models are functions. They accept a sequence of tokens and produce a sequence of tokens. They retain nothing between calls. Every inference begins from a blank state, and whatever continuity the system exhibits must be constructed entirely from the input context.
This is a fundamental architectural constraint, and the industry’s response to it has been remarkably uniform: pack more into the context window. Conversation history is appended. Retrieval-augmented generation (RAG) inserts document fragments. System prompts grow to thousands of tokens of instructions. The result is a context window that serves simultaneously as instruction manual, conversation log, knowledge base, and working memory — a single undifferentiated buffer asked to do the work of four distinct systems.
The consequences are predictable. Important instructions are buried among retrieved passages. Relevant history competes with irrelevant history for the model’s attention. Token costs scale linearly with the amount of context stuffed into each call, regardless of how much of that context is actually used. And when the session ends, everything is lost.
1.2 The Agent Amplification Problem
Agent systems amplify every failure mode. An agent is not a single inference call — it is a sequence of calls, each building on the last, often spanning hours of work. An agent reviewing a pull request might make twenty calls: reading files, understanding context, analyzing changes, composing feedback. At each call, the agent system must reconstruct the relevant context from scratch, because the model remembers nothing from the previous call.
The common solution is to carry forward the entire conversation history. This means that call twenty contains the full transcript of calls one through nineteen — most of which is irrelevant to the current task of composing a final review comment. The token cost of the twentieth call dwarfs its informational content.
More critically, the agent has no structured memory. It cannot recall what it learned three sessions ago. It cannot look up a decision it made last week. It cannot walk a chain of related corrections to understand the current state of a preference. Every session begins from whatever fits in the system prompt, and everything else is gone.
1.3 The Compiled Context Alternative
The Compiled Context Runtime (CCR) inverts the relationship between the model and its context. Instead of the context window being a container that the system fills, it becomes a viewport that the runtime controls.
The runtime maintains three independent systems:
- A process engine that defines agent workflows as executable specifications, eliminating the need for the model to remember what to do
- A compilation pipeline that transforms raw knowledge into compressed, scoped packages, eliminating the need to stuff raw text into the context
- A memory system that persists, links, and indexes every interaction across sessions, eliminating the assumption that the model must forget
These three systems compose to produce a model of agent execution where the context window is used surgically — receiving only what the current step requires — while the actual depth of available context is limited only by local storage.
1.4 Model-Agnostic by Construction
The CCR is not coupled to any specific language model. Compiled CTX packages are plain text — any model that accepts text input can consume them. Process definitions are YAML — they describe what to do, not how any particular model should do it. Memory chains are data structures — they store and retrieve knowledge independently of which model uses it.
Critically, the model is not statically configured — it is dynamically selected. When a step in a process needs execution, the runtime evaluates the task requirements (reasoning depth, code generation, speed constraints, data sensitivity), checks available models and their capabilities, and selects the optimal model for that specific step. The process definition does not say “use Claude” or “use GPT” — it describes the work, and the runtime matches the work to the best available model. This means:
- Dynamic model selection — The agent evaluates each task, checks what models are available and what they’re good at, and picks the right one. A complex architectural decision routes to the most capable reasoning model. A simple file transformation routes to a fast, cheap model. A step handling sensitive data routes to a local model that never leaves the machine. This happens automatically, per-step, without human intervention.
- Cross-model intelligence — Because knowledge lives in compiled context packages and memory chains — not in any model’s weights — intelligence accumulates across model boundaries. A decision made by Claude gets recorded in a memory chain. That memory chain gets compiled into context for a step executed by GPT. The insight transfers. The intelligence is in the data layer, and every model that touches it gets smarter.
- Survive model obsolescence — When a better model launches, the CCR’s accumulated knowledge, processes, and execution history carry forward unchanged. Nothing is lost to a model transition. The new model immediately benefits from everything every previous model learned, because it’s all in the compiled context.
- No vendor lock-in — The value accrues in the local data layer (processes, memories, knowledge), not in the model. The model is a replaceable inference endpoint. The intelligence is in the compiled context. Switch providers, switch models, switch architectures — the accumulated intelligence persists.
1.5 Local-First as Architectural Requirement
The CCR model is local-first by design, not by preference. This is an architectural requirement, not a deployment choice.
Process definitions encode an organization’s workflows. Execution history records what an agent has done and learned. Memory chains capture every decision, correction, and preference accumulated over months of use. Knowledge embeddings index proprietary content, internal documentation, and domain-specific reference material.
None of this data should cross a network boundary. It is operationally sensitive, competitively valuable, and privacy-critical. The only data that leaves the user’s machine is the compiled context package injected into the LLM inference call — and that package contains only what the current step requires, compiled into a format that strips structural metadata.
Local-first is what makes the system trustworthy. If the memory system required shipping data to a cloud service, adoption would be structurally limited to organizations willing to externalize their workflows. Local-first removes that constraint entirely.
2. The Five Primitives
2.1 The Execution Cycle
Before defining how processes are represented, the CCR establishes the fundamental cycle that governs all agent work. Every action an agent takes is an instance of one of five primitives, executed in a cycle:
- Orchestrate — Invoke meta-learning. Pull the latest state. Read the knowledge index. Look up relevant knowledge by topic. Compile context. Analyze dependencies. Decompose the task. Dispatch.
- Execute — Do the work. Write code, configure systems, run tests, produce artifacts. This is the only primitive that produces external output.
- Learn — Analyze outcomes at two levels:
  - Meta-learning: Evaluate the processes themselves — execution patterns, recovery strategies, failure modes. Update directives and process definitions.
  - Context-learning: Evaluate the domain — what was discovered about the subject matter, the working environment, the user’s preferences. Update knowledge and memory chains.
- Build — Create new processes, knowledge artifacts, or tools when Learn identifies gaps. A repeated ad-hoc sequence becomes a process definition. A missing knowledge topic becomes a new entry. A missing capability becomes a new tool.
- Refine — Improve existing processes, knowledge, and tools when Learn identifies weaknesses. A slow step gets optimized. A stale knowledge reference gets updated. A process gate that fails too often gets its preconditions adjusted.
The cycle: Orchestrate → Execute → Learn → Build/Refine (if needed) → Orchestrate (better)
2.2 Why Five Primitives
The five primitives are not arbitrary. They are the minimal set required for a self-improving execution system:
- Without Orchestrate, the agent has no context and works blind.
- Without Execute, no work is produced.
- Without Learn, the agent repeats mistakes and never improves.
- Without Build, gaps in processes and knowledge persist indefinitely.
- Without Refine, existing processes degrade as conditions change.
Remove any one and the system loses a critical capability. Add a sixth and it can be expressed as a composition of the existing five. The primitives are orthogonal and complete.
2.3 Processes Formalize the Cycle
Every process definition in the CCR is a codification of the five primitives applied to a specific workflow:
- The process’s knowledge references and gates are the Orchestrate phase — ensuring context is loaded and preconditions are met before work begins.
- The process’s steps are the Execute phase — the actual work, performed in sequence.
- The process’s execution recording is the Learn phase — capturing what happened for later analysis.
- The process discovery system is the Build phase — detecting new patterns and proposing new process definitions.
- The process refinement system is the Refine phase — analyzing execution records and proposing improvements.
The five primitives are the theory. Process definitions are the implementation. The CCR makes the cycle explicit, executable, and self-improving.
3. Process Definitions
3.1 Processes as Data, Not Prompts
The first structural innovation of the CCR is the separation of workflow definition from workflow execution.
In conventional agent systems, the workflow lives in the prompt. A system prompt might instruct the agent: “First, check CI status. Then read the failing test. Then fix the test. Then run the test suite. Then commit.” The agent follows these instructions — if it attends to them, if they fit in the context window, if it doesn’t hallucinate an alternative sequence.
In the CCR, the workflow is a data structure:
process: fix_ci_failure
version: 3
trigger:
  type: event
  match:
    source: ci
    status: failure
knowledge:
  - engineering.testing
  - project.ci_pipeline
gates:
  - execution_context_exists
  - branch_clean
steps:
  - id: read_failure
    action: read_ci_log
    description: Identify the failing test and error message
  - id: locate_source
    action: find_relevant_code
    description: Find the source code responsible for the failure
  - id: diagnose
    action: analyze_failure
    description: Determine root cause of the failure
  - id: implement_fix
    action: write_code
    description: Implement the fix
  - id: verify
    action: run_tests
    description: Run the test suite to verify the fix
  - id: commit
    action: commit_and_push
    description: Commit the fix and push
    gates:
      - tests_pass
This definition is stored in a database, versioned, and executable. The runtime reads it and executes each step in sequence. The model is invoked at each step with exactly the context that step requires — not a prompt full of instructions it might or might not follow.
3.2 Gates
Gates are preconditions evaluated before execution begins or before individual steps execute. They are binary — pass or fail — and their failure halts the process with a recorded reason.
Gates serve two purposes. First, they prevent the agent from executing in invalid states — attempting to commit when tests are failing, or beginning work without an execution context. Second, they create a verifiable execution contract. A process with three gates and six steps produces a deterministic sequence of checkpoints that can be audited after the fact.
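As an illustration, a gate evaluator can be as simple as a named predicate registered with the runtime. The sketch below is a minimal Python rendering of that contract; the decorator, the GateResult type, and the state dictionary are assumptions made for illustration, not the CCR’s actual API.

from dataclasses import dataclass

@dataclass
class GateResult:
    gate: str
    passed: bool
    reason: str = ""        # recorded when the gate halts the process

GATE_EVALUATORS = {}

def gate(name):
    """Register a gate evaluator under the symbolic name used in process YAML."""
    def register(fn):
        GATE_EVALUATORS[name] = fn
        return fn
    return register

@gate("branch_clean")
def branch_clean(state) -> GateResult:
    # Gates are binary: pass, or fail with a recorded reason.
    dirty = state.get("uncommitted_files", [])
    if dirty:
        return GateResult("branch_clean", False, f"{len(dirty)} uncommitted files")
    return GateResult("branch_clean", True)

def evaluate_gates(names, state):
    """Evaluate gates in order; the first failure halts the process."""
    for name in names:
        result = GATE_EVALUATORS[name](state)
        if not result.passed:
            return result
    return GateResult("all", True)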
3.3 Knowledge References
Each process declares which knowledge topics it needs. The runtime resolves these references against the knowledge store before execution begins. This is not retrieval-augmented generation — it is declarative context scoping. The process author specifies exactly what the model should know for this workflow. The runtime compiles it. The model receives it.
This eliminates the two failure modes of RAG: retrieving irrelevant passages (because the process author specified exactly what’s needed) and missing relevant passages (because the knowledge references are explicit and verified at process definition time).
3.4 Process Inheritance and Composition
Process definitions are object-oriented. A process can extend another process, inheriting its steps, gates, and knowledge references while overriding or adding to them. This is structural inheritance — the same concept as class inheritance in Java or C#, applied to workflow definitions.
process: fix_ci_failure_with_notification
version: 1
extends: fix_ci_failure
# Inherits all steps, gates, knowledge from fix_ci_failure
# Adds a notification step after commit
steps:
  - inherit: all
  - id: notify
    action: send_notification
    description: Notify the team that the CI failure has been fixed
    after: commit
# Adds additional knowledge ref
knowledge:
  - inherit: all
  - team.notification_preferences
The inheritance model supports:
- Single inheritance — A process extends exactly one parent. The parent’s steps, gates, and knowledge references are inherited unless explicitly overridden.
- Step override — A child process can replace a parent step by declaring a step with the same ID. The parent’s version is discarded; the child’s version is used.
- Step insertion — A child can insert steps before or after inherited steps using `before:` and `after:` directives. The parent’s sequence is preserved; the child’s additions are spliced in.
- Gate extension — A child inherits all parent gates and can add additional gates. Gates cannot be removed — a child process is always at least as constrained as its parent.
- Knowledge extension — Knowledge references compose. A child inherits all parent knowledge and can add more. This ensures the child always has at least as much context as the parent.
- Abstract processes — A process can be declared `abstract: true`, meaning it cannot be executed directly but serves as a template for concrete processes. This is the process equivalent of an abstract class.
# Abstract base process — cannot execute directly
process: standard_code_change
abstract: true
version: 1
gates:
  - execution_context_exists
  - branch_clean
knowledge:
  - engineering.pull_request
  - project.code_conventions
steps:
  - id: analyze
    action: analyze_requirements
    abstract: true  # Must be overridden by child
  - id: implement
    action: write_code
    abstract: true  # Must be overridden by child
  - id: verify
    action: run_tests
  - id: commit
    action: commit_and_push
    gates:
      - tests_pass
Concrete processes extend this base:
process: fix_bug
extends: standard_code_change
version: 1
steps:
  - id: analyze
    action: read_bug_report
    description: Identify root cause from bug report and logs
  - id: implement
    action: write_fix
    description: Implement the minimal fix
---
process: add_feature
extends: standard_code_change
version: 1
knowledge:
  - inherit: all
  - engineering.design_review
steps:
  - id: analyze
    action: read_feature_spec
    description: Understand the feature requirements
  - id: implement
    action: write_feature
    description: Implement the feature with tests
This is polymorphism applied to workflows. A standard_code_change defines the contract — what gates must pass, what knowledge is loaded, what sequence is followed. Concrete processes fill in the domain-specific behavior. The runtime doesn’t care whether it’s executing fix_bug or add_feature — it executes the linked process, step by step, through the same pipeline.
3.5 Process Interfaces
Just as object-oriented systems separate interface from implementation, the CCR separates process contracts from process implementations. A process interface defines what a process must do — its required steps, gates, and knowledge references — without specifying how.
interface: code_change
version: 1
description: Contract for any process that modifies code
required_gates:
  - execution_context_exists
  - branch_clean
required_steps:
  - id: analyze
    description: Understand what needs to change
  - id: implement
    description: Make the change
  - id: verify
    description: Verify the change works
required_knowledge:
  - engineering.pull_request
Any process that declares implements: code_change must provide concrete definitions for all required steps. The compiler verifies this at compile time — a process that claims to implement an interface but is missing a required step fails to compile.
process: fix_bug
version: 1
implements: code_change
# Compiler verifies: analyze, implement, verify steps all present
# Compiler verifies: execution_context_exists, branch_clean gates present
# Compiler verifies: engineering.pull_request in knowledge refs
steps:
  - id: analyze
    action: read_bug_report
    description: Identify root cause from bug report and logs
  - id: implement
    action: write_fix
    description: Implement the minimal fix
  - id: verify
    action: run_tests
    description: Run the test suite
Process interfaces enable:
- Substitutability — Any process implementing the `code_change` interface can be used where a `code_change` is expected. The runtime can dynamically select which concrete process to execute based on the trigger event, the project context, or user preference.
- Contract verification — The compiler guarantees that every implementing process satisfies the interface contract. Missing steps, missing gates, missing knowledge references are compile-time errors.
- Organizational standards — An organization defines process interfaces that encode their standards: “every code change must include analysis, implementation, and verification.” Teams provide concrete implementations that fit their specific workflows. The interface ensures consistency; the implementation allows flexibility.
- Composability — A process can implement multiple interfaces, satisfying multiple contracts simultaneously. A `deploy_hotfix` process might implement both `code_change` and `deployment`, ensuring it meets the standards for both workflows.
This is the Interface Segregation Principle applied to processes. Interfaces are small, focused contracts. Processes implement the ones relevant to their domain. The compiler enforces the contracts. The runtime dispatches polymorphically.
3.6 The Process Compiler
Process definitions are not interpreted — they are compiled. The compilation pipeline is analogous to class loading in the JVM or assembly loading in the CLR: YAML source is parsed, validated, linked, and emitted as an executable runtime object.
Compilation stages:
- Parse — YAML source is deserialized into a raw ProcessDefinition AST (abstract syntax tree). Syntax errors are caught here — malformed YAML, missing required fields, invalid types.
- Validate — The AST is validated against the process schema. Semantic errors are caught: duplicate step IDs, circular inheritance, references to nonexistent gates, abstract steps that aren’t overridden, knowledge references that don’t resolve. Validation produces a list of errors and warnings. A process with errors cannot proceed to linking. Warnings are recorded but do not block compilation.
- Resolve inheritance — If the process extends a parent, the compiler loads the parent (recursively, for chains of inheritance), merges inherited steps, gates, and knowledge with the child’s overrides, and verifies that all abstract steps have been implemented.
- Link — Symbolic references are resolved to concrete objects. Knowledge topic names are resolved to file paths. Gate names are bound to evaluator functions. Step actions are bound to handler callables. The result is a `LinkedProcess` — an object where every reference is a direct pointer, not a name to be looked up at runtime. This is the process equivalent of a linked executable.
- Emit — The LinkedProcess is registered in the process table and cached. It is ready for execution. The compiled form is stored alongside the source YAML, so recompilation is only needed when the source changes.
Compile-time guarantees:
Because processes are validated at compile time, the runtime can make guarantees that interpreted systems cannot:
- Every knowledge reference resolves to a real file
- Every gate references a registered evaluator
- Every step action references a registered handler
- Inheritance chains are acyclic
- Abstract steps are fully implemented
- No duplicate step IDs exist
- Required fields are present and correctly typed
A process that compiles will not fail due to structural errors at runtime. Runtime failures are limited to actual execution issues — a test that fails, a file that’s missing, an API that’s down. The structural integrity is guaranteed by the compiler.
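A minimal sketch of the pipeline shows how structural errors surface before execution. It assumes PyYAML and hypothetical handler-registry and knowledge-index structures; inheritance resolution and interface checks are omitted for brevity.

import yaml

class CompileError(Exception):
    pass

def compile_process(source: str, handlers: dict, knowledge_index: set):
    ast = yaml.safe_load(source)                          # 1. Parse
    errors = []
    steps = ast.get("steps", [])
    step_ids = [s["id"] for s in steps]
    if len(step_ids) != len(set(step_ids)):               # 2. Validate
        errors.append("duplicate step IDs")
    for topic in ast.get("knowledge", []):
        if topic not in knowledge_index:
            errors.append(f"unresolved knowledge reference: {topic}")
    for step in steps:
        if step.get("action") not in handlers:
            errors.append(f"no handler registered for action: {step.get('action')}")
    if errors:
        raise CompileError(errors)                        # structural failure at compile time
    linked = [{**step, "handler": handlers[step["action"]]} for step in steps]   # 4. Link
    return {"process": ast["process"], "steps": linked}   # 5. Emit (register + cache in practice)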
3.7 Versioning and Evolution
Every modification to a process creates a new version. Execution records link to the version that was active at execution time. This produces a complete audit trail: which version of which process produced which outcome, with which knowledge references, at which time.
Version history enables the refinement loop described in Section 8.
4. The Runtime
4.1 A Managed Runtime for Agent Processes
The Compiled Context Runtime is a managed runtime in the same sense as the JVM or the CLR. It is not a script runner — it is a full execution environment that manages the lifecycle of process objects, provides memory management with garbage collection, implements multi-level caching, offers observability through tracing and debugging, and is extensible through a messaging bus.
The analogy is precise:
| JVM/CLR Concept | CCR Equivalent |
|---|---|
| Class | ProcessDefinition (YAML source) |
| Class loader | ProcessLoaderEngine (YAML parse + validate) |
| Linker | ProcessLinkerEngine (resolve refs, bind gates) |
| Loaded class | LinkedProcess (all refs resolved) |
| Object instance | ExecutionRecord (a running/completed execution) |
| Garbage collector | GCManager (generational, mark-sweep) |
| JIT cache | CacheManager (L1/L2/L3 tiered) |
| Class hierarchy | Process inheritance (extends, abstract) |
| Interface | Gate contracts + step action contracts |
| Bytecode verifier | Process validator (compile-time guarantees) |
| Debugger | Execution tracer + step inspector |
| ClassNotFoundException | ProcessLoadError |
| LinkageError | LinkError (unresolved ref) |
4.2 The Caching System
The CCR implements a three-tier cache modeled on CPU cache hierarchies:
L1 — In-Memory Hot Cache. Recently compiled CTX packages, recently linked processes, and recently resolved knowledge topics. Access time: microseconds. Size: bounded by memory (configurable, default 256MB). Eviction policy: adaptive replacement cache (ARC) — balances recency and frequency. This is where the runtime looks first for any compiled artifact.
L2 — SQLite Warm Cache. Compiled artifacts that have been evicted from L1 but are still likely to be needed. Serialized to disk in a SQLite database. Access time: single-digit milliseconds. Size: bounded by disk (configurable, default 2GB). Eviction policy: time-aware LFU — items that haven’t been accessed within a configurable window are evicted. Promotion to L1 occurs on access.
L3 — Cold Storage. Full compilation artifacts archived for historical reference. This tier is not accessed during normal execution — it exists for auditing and recompilation. Items promoted from L3 go to L2 first, then L1 on access.
Cache warming. On startup, the runtime warms the cache by preloading the most frequently used processes and their knowledge references. The warming strategy is derived from execution history — processes executed most often in the last 30 days are preloaded. This means the first execution after startup is nearly as fast as subsequent ones.
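The lookup-and-promote behavior of the first two tiers can be sketched in a few lines. The example below uses a plain LRU as a stand-in for ARC and the standard-library sqlite3 module for the warm tier; the class and method names are illustrative, not the runtime’s actual cache API.

import sqlite3, pickle
from collections import OrderedDict

class TieredCache:
    def __init__(self, db_path=":memory:", l1_capacity=1024):
        self.l1 = OrderedDict()               # L1: hot, in-memory
        self.l1_capacity = l1_capacity
        self.l2 = sqlite3.connect(db_path)    # L2: warm (on disk in practice)
        self.l2.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)")

    def get(self, key):
        if key in self.l1:                    # L1 hit: microseconds
            self.l1.move_to_end(key)
            return self.l1[key]
        row = self.l2.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
        if row:                               # L2 hit: deserialize and promote to L1
            value = pickle.loads(row[0])
            self._put_l1(key, value)
            return value
        return None                           # miss: caller recompiles from source

    def put(self, key, value):
        self._put_l1(key, value)
        self.l2.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, pickle.dumps(value)))
        self.l2.commit()

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:   # simple LRU eviction, standing in for ARC
            self.l1.popitem(last=False)

The key design point is that eviction from L1 is not loss: the artifact drops to a slower tier and is promoted back on its next access.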
4.3 Generational Garbage Collection
The CCR manages a large volume of runtime objects: memory nodes, context chains, execution records, compiled CTX packages, cached compilation artifacts. Not all of these need to persist forever. The generational garbage collector reclaims objects that are no longer reachable, following the same generational hypothesis as the JVM: most objects die young.
Three generations:
-
Gen 0 (Nursery) — Newly created objects: fresh memory nodes, in-progress execution records, temporary CTX compilations. Collected frequently (every N allocations or every M minutes). Most objects die here — a temporary compilation for a single step is used once and discarded.
-
Gen 1 (Survivor) — Objects that survived one or more Gen 0 collections. These have demonstrated some persistence — a memory node that’s been referenced by another node, an execution record that’s been finalized, a CTX package that’s been accessed multiple times. Collected less frequently.
-
Gen 2 (Tenured) — Long-lived objects: established memory chains, frequently-accessed knowledge packages, historical execution records marked for retention. Collected rarely. Objects in Gen 2 are the permanent knowledge base — the accumulated expertise described in Section 6.
Collection algorithm: Mark-sweep with reference counting. The collector identifies root objects (active execution contexts, pinned memory chains, cached processes), traces all reachable objects from roots, and sweeps unreachable objects. Reference counts provide fast detection of isolated garbage; the full mark-sweep handles cycles.
Promotion criteria: An object is promoted from Gen N to Gen N+1 when it survives a configurable number of collections (default: 2 for Gen 0→1, 5 for Gen 1→2). Objects can also be explicitly promoted (pinned) by the user or by the runtime when they’re referenced by a long-lived chain.
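A sketch of the sweep-and-promote step, using the survival thresholds described above; the object representation is illustrative.

# Gen 0 -> 1 after 2 survivals, Gen 1 -> 2 after 5 (configurable defaults)
PROMOTION_THRESHOLDS = {0: 2, 1: 5}

def sweep(objects, reachable_ids):
    """Reclaim unreachable, unpinned objects; promote long-lived survivors."""
    survivors = []
    for obj in objects:
        if obj["id"] not in reachable_ids and not obj.get("pinned"):
            continue                                  # unreachable: reclaimed
        obj["survivals"] += 1
        threshold = PROMOTION_THRESHOLDS.get(obj["generation"])
        if threshold is not None and obj["survivals"] >= threshold:
            obj["generation"] += 1                    # promote to the older generation
            obj["survivals"] = 0
        survivors.append(obj)
    return survivors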
4.4 Observability
A runtime without observability is a black box. The CCR provides full instrumentation for debugging, tracing, and monitoring:
Execution tracing. Every process execution produces a trace — a structured record of every step executed, every gate evaluated, every knowledge reference resolved, every CTX package compiled, every model invocation made, and every outcome recorded. Traces are linked to execution contexts and stored in the execution record. They can be inspected after the fact to understand exactly what happened and why.
Step-level debugging. The runtime supports breakpoints at the step level. A step can be marked as a breakpoint in the process definition or at runtime. When a breakpoint step is reached, execution pauses, and the current state is surfaced: the compiled context that would be injected, the gate results, the execution history so far. The user can inspect, modify context, or resume.
Structured logging. All runtime events are emitted as structured log entries with correlation IDs that link to the active execution context. Log levels: TRACE (every internal operation), DEBUG (compilation and linking details), INFO (step execution, gate results), WARN (non-fatal issues), ERROR (step failures, gate failures).
Metrics. The runtime exposes metrics for monitoring:
– Cache hit rates per tier (L1/L2/L3)
– GC pause times and collection counts per generation
– Compilation times (parse, validate, link, emit)
– Token usage per step and per process
– Execution duration per step
– Model selection decisions and latency
– Memory pressure and allocation rates
Diagnostic commands. The CLI exposes diagnostic tools:
– cortex trace <execution-id> — full execution trace
– cortex cache stats — cache hit rates, sizes, eviction counts
– cortex gc stats — generation sizes, collection history, promotion rates
– cortex process inspect <name> — compiled process details, inheritance chain
– cortex memory inspect <chain-id> — memory chain visualization
4.5 Bus Extensibility
The runtime is extensible because it is built on a messaging bus. Every component in the system communicates through typed messages on the bus. The runtime itself does not call components directly — it publishes events, and components subscribe to the events they care about.
This means the runtime is open for extension without modification:
- Custom step handlers — Register a new action type by subscribing to `step.execute` events where `action` matches your handler. The runtime doesn’t need to know about your handler — it publishes the event, your handler responds.
- Custom gate evaluators — Register a new gate by subscribing to `gate.evaluate` events where `gate_name` matches your evaluator. Same pattern.
- Custom model providers — Register a new LLM provider by subscribing to `model.invoke` events. The model selection engine routes to your provider based on selection criteria.
- Custom observability — Subscribe to `trace.*` events to build custom dashboards, export to external systems, or integrate with existing APM tools.
- Plugins — The plugin system is built on the bus. A plugin is a bundle of event subscriptions with a manifest. Loading a plugin registers its subscriptions. Unloading a plugin removes them. No code changes to the runtime.
The bus scales from in-process (single agent) to IPC (multi-agent on one machine) to network (distributed agents). The same subscription model works at every scale because the message format is uniform and the delivery mechanism is pluggable.
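A minimal sketch of the subscription pattern, reusing the topic names from the list above; the Bus class itself is illustrative, not the runtime’s actual messaging API.

from collections import defaultdict
from fnmatch import fnmatch

class Bus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, pattern, handler):
        self.subscribers[pattern].append(handler)

    def publish(self, topic, message):
        # Deliver to every handler whose pattern matches the topic (wildcards allowed).
        for pattern, handlers in self.subscribers.items():
            if fnmatch(topic, pattern):
                for handler in handlers:
                    handler(topic, message)

bus = Bus()

# A custom step handler: respond only to the action types you own.
def deploy_handler(topic, msg):
    if msg["action"] == "deploy_to_staging":
        print("deploying", msg["artifact"])

bus.subscribe("step.execute", deploy_handler)

# Custom observability: subscribe to all trace events.
bus.subscribe("trace.*", lambda t, m: print("TRACE", t, m))

bus.publish("step.execute", {"action": "deploy_to_staging", "artifact": "svc-1.2.3"})
bus.publish("trace.step.completed", {"step": "deploy", "duration_ms": 1840})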
4.6 The Process IDE
Because processes are compiled with full validation, the compilation pipeline can power developer tooling:
Real-time validation. As a user edits a process YAML file, the compiler runs continuously, surfacing errors and warnings inline — missing knowledge references, unresolved gates, inheritance conflicts, abstract steps that need implementation. This is the process equivalent of a TypeScript language server providing red squiggles as you type.
Autocomplete. The compiler knows the full schema, all registered gates, all registered actions, all knowledge topics in the index. It can provide autocomplete suggestions for every field in a process definition.
Inheritance visualization. For processes that extend other processes, the IDE can show the resolved inheritance chain — which steps are inherited, which are overridden, which knowledge references come from which ancestor. This is the process equivalent of a class hierarchy viewer.
Execution dry-run. The IDE can simulate process execution without invoking the LLM — evaluating gates against current state, resolving knowledge references, computing the viewport allocation, and showing exactly what context would be injected at each step. This lets process authors validate their workflows before committing them.
Diff and history. Process versions are stored with full history. The IDE can show diffs between versions, highlight what changed, and correlate version changes with execution outcome changes from the refinement engine.
The Process IDE is not a separate product — it is a natural consequence of the compiler architecture. Any system that compiles with full validation can power tooling. The CCR’s compiler produces the same kind of structured output (AST, error list, resolved symbols) that a language compiler produces, and the same kinds of tools can be built on top of it.
5. Compiled Context Injection
5.1 The Compilation Pipeline
The CCR compilation pipeline transforms raw knowledge and historical context into compressed, scoped packages injected into the model at each process step.
The pipeline operates in four stages:
- Retrieval — The process step’s knowledge references are resolved against the local knowledge store. Memory chains and context chains relevant to the current task are retrieved via vector similarity search.
- Scoping — Retrieved content is filtered to what the current step actually needs. A six-step process does not carry step one’s context through step six unless the process definition explicitly requires it.
- Compilation — Scoped content is compiled into CTX format — a lossless semantic compression that preserves all meaning while reducing token count. The compilation is structural: redundant framing is removed, cross-references are resolved inline, and hierarchical relationships are encoded in a compact notation.
- Injection — The compiled CTX package is placed into the model’s context window alongside the step-specific instructions. The model receives a single, coherent, compressed context that contains exactly what it needs.
5.2 The CTX Format
The CTX format is a lossless compression scheme for structured knowledge. It was developed independently for compiling research whitepapers into compact reference formats and has been validated across documents ranging from 5,000 to 30,000 words.
The format achieves 40-60% token reduction on narrative text and 60-84% reduction on structured knowledge (tables, hierarchies, reference material). The compression is lossless in the sense that all semantic content is preserved — a model consuming the CTX version of a document has access to the same information as a model consuming the original, but at a fraction of the token cost.
The format is not a general-purpose compression algorithm. It is specifically designed for LLM consumption: the output is valid text that the model can read directly. No decompression step is required. The model simply reads a more compact representation of the same information.
5.3 Per-Step Scoping
The most significant cost reduction comes not from compression but from scoping. A conventional agent system might inject 50,000 tokens of context into every call — the full conversation history, the full retrieved documents, the full system prompt. The CCR injects only what the current step needs.
Consider a six-step process where each step requires different knowledge:
| Step | Knowledge Needed | Compiled Size |
|---|---|---|
| Read CI log | CI pipeline docs | 1,200 tokens |
| Locate source | Project structure | 2,400 tokens |
| Diagnose | Testing standards | 1,800 tokens |
| Implement fix | Code conventions | 3,200 tokens |
| Run tests | Test commands | 800 tokens |
| Commit | Git workflow | 600 tokens |
Average context per step: 1,667 tokens. Total across six steps: 10,000 tokens. A conventional system would inject the same 50,000-token context six times: 300,000 tokens. The CCR uses 97% fewer input tokens for the same workflow.
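The arithmetic behind the table and the 97% figure, spelled out:

step_tokens = [1200, 2400, 1800, 3200, 800, 600]
total_ccr = sum(step_tokens)                  # 10,000 tokens across six steps
average = total_ccr / len(step_tokens)        # ~1,667 tokens per step
conventional = 50_000 * len(step_tokens)      # 300,000 tokens (same context, injected six times)
reduction = 1 - total_ccr / conventional      # ~0.967, i.e. ~97% fewer input tokens
print(total_ccr, round(average), conventional, f"{reduction:.0%}")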
6. Memory and Context Chains
6.1 The Memory Problem
The context window is ephemeral. When a session ends, the model’s state is destroyed. Any knowledge accumulated during the session — corrections, preferences, decisions, learned context — is lost unless explicitly persisted somewhere external.
Current approaches to persistence are primitive. Some systems append to a markdown file. Others maintain a flat key-value store. None preserve the structure of how memories relate to each other: which correction superseded which earlier belief, which decision led to which outcome, which preference was refined through which sequence of interactions.
6.2 Memory Chains
A memory chain is a linked sequence of related memory nodes stored in a relational database. Each node contains:
- Content — The memory itself (a decision, preference, correction, observation)
- Type — Classification (correction, decision, preference, observation, outcome)
- Links — Typed edges to other nodes (supersedes, refines, contradicts, led_to, caused_by)
- Embedding — Vector representation for similarity search
- Metadata — Timestamp, source session, confidence, access frequency
Links create structure. When the user corrects the agent, the correction node links to the corrected node with a supersedes edge. When a decision leads to an outcome, the outcome links back with a caused_by edge. When a preference is refined over multiple sessions, each refinement links to the previous with a refines edge.
The result is a directed graph of memories where traversal reveals not just what the agent knows, but how it came to know it — the full epistemic history of every piece of knowledge.
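A minimal sketch of how nodes and typed links might be stored, assuming SQLite; the schema and the chain-walking helper are illustrative, not the CCR’s actual storage layer.

import sqlite3

conn = sqlite3.connect("memory.db")   # local, on-disk store
conn.executescript("""
CREATE TABLE IF NOT EXISTS memory_node (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,
    type       TEXT CHECK (type IN ('correction','decision','preference','observation','outcome')),
    embedding  BLOB,                       -- vector representation for similarity search
    session_id TEXT,
    confidence REAL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS memory_link (
    src  INTEGER REFERENCES memory_node(id),
    dst  INTEGER REFERENCES memory_node(id),
    kind TEXT CHECK (kind IN ('supersedes','refines','contradicts','led_to','caused_by'))
);
""")

def walk_chain(conn, node_id, kind="supersedes"):
    """Follow typed edges from a node back through the beliefs it replaced."""
    history, current = [node_id], node_id
    while True:
        row = conn.execute(
            "SELECT dst FROM memory_link WHERE src = ? AND kind = ?",
            (current, kind)).fetchone()
        if row is None or row[0] in history:   # end of chain (or a cycle)
            return history
        current = row[0]
        history.append(current)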
6.3 Context Chains
A context chain links execution contexts causally. Each execution context records a unit of work: what was done, why, what the outcome was, and what it led to.
Context chains answer questions that flat execution logs cannot:
- “Why did we restructure the DNS?” — Walk the chain backward from the DNS context to the domain registration context to the infrastructure discussion.
- “What happened after the PR was merged?” — Walk the chain forward from the merge context to the follow-up tasks.
- “What constraints apply to this task?” — Walk the chain of related contexts to find decisions that established constraints.
6.4 CTX Packages
Memory chains and context chains compile into CTX packages — pre-built, retrievable bundles stored in the database.
A CTX package is compiled from a set of chains, compressed into CTX format, and stored with metadata:
- Source chains — Which memory and context chains were compiled
- Compiled size — Token count of the compiled package
- Raw size — Token count of the uncompiled source material
- Compression ratio — Raw-to-compiled ratio
- Freshness — When the package was last recompiled
- Access pattern — How frequently the package is retrieved (for caching optimization)
Packages can be pre-compiled (for frequently accessed chains), on-demand (compiled at retrieval time), or auto-compiled (the runtime detects frequently co-retrieved chains and pre-compiles them as a package).
6.5 The Viewport Model
The context window is a viewport into the memory system. The model sees 7,200 tokens of precision-compiled context. Behind that viewport sits a store containing the full history of every session the agent has ever run. The depth is effectively infinite — bounded only by local disk space, not by the context window.
6.6 Implications
The viewport model changes what is possible with a language model:
Perfect recall. The agent can retrieve and compile context from any previous session. A decision made six months ago is as accessible as one made six minutes ago.
No session boundaries. Memory chains span sessions continuously. The distinction between “this session” and “previous sessions” disappears — it is all one continuous memory, scoped through the viewport.
Accumulated expertise. Every correction, preference, and outcome is recorded. The agent’s compiled context for a given task improves over time as more relevant memories accumulate. The agent gets better at your workflow because it remembers everything about your workflow.
Diagnostic capability. When the agent makes a mistake, the memory chain shows why — which memories informed the decision, which were missing, which were stale. This is debuggable, auditable intelligence.
7. Composable Knowledge Packages
7.1 From Personal to Shared
The memory system described in Section 6 is personal by default — one user’s memories, one user’s chains, one user’s machine. But compiled CTX packages are portable artifacts. They can be shared, composed, and distributed.
This transforms the CCR from a personal productivity tool into an organizational knowledge system.
7.2 Package Types
Personal knowledge packages. An individual’s accumulated expertise in a domain — every decision, correction, pattern, and preference compiled into a retrievable bundle. “Everything I know about deploying to Kubernetes” as a CTX package for an engineer. “Everything I know about regulatory filings for Series B” for a startup lawyer. “Everything I know about patient intake workflows” for a clinic administrator. 3,000 tokens containing six months of accumulated context that would otherwise require reading hundreds of threads, documents, and emails.
Team knowledge packages. A team’s shared practices — standards, decisions, patterns, procedures — compiled from the merged memory chains of team members. New team members receive the team’s institutional knowledge as a compiled package. Their agent has the same context as a ten-year veteran on day one. This applies equally to an engineering team’s architecture decisions, a sales team’s qualification criteria, or a research group’s methodology standards.
Organizational knowledge packages. An organization’s tribal knowledge — the undocumented decisions, the unwritten rules, the historical context that explains why things work the way they do. Every organization has decades of accumulated knowledge that exists only in the heads of experienced people. When those people leave, the knowledge leaves with them. Compiled knowledge packages make tribal knowledge persistent, transferable, and precise.
Domain knowledge packages. Expertise in a specific domain — compiled from publications, documentation, best practices, and accumulated execution experience. “How to build event-driven architectures” or “SEC compliance for SaaS companies” or “Clinical trial protocol design” as a CTX package that any user’s agent can consume.
7.3 Composition
Knowledge packages compose. A user’s agent might load:
Active packages:
├── personal/my-preferences (400 tokens)
├── team/backend-standards (1,200 tokens)
├── org/architecture-decisions (2,800 tokens)
├── domain/python-patterns (1,500 tokens)
└── project/payment-service-context (900 tokens)
────────────
6,800 tokens
6,800 tokens carrying the combined expertise of the individual, the team, the organization, and the domain. A new hire’s agent, on their first day, works with the same accumulated context as the most experienced person on the team — because the knowledge is compiled, not remembered.
7.4 Knowledge Models
At the limit, composed knowledge packages form a local knowledge model — a comprehensive, compiled representation of everything an individual or organization knows about their domain.
A knowledge model is not a language model. It does not generate text. It is a structured, indexed, compiled corpus that the language model consumes as context. But it serves a similar function: it encodes expertise. The difference is that it encodes specific expertise — your architecture, your decisions, your patterns, your domain — rather than generic knowledge trained from internet text.
An experienced practitioner’s knowledge model might contain:
- 50,000 memory nodes spanning two years of work
- 1,200 execution contexts recording every task completed
- 300 compiled CTX packages covering every project and domain they’ve touched
- 500,000 vector embeddings indexing their entire knowledge base
Compiled on demand, any subset of this knowledge model can be injected into an LLM call in under 10,000 tokens. The model works as if it has the practitioner’s full expertise — because, through the viewport, it does.
7.5 Codifying Tribal Knowledge
Every organization has tribal knowledge — the accumulated, undocumented understanding that makes the system work. It lives in experienced people’s heads, in hallway conversations, in threads and documents that scroll off-screen. It is the most valuable knowledge the organization possesses and the least persistent.
The CCR codifies tribal knowledge structurally:
- Capture — As people work with their agents, memory chains accumulate decisions, rationale, corrections, and context. The tribal knowledge that was previously ephemeral is now recorded as linked memory nodes.
- Compile — Memory chains compile into knowledge packages. “Why the payment service uses eventual consistency” becomes a 600-token CTX package with the full decision chain, not a 5,000-word wiki page nobody reads.
- Share — Knowledge packages are published to a team or organization knowledge store. Other users’ agents consume them automatically when working in the relevant domain.
- Evolve — As the system changes, new memory nodes extend the chains. Outdated knowledge is superseded by corrections. The packages recompile automatically. Tribal knowledge stays current because it is maintained by the same system that uses it.
The result: tribal knowledge survives employee turnover. It survives team reorganizations. It survives the passage of time. The knowledge that used to walk out the door when an experienced person left is now compiled, indexed, and available to every agent in the organization — permanently.
7.6 Knowledge Governance
The transition from personal knowledge to organizational knowledge requires governance — a structured pipeline for curating, promoting, evaluating, and distributing knowledge across an organization.
The governance pipeline:
- Local curation — Knowledge originates with individuals. Their agents accumulate memory chains and compile them into local knowledge packages. The user is the curator — they correct errors, refine context, and shape the knowledge through normal use. This is where knowledge quality is highest, because it is maintained by the person who uses it daily.
- Promotion — When a user’s local knowledge has organizational value — a decision that affects other teams, a pattern that applies across departments, a procedure that everyone should follow — the user (or their agent) suggests it for promotion. The package becomes a candidate for the organizational knowledge base.
- Evaluation at the hub — A global knowledge hub receives candidates and evaluates them. This is not blind merging — the hub analyzes the candidate against the existing knowledge base, checks for conflicts with established decisions, validates that the knowledge is generalizable (not specific to one developer’s environment), and assesses quality based on the underlying memory chains. Evaluation can be automated, human-reviewed, or a hybrid where the agent surfaces candidates for human approval.
- Intelligent merge — Approved candidates are merged into the global knowledge base. “Intelligent” because the merge is not concatenation — it is structural integration. If the candidate extends an existing knowledge chain, it is linked. If it supersedes outdated knowledge, the old nodes are marked as superseded. If it conflicts with existing knowledge, the conflict is surfaced for resolution. The global knowledge base maintains the same chain structure as local packages — it is not a flat wiki, it is a compiled, linked, versioned corpus.
- Distribution — Updated knowledge is pushed to all agents in the organization through the messaging backplane. The backplane is architecture-agnostic — it can be a local message bus for a small team, Apache Kafka for a large organization, or any pub/sub system in between. Agents subscribe to knowledge topics relevant to their current work. When the global hub publishes an update, subscribing agents receive the new compiled package and integrate it into their local knowledge store. The next time the agent needs that knowledge, it loads the latest version.
Backplane flexibility:
The messaging infrastructure scales with the organization:
| Scale | Backplane | Pattern |
|---|---|---|
| Individual | Local filesystem | Direct read |
| Team (5-20) | Local message bus | Pub/sub, same network |
| Department (20-200) | Managed message queue | Topic-based routing |
| Enterprise (200+) | Kafka / cloud pub/sub | Partitioned, multi-region |
The same knowledge governance pipeline works at every scale because the knowledge format is uniform (compiled CTX packages) and the distribution mechanism is pluggable. An organization starts with a local bus and migrates to Kafka as it grows — the knowledge packages, the governance pipeline, and the agent integration remain unchanged.
The governance loop:
Knowledge governance is not a one-time setup — it is a continuous loop. Local agents curate knowledge through daily use. Valuable knowledge is promoted. The hub evaluates and merges. Updated knowledge distributes to all agents. Those agents use the new knowledge, generating new memory chains, which produce new local packages, which may themselves be promoted. The organization’s knowledge base is a living system that improves with every task every agent executes.
8. The Learning Loop
8.1 Process Discovery
The runtime does not only execute processes — it observes unstructured agent behavior and proposes new process definitions.
When the agent performs a sequence of actions outside of a defined process, the runtime records the sequence. If the same or similar sequence recurs across multiple sessions, the runtime proposes a process definition:
“This sequence has occurred 4 times with consistent steps and positive outcomes. Proposed process: `fix_ci_failure` (6 steps, 2 knowledge refs). Approve?”
The proposal includes:
– The proposed YAML definition
– The execution history that inspired it
– Confidence level based on repetition count, consistency of steps, and outcome quality
The user approves, modifies, or rejects. Approved proposals become versioned process definitions. The agent transitions from ad-hoc behavior to deterministic execution for that workflow.
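A sketch of the discovery heuristic: count recurring action sequences recorded outside any process and propose a definition once a repetition threshold is met. The threshold and the confidence formula are illustrative, and similarity matching beyond exact repetition is omitted.

from collections import Counter

def discover_candidates(sessions, min_repeats=3):
    """sessions: lists of action names recorded outside any defined process."""
    counts = Counter(tuple(seq) for seq in sessions)
    proposals = []
    for seq, n in counts.items():
        if n >= min_repeats:
            proposals.append({
                "steps": list(seq),
                "occurrences": n,
                "confidence": min(1.0, n / 10),   # simple repetition-based confidence
            })
    return proposals

sessions = [
    ["read_ci_log", "find_relevant_code", "analyze_failure", "write_code", "run_tests", "commit_and_push"],
    ["read_ci_log", "find_relevant_code", "analyze_failure", "write_code", "run_tests", "commit_and_push"],
    ["read_ci_log", "find_relevant_code", "analyze_failure", "write_code", "run_tests", "commit_and_push"],
]
print(discover_candidates(sessions))   # one candidate, ready to draft as process YAML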
8.2 Process Refinement
After a process has been executed multiple times, the runtime analyzes execution records and surfaces refinement suggestions:
- Missing steps — Actions the agent consistently takes after the process completes, suggesting the process definition is incomplete
- Unnecessary steps — Steps that are consistently skipped or produce no meaningful output
- Missing gates — Steps that frequently fail, suggesting a precondition that should be checked before execution
- Missing knowledge — Topics the model consistently requests mid-execution that weren’t in the knowledge references
- Redundant knowledge — Knowledge references that don’t correlate with improved outcomes
Each suggestion creates a proposed new version of the process. Approved suggestions increment the version. Rejected suggestions are recorded (to avoid re-suggesting).
8.3 Context Optimization
The learning loop extends to context compilation. The runtime tracks which compiled context packages correlate with successful outcomes and which do not. Over time, this produces:
- Leaner packages — Removing knowledge that doesn’t improve outcomes
- Richer packages — Adding knowledge that the model consistently needs but wasn’t declared
- Better scoping — Narrowing or broadening per-step context based on observed usage patterns
The system gets cheaper to run the more you use it. Each execution provides data that the refinement loop uses to reduce waste in subsequent executions.
8.4 The Compound Effect
Process discovery, process refinement, and context optimization compound:
- The agent begins with no processes — all behavior is ad-hoc
- The runtime observes repeated patterns and proposes processes
- Processes replace ad-hoc behavior with deterministic execution
- Deterministic execution produces cleaner execution records
- Cleaner records enable more precise refinement suggestions
- Refined processes use less context and fewer steps
- Less context means fewer tokens per call
- Fewer tokens means lower cost per execution
- Lower cost enables more executions
- More executions produce more data for further refinement
The system converges toward an optimum: maximum workflow reliability at minimum token cost, achieved through continuous, automated, user-approved refinement.
9. Token Economics
9.1 The Cost Structure of Current Systems
LLM inference is priced per token. Input tokens (context) and output tokens (responses) each incur cost. For the purposes of this analysis, input tokens are the dominant cost driver — they are typically 3-10x more numerous than output tokens in agent workflows.
Current agent systems are structurally wasteful:
| Waste Category | Description | Typical Overhead |
|---|---|---|
| Context stuffing | Full conversation history in every call | 5-20x relevant content |
| Redundant retrieval | Same RAG passages injected repeatedly | 2-5x per session |
| No scoping | All knowledge injected regardless of step | 3-8x per step |
| No compression | Raw text, no semantic compression | 1.4-2.5x compressible |
| Exploratory calls | Agent tries approaches, backtracks | 2-4x deterministic path |
These overheads multiply. A task that requires 5,000 tokens of relevant context might consume 200,000-500,000 tokens of input across a session of exploratory, unscoped, uncompressed calls.
9.2 The CCR Cost Structure
The Compiled Context Runtime eliminates each category of waste:
| CCR Innovation | Waste Eliminated | Reduction |
|---|---|---|
| Process definitions | Exploratory calls | 60-75% fewer calls |
| Per-step scoping | Context stuffing + no scoping | 80-95% fewer tokens per call |
| CTX compilation | No compression | 40-84% compression on remaining |
| Memory chains | Redundant retrieval + session loss | Near-zero redundancy |
9.3 Quantitative Analysis
Per-task comparison:
| Metric | Conventional Agent | CCR |
|---|---|---|
| Context per call | ~50,000 tokens | ~7,000 tokens |
| Calls per task | ~20 | ~6 |
| Total input tokens | ~1,000,000 | ~42,000 |
| Reduction | — | 96% |
The 96% figure reflects the compound effect of fewer calls (deterministic processes), smaller context per call (scoped + compiled), and no redundancy (chains eliminate re-retrieval).
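The figure follows directly from the table; a one-line verification:

```python
conventional = 50_000 * 20               # ~1,000,000 input tokens per task
ccr = 7_000 * 6                          # ~42,000 input tokens per task
print(f"{1 - ccr / conventional:.1%}")   # 95.8%, rounded to 96% in the table
```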
Annual cost projections:
| Scale | Conventional Cost/yr | CCR Cost/yr | Annual Savings |
|---|---|---|---|
| Solo practitioner | $2,400 | $100 | $2,300 |
| 10-person team | $24,000 | $1,000 | $23,000 |
| 100-person company | $240,000 | $10,000 | $230,000 |
| 1,000-person enterprise | $2,400,000 | $100,000 | $2,300,000 |
| 50,000-person Fortune 500 | $120,000,000 | $5,000,000 | $115,000,000 |
Global projection:
LLM-assisted workflows extend far beyond software development. Analysts, researchers, writers, legal professionals, designers, consultants, educators, and administrators all use LLMs for knowledge work. The total addressable population is hundreds of millions of knowledge workers worldwide.
With conservative assumptions about adoption:
- 500 million knowledge workers globally (developers, analysts, researchers, writers, legal, consulting, education, etc.)
- 5% adoption rate: 25 million users
- Average savings of $2,300/year per user (solo-tier conservative)
- $57.5 billion in annual savings globally
At enterprise adoption rates with enterprise pricing, the figure is significantly higher. These are structural savings — they arise from architectural decisions, not from negotiating better API rates.
9.4 Beyond Cost: Reliability
Token reduction is not only an economic benefit. It directly improves model reliability.
A model processing 7,000 tokens of precision-compiled context attends more effectively than a model processing 50,000 tokens of raw, unscoped text. Attention dilution — the degradation of model performance as context grows — is a well-documented phenomenon. By reducing context to only what is relevant, the CCR improves not just cost but accuracy, consistency, and instruction-following.
The cheapest call is also the most reliable call. This is not a tradeoff — it is a structural advantage.
9.5 Beyond Cost: Energy and Environmental Impact
Token economics are not only a financial concern. Every token processed by a large language model requires GPU computation, which consumes electricity, which generates carbon emissions.
The energy cost of LLM inference is substantial and growing. A single GPU running inference consumes 300-700 watts. Data centers operating thousands of GPUs for inference consume megawatts continuously. As LLM-assisted work scales to hundreds of millions of knowledge workers making hundreds of calls per day, the aggregate energy consumption becomes a material environmental concern.
The CCR’s 96% reduction in input tokens translates directly to reduced computation (an illustrative calculation follows this list):
- Fewer tokens per call — Less GPU time per inference. A 7,000-token input processes faster and consumes less energy than a 50,000-token input. The relationship is not linear — attention mechanisms scale quadratically with sequence length — so the energy savings from shorter contexts are superlinear.
- Fewer calls per task — Deterministic processes eliminate exploratory back-and-forth. Six calls instead of twenty means one-third the GPU invocations.
- Compound reduction — Fewer calls, each processing fewer tokens, each requiring less computation per token (due to quadratic attention scaling). The energy reduction compounds beyond the token reduction.
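An illustrative comparison of the attention-only cost term is shown below, assuming an idealized quadratic dependence on sequence length. Real inference also carries large linear terms, so the true ratio is smaller, but it remains superlinear in context size.

```python
# Idealized: only the attention term, which grows with the square of context length.
conventional_ctx = 50_000
ccr_ctx = 7_000

token_ratio = conventional_ctx / ccr_ctx
attention_ratio = (conventional_ctx / ccr_ctx) ** 2
print(f"tokens: {token_ratio:.1f}x, attention term: {attention_ratio:.1f}x")
# tokens: 7.1x, attention term: 51.0x -> the saving exceeds the token reduction alone
```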
Projected energy savings at scale:
| Scale | Conventional GPU-hours/yr | CCR GPU-hours/yr | Energy Saved |
|---|---|---|---|
| 1,000-person enterprise | ~175,000 | ~7,000 | 168,000 GPU-hours |
| Fortune 500 (50K users) | ~8,750,000 | ~350,000 | 8,400,000 GPU-hours |
| Global (25M users at 5%) | ~4,375,000,000 | ~175,000,000 | 4,200,000,000 GPU-hours |
At approximately 500 watts per GPU, 4.2 billion GPU-hours represents 2,100 gigawatt-hours of electricity saved annually — equivalent to powering roughly 190,000 American homes for a year.
The environmental case reinforces the economic case. Organizations adopting the CCR model reduce both their LLM spending and their computational carbon footprint. At global scale, the aggregate reduction in unnecessary GPU computation is measured in thousands of gigawatt-hours — a meaningful contribution to sustainable AI infrastructure.
The impact extends beyond electricity. Large-scale GPU inference drives demand across the full data center supply chain:
- Cooling — GPUs generate heat proportional to computation. Data centers consume massive quantities of water and energy for cooling. Microsoft reported consuming 1.7 billion gallons of water in 2022, with AI workloads as a significant driver. Reducing unnecessary computation reduces cooling demand proportionally.
- Hardware — GPU manufacturing requires rare earth minerals, complex fabrication, and significant embodied carbon. Every unnecessary GPU deployed to handle wasteful inference is hardware that didn’t need to be manufactured. Reducing demand for inference capacity reduces demand for GPU production.
- Land and construction — Data centers require physical space, power infrastructure, and network connectivity. The global data center construction boom is driven substantially by AI inference demand. Reducing that demand eases pressure on land, power grids, and construction resources.
- Network — Every API call transmits tokens across network infrastructure. Reducing token volume reduces network load, which reduces energy consumption at every hop between the user’s machine and the inference cluster.
The CCR does not merely optimize a financial cost. It reduces the physical resource footprint of AI-assisted development at every layer of the infrastructure stack. The most sustainable token is the one that was never sent.
The most efficient inference call is the one that processes only what matters. The CCR ensures that every token that reaches the GPU earns its energy cost.
10. Architectural Integration
10.1 Relationship to Harmonic Design
The Compiled Context Runtime is designed using Harmonic Design (HD) principles. The process engine, compilation pipeline, and memory system decompose into the standard HD tiers (a skeletal component sketch follows the tables):
VBD — Backend Decomposition:
| Component | Tier | Responsibility |
|---|---|---|
| ProcessManager | Manager | Matches triggers to processes, orchestrates execution |
| ProcessExecutionEngine | Engine | Runs steps, manages gates, records outcomes |
| ProcessDiscoveryEngine | Engine | Detects patterns in execution history, proposes processes |
| ProcessRefinementEngine | Engine | Analyzes outcomes, proposes improvements |
| CompilationEngine | Engine | CTX compilation pipeline |
| MemoryChainEngine | Engine | Chain traversal, linking, package compilation |
| ProcessDefinitionAccessor | Accessor | CRUD on process definitions (SQLite) |
| ExecutionRecordAccessor | Accessor | Read/write execution records (SQLite) |
| MemoryAccessor | Accessor | Read/write memory nodes and edges (SQLite) |
| KnowledgeStoreAccessor | Accessor | Vector similarity search, embedding management |
EBD — Interface Decomposition:
| Component | Layer | Responsibility |
|---|---|---|
| ProcessManagementExperience | Experience | Define, browse, and manage processes |
| ProcessExecutionFlow | Flow | Step-through execution with progress |
| ProcessSuggestionFlow | Flow | Review and approve suggestions |
| MemoryExplorerExperience | Experience | Browse and search memory chains |
| ChainDetailInteraction | Interaction | Inspect individual chain nodes and links |
BDT — Test Spiral:
| Scope | Coverage |
|---|---|
| Unit | Engines: step execution, gate evaluation, pattern detection, CTX compilation, chain traversal |
| Integration | Accessors with mocked SQLite/vector DB; YAML parsing; compilation pipeline |
| E2E | Full trigger → match → gate → compile → inject → execute → record |
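A skeletal sketch of how the VBD tiers in the tables above might compose is shown below. The method signatures are hypothetical; only the tier responsibilities and the Manager-over-Engine-over-Accessor call direction are taken from the tables.

```python
class ProcessDefinitionAccessor:
    """Accessor tier: isolates SQLite access to process definitions."""
    def load(self, trigger: str) -> dict:
        raise NotImplementedError


class CompilationEngine:
    """Engine tier: encapsulates the CTX compilation pipeline."""
    def compile(self, step: dict, knowledge_refs: list[str]) -> str:
        raise NotImplementedError


class ProcessExecutionEngine:
    """Engine tier: runs steps, evaluates gates, records outcomes."""
    def run_step(self, step: dict, ctx_package: str) -> dict:
        raise NotImplementedError


class ProcessManager:
    """Manager tier: matches triggers to processes and orchestrates the engines."""

    def __init__(self, definitions: ProcessDefinitionAccessor,
                 compiler: CompilationEngine, executor: ProcessExecutionEngine) -> None:
        self.definitions = definitions
        self.compiler = compiler
        self.executor = executor

    def handle(self, trigger: str) -> list[dict]:
        process = self.definitions.load(trigger)
        results = []
        for step in process["steps"]:
            ctx = self.compiler.compile(step, step.get("knowledge", []))
            results.append(self.executor.run_step(step, ctx))
        return results
```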
10.2 Data Layer
All persistent state resides in two local stores:
SQLite — Process definitions, execution records, memory nodes, memory edges, context chain records, CTX package metadata, gate results, step outcomes.
Vector database — Knowledge embeddings, memory node embeddings, process description embeddings, execution summary embeddings. Used for similarity search during retrieval and for natural language queries (“find the process that handles CI failures”).
Both stores are local files. No network dependency. No external service. Backup is a file copy.
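A minimal sketch of the SQLite portion of the data layer follows. The table names and columns are illustrative assumptions, not the runtime's actual schema; the point is that the entire store is a single local file.

```python
import sqlite3

# Illustrative schema only; column choices are assumptions for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS process_definitions (
    id        TEXT PRIMARY KEY,
    version   INTEGER NOT NULL,
    yaml_body TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS execution_records (
    id         TEXT PRIMARY KEY,
    process_id TEXT REFERENCES process_definitions(id),
    step_id    TEXT,
    outcome    TEXT,              -- e.g. success | failure | skipped
    started_at TEXT,
    ended_at   TEXT
);
CREATE TABLE IF NOT EXISTS memory_nodes (
    id         TEXT PRIMARY KEY,
    kind       TEXT,              -- e.g. decision | correction | observation
    body       TEXT,
    created_at TEXT
);
CREATE TABLE IF NOT EXISTS memory_edges (
    src  TEXT REFERENCES memory_nodes(id),
    dst  TEXT REFERENCES memory_nodes(id),
    link TEXT                     -- typed link, e.g. "corrects", "follows"
);
"""

conn = sqlite3.connect("ccr.db")   # one local file; backup is a file copy
conn.executescript(SCHEMA)
conn.commit()
conn.close()
```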
11. Validation and Falsifiability
11.1 Testable Claims
The CCR model makes specific, falsifiable claims (a minimal measurement sketch follows the list):
- Token reduction: Compiled, scoped context injection reduces input tokens per task by at least 80% compared to conventional context stuffing. Measurable by comparing total input tokens for identical tasks.
- Call reduction: Deterministic process execution reduces the number of LLM calls per task by at least 50% compared to ad-hoc agent behavior. Measurable by counting calls for identical tasks.
- Outcome quality: Models receiving precision-compiled context produce equal or better outcomes compared to models receiving raw, unscoped context. Measurable by blind evaluation of outputs.
- Memory accuracy: Memory chains with typed links produce more accurate context retrieval than flat memory stores. Measurable by comparing retrieval precision and recall.
- Convergence: The learning loop (discovery + refinement + context optimization) produces measurable improvements in token efficiency over time. Measurable by tracking tokens-per-task across process versions.
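A minimal harness for the token-reduction and convergence claims is sketched below, assuming a hypothetical log of per-call input token counts. It shows the shape of the measurement, not the runtime's instrumentation.

```python
from statistics import mean


def tokens_per_task(call_log: list[list[int]]) -> float:
    """call_log holds one inner list per task: the input-token count of each LLM call."""
    return mean(sum(calls) for calls in call_log)


def reduction(conventional_log: list[list[int]], ccr_log: list[list[int]]) -> float:
    """Claim 1 predicts a value of at least 0.80 for identical tasks."""
    return 1 - tokens_per_task(ccr_log) / tokens_per_task(conventional_log)


# Hypothetical measurements for two identical tasks under each system.
conventional = [[52_000] * 18, [48_000] * 22]
ccr = [[7_500, 6_800, 7_100, 6_900, 7_200], [7_000] * 6]
print(f"reduction: {reduction(conventional, ccr):.1%}")   # comfortably above the 80% threshold
```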
11.2 What Would Disprove the Model
The CCR model would be disproved if:
- Compiled context produces materially worse model outputs than raw context (compression is lossy in practice, not just in theory)
- Process definitions are too rigid to handle the variance of real-world tasks (deterministic steps cannot accommodate necessary creativity)
- The learning loop converges to local minima that are worse than ad-hoc behavior
- The overhead of compilation, retrieval, and chain management exceeds the savings from reduced tokens
These are empirical questions answerable through implementation and measurement.
12. Conclusion
The Compiled Context Runtime is not an optimization applied to existing agent architecture. It is a different architecture. It replaces context stuffing with compiled injection, replaces prompt-dependent behavior with process-driven execution, and replaces session-bounded memory with persistent, linked, compilable chains.
The model’s context window stops being a limitation and becomes an instrument. The agent stops forgetting and starts accumulating expertise. The cost of each execution drops as the system learns what context matters and what does not.
The system is local-first because the data it manages — workflows, memories, execution history, knowledge — is too valuable and too sensitive to externalize. It is open source because the structural advantages it provides should be accessible to everyone, not gated behind a platform subscription.
The economic impact is measured in tens of billions because the waste it eliminates is structural — embedded in how every current agent system is built. The Compiled Context Runtime does not ask users to write better prompts. It makes the prompt irrelevant as a vehicle for workflow definition, and makes the context window irrelevant as a constraint on memory depth.
What remains is the model doing what it does best — reasoning, creating, solving — with exactly the context it needs, compiled from everything the system has ever learned.
Appendix A: Glossary
Attention Dilution — Degraded model performance caused by irrelevant tokens competing for attention in an oversized context window.
Build Primitive — The fourth execution primitive: creating new processes, knowledge artifacts, or tools when Learn identifies gaps.
Compiled Context — A precision-scoped, losslessly compressed package of knowledge and state injected into the model’s context window for a specific process step.
Compiled Context Runtime (CCR) — An architectural model for agent execution that replaces context stuffing with compiled injection, prompt-dependent behavior with process-driven execution, and session-bounded memory with persistent chains.
Context Chain — A linked sequence of context records capturing the full history of a task’s execution, compilable into a CTX package on demand.
Context Stuffing — The conventional approach of packing raw text into the context window before each inference call. The primary source of waste that CCR eliminates.
CTX Format — The lossless compression format used for compiled context packages, optimizing for token efficiency while preserving semantic completeness.
Execution Cycle — The five-primitive loop governing all agent work: Orchestrate → Execute → Learn → Build → Refine.
Execute Primitive — The second execution primitive: performing the actual work that produces external output.
Gate — A precondition declared in a process definition that must be satisfied before a step can proceed.
Knowledge Governance — The pipeline for curating, promoting, and distributing knowledge across organizational boundaries: local → team → organizational → hub.
Knowledge Package — A composable unit of domain knowledge with explicit scope, dependencies, and compilation rules.
Learn Primitive — The third execution primitive: analyzing outcomes at meta-learning (process improvement) and context-learning (domain knowledge) levels.
Local-First — The design principle that all agent data resides on the user’s machine, with no workflow data crossing network boundaries except compiled context sent to the LLM.
Memory Chain — A persistent, linked sequence of memory records that accumulates across sessions, giving the model access to unbounded historical depth.
Model-Agnostic — The design property where intelligence accumulates in the data layer rather than model weights, making inference endpoints interchangeable.
Orchestrate Primitive — The first execution primitive: loading state, reading knowledge, compiling context, analyzing dependencies, and dispatching work.
Process Definition — A versioned, executable YAML specification of an agent workflow, declaring steps, gates, knowledge requirements, and trigger conditions.
Process Discovery — The system that detects repeated ad-hoc sequences and proposes new process definitions to codify them.
Refine Primitive — The fifth execution primitive: improving existing processes, knowledge, and tools based on execution analysis.
Token Economics — The quantitative analysis of cost reduction achieved by compiled context injection versus context stuffing, measured at individual, enterprise, and global scale.
Viewport — The conceptual model of the context window as a precision-scoped lens into a potentially unlimited local data store, rather than a hard size limit.
References
William Christopher Anderson
Anderson, W. C. Volatility-Based Decomposition in Software Architecture: A Practitioner-Oriented Articulation. Unpublished manuscript, 2026.
VBD provides the backend decomposition framework — Manager, Engine, Accessor, Utility tiers — that the CCR’s process engine, compilation pipeline, and memory system are structured around. The volatility-driven tier assignments and communication rules described in this paper directly govern the CCR’s component architecture.
Anderson, W. C. Experience-Based Decomposition: A Practitioner-Oriented Articulation. Unpublished manuscript, 2026.
EBD provides the interface decomposition framework — Experience, Flow, Interaction layers — that governs how users interact with the CCR through CLI, MCP tools, and future interfaces. The separation of orchestration from interaction mirrors the CCR’s own separation of process management from step execution.
Anderson, W. C. Boundary-Driven Testing: A Practitioner-Oriented Articulation. Unpublished manuscript, 2026.
BDT provides the test architecture — unit, integration, and end-to-end spirals mirroring component tiers — that validates the CCR’s boundaries. The structural isomorphism between component tiers and test scopes ensures that each boundary in the system has a corresponding test boundary.
Anderson, W. C. Harmonic Design: A Unified Software Engineering Practice. Unpublished manuscript, 2026.
Harmonic Design unifies VBD, EBD, and BDT as harmonics of the same fundamental principle: organize by anticipated change. The CCR is built as an HD system — its backend decomposes by VBD, its interfaces by EBD, its tests by BDT, and the three frameworks reinforce each other structurally. The CCR’s own knowledge governance, process definitions, and compilation pipeline are all governed by HD principles.
David Lorge Parnas
Parnas, David L. “On the Criteria to Be Used in Decomposing Systems into Modules.” Communications of the ACM, vol. 15, no. 12, 1972, pp. 1053–1058.
Parnas’s foundational insight — that systems should be decomposed by what is likely to change, not by workflow or data flow — is the intellectual ancestor of VBD and, by extension, the CCR’s own decomposition. The CCR’s separation of process definitions (highly volatile) from the compilation pipeline (moderately volatile) from the storage layer (stable) directly reflects Parnas’s criteria.
Juval Löwy
Löwy, Juval. Righting Software. Addison-Wesley, 2019.
Löwy’s IDesign methodology originated the volatility-based decomposition approach, the Manager/Engine/Accessor/Utility taxonomy, and the communication rules that VBD articulates. The CCR’s architectural structure — managers orchestrating engines that encapsulate logic over accessors that isolate external resources — is a direct application of Löwy’s system.
Martin Fowler
Fowler, Martin. Patterns of Enterprise Application Architecture. Addison-Wesley, 2002.
Fowler’s patterns for layered architecture, repository abstraction, and unit of work inform the CCR’s accessor patterns and state management. The SynapseAccessor and VectorAccessor patterns in the CCR follow Fowler’s repository pattern adapted for filesystem and vector database access.
Eric Evans
Evans, Eric. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley, 2003.
Evans’s bounded contexts inform the CCR’s knowledge package boundaries. Each knowledge package — personal, team, organizational, domain — functions as a bounded context with explicit interfaces for composition. The CCR’s knowledge governance pipeline reflects DDD’s strategic design principles applied to knowledge management rather than code.
Ashish Vaswani et al.
Vaswani, Ashish, et al. “Attention Is All You Need.” Advances in Neural Information Processing Systems, 2017.
The transformer architecture’s quadratic attention scaling with sequence length is the fundamental constraint that makes context compilation economically valuable. The CCR’s token economics — superlinear energy savings from shorter contexts — derive directly from the attention mechanism’s computational characteristics.
Nelson F. Liu et al.
Liu, Nelson F., et al. “Lost in the Middle: How Language Models Use Long Contexts.” Transactions of the Association for Computational Linguistics, 2024.
Liu et al.’s demonstration that language models attend poorly to information in the middle of long contexts provides empirical support for the CCR’s compilation approach. By delivering only relevant, precision-scoped context rather than large volumes of raw text, the CCR avoids the “lost in the middle” phenomenon entirely.
Author’s Note
The Compiled Context Runtime synthesizes ideas from multiple domains: process engineering, knowledge management, compiler design, and agent architecture. The architectural framework — Harmonic Design and its constituent practices — originates from the author’s prior work articulating VBD, EBD, BDT, and HD. The specific application of these frameworks to agent runtime architecture, compiled context injection, memory chains, composable knowledge packages, knowledge governance, and dynamic model selection is, to the author’s knowledge, novel.
The system described in this paper is not theoretical. The author has built and operates a working implementation of the core concepts: process definitions in YAML governing agent execution, a knowledge index with compiled context injection per step, memory that persists across sessions and accumulates over months, execution contexts that track every task from trigger to completion, and a knowledge governance pipeline that curates and distributes knowledge across agent sessions. The token economics are derived from measured reductions in actual agent workflows, not projections from hypothetical systems.
The decision to scope the CCR to all knowledge workers — not just software developers — reflects the observation that every LLM-assisted workflow, regardless of domain, suffers from the same structural waste: bloated context, stateless execution, no learning between sessions, and no process discipline. A lawyer reviewing contracts, a researcher analyzing papers, an analyst building financial models, and a developer writing code all benefit equally from compiled context, deterministic processes, and accumulated memory. The architecture is domain-agnostic because the problem it solves is domain-agnostic.
The model-agnostic design — where the runtime dynamically selects the optimal model per step based on task requirements and available capabilities — is a deliberate architectural choice, not a compatibility feature. Intelligence should accumulate in the data layer (processes, memories, knowledge), not in any particular model’s weights. When the data layer carries the intelligence, models become interchangeable inference endpoints, and organizations are freed from vendor lock-in. The knowledge you build today works with whatever model exists tomorrow.
The knowledge governance pipeline — local curation, organizational promotion, hub evaluation, intelligent merge, and backplane distribution — addresses what the author considers the most valuable application of the CCR: codifying tribal knowledge. Every organization loses critical knowledge when experienced people leave. The CCR makes that knowledge persistent, compilable, and distributable. At organizational scale, this is not a productivity optimization — it is a structural solution to institutional knowledge loss.
Distribution Note
This document is provided for informational and educational purposes. It may be shared internally within organizations, used as a reference in architectural and design discussions, or adapted for non-commercial educational use with appropriate attribution. All examples are generalized and abstracted to avoid disclosure of proprietary or sensitive information.
Copyright (c) 2026 William Christopher Anderson. All rights reserved.