Decisions are the junction points where intent becomes action. A decision records: what we chose to do, why we chose it, what happened as a result. Decisions build institutional memory and inform future specs.

Decision Structure

A decision record includes: a status (e.g., Accepted), the context that prompted it, the decision itself, the rationale behind it, its consequences, and the alternatives considered.

Decision Lifecycle

Decisions flow through the notice→spec→execute→observe loop:

  1. Notice: We spot a problem or opportunity
  2. Propose: We propose a solution (decision record created)
  3. Decide: We commit to the decision
  4. Spec: We write a spec to execute the decision
  5. Execute: The spec gets built
  6. Observe: We watch what happens and record outcomes
  7. Archive: The decision and its outcomes become reference material

Why Decision Records Matter

A recorded decision preserves the reasoning behind a choice: what was chosen, why, and what happened as a result. Without the record, settled questions get relitigated and the context behind the architecture is lost; with it, the records below serve as institutional memory that future specs can build on.

Architectural Decisions (D1–D19)

The following nineteen decisions form the architectural spine of Intent. They were made by the Architecture Review Board (ARB) during the design phase and guide all development. Each decision record links back to the ARB for the full governance context.

D1: Markdown for Specifications

Accepted
Context

Intent needs a format for specs that is both human-readable and machine-parseable. The format must integrate with version control, IDE tooling, and the terminal-first workflow.

Decision

Use Markdown with YAML frontmatter for all specifications. Frontmatter contains metadata (title, ID, phase, priority, owner); body contains spec narrative, requirements, and acceptance criteria.

Rationale

Markdown is the lingua franca of developer tools: it diffs natively in Git, and most IDEs ship with preview, syntax highlighting, and linting. YAML frontmatter is battle-tested in static site generators and content management systems. The combination requires zero custom tooling while supporting both human review and programmatic parsing.

Consequences
  • Specs are Git-diffable and auditable — every change is tracked
  • IDE-native — no custom editors required, works in VS Code, Vim, etc.
  • Natural fit with the terminal workflow — edit in your shell-native editor
  • YAML parsing is trivial in any language — agents can extract metadata without a spec parser
  • Limits complex nested structures (but forces simpler specs)
Alternatives Considered
  • JSON-only specs (lost human readability, spec diffs are noise)
  • Custom format (requires custom tooling, raises adoption friction)
  • Executable specs (Python, Go) (couples spec to runtime, loses version-control readability)
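The consequence noted above — agents extracting metadata without a spec parser — can be sketched in a few lines of Python. A real implementation would use a YAML library; this minimal sketch handles only flat key: value metadata, and the field names (title, id, phase) are illustrative assumptions.

```python
# Minimal frontmatter extractor: split a spec on its "---" fences and
# read flat key: value pairs. Illustrative sketch, not the real parser.
def parse_frontmatter(text: str) -> tuple[dict, str]:
    """Return (metadata, body) for a Markdown spec with YAML frontmatter."""
    if not text.startswith("---"):
        return {}, text                 # no frontmatter: whole file is body
    _, fm, body = text.split("---", 2)  # leading fence, metadata, body
    meta = {}
    for line in fm.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.lstrip()

spec = """---
title: Example spec
id: SPEC-001
phase: notice
---
## Problem
Latency spikes in the triage queue.
"""
meta, body = parse_frontmatter(spec)
```

Because the metadata is plain YAML, any language's standard tooling can do the same extraction — which is exactly the "zero custom tooling" property the rationale claims.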

D2: Git as Source of Truth

Accepted
Context

Intent tracks four kinds of immutable events: notices, specs, executions, and observations. Each event must be versioned, auditable, and mergeable. The system needs to support offline-first operation and decentralized collaboration.

Decision

Git is the single source of truth. All signals, specs, and events are stored as JSONL files in a Git repository. No external database stores the primary event log. The database (Phase 2+) is a materialized view for query performance only.

Rationale

Git solves version control, audit trails, branching, merging, and conflict resolution — long proven in software engineering. It enables offline-first operation, which is critical for tools that must not depend on an always-on network. Git repositories can be cloned, forked, merged, and rebased, and every contributor can audit the full history.

Consequences
  • Offline-first by design — Claude Code agents can work on any machine with a Git checkout
  • Mergeable event streams — multiple agents can capture signals in parallel, then reconcile
  • Auditable — every event has a commit SHA, timestamp, and author
  • Scales to team size (Git supports teams of thousands) with eventual consistency
  • Query performance in Phase 1 is limited to in-memory or file-scan access (addressed by the Phase 2 materialized database)
  • Requires discipline in commit messages and schema versioning
Alternatives Considered
  • Postgres/SQLite from day one (loses offline-first, adds infrastructure burden for solo evaluators)
  • Kafka event stream (requires always-on infrastructure, not suitable for offline CLI tools)
  • Cloud-hosted event store (vendor lock-in, rules out disconnected operation)
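The JSONL event log described above reduces to an append-only writer. A minimal sketch, assuming illustrative event fields and file name — the commit SHA and author cited in the consequences come from Git itself at commit time, not from this code:

```python
# Sketch: append one immutable event per line to a JSONL log. The
# "kind"/"ts"/"payload" fields are assumptions for illustration.
import datetime
import json
import os
import tempfile

def append_event(path: str, kind: str, payload: dict) -> dict:
    """Append a single event; history is never rewritten, only extended."""
    event = {
        "kind": kind,  # notice | spec | execute | observe
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "payload": payload,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")  # one event per line, append-only
    return event

# demo against a throwaway file standing in for the repo's events file
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
append_event(path, "notice", {"text": "queue latency spike"})
append_event(path, "spec", {"id": "SPEC-001"})
lines = open(path, encoding="utf-8").read().splitlines()
```

One JSON object per line is what makes the stream mergeable: concurrent agents append independently, and Git's line-based merge reconciles the files.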

D3: Claude API for Agent Runtime

Accepted
Context

Intent needs a reliable, accessible LLM runtime for executing specs. Agents must handle different task complexities (simple routing, complex reasoning, code generation). The runtime must be accessible to solo practitioners and small teams.

Decision

Use Claude API (via Anthropic SDK) for all LLM-powered execution. Model routing by task complexity: Haiku for simple classification/routing, Sonnet for standard spec execution, Opus for complex reasoning and multi-step problem-solving.

Rationale

Claude is production-grade, available via API, supports system prompts and structured output, and has favorable token pricing for agents that operate at scale. Routing by complexity optimizes cost-per-execution. The SDK is well-documented and integrates cleanly with Python and FastMCP.

Consequences
  • Vendor dependency — Intent's operation depends on Anthropic API availability and pricing
  • Cost is proportional to token usage — agents that speculate excessively become expensive
  • Model routing requires clear decision boundaries in specs (simple vs complex) — agents must declare their complexity budget upfront
  • Supports the entire spectrum from solo practitioner (pay-as-you-go) to team (volume pricing)
Alternatives Considered
  • Open-source models (LLaMA, Mistral) self-hosted (requires infrastructure, slower iteration, no long-context support at release time)
  • Competing commercial APIs (OpenAI, Google) (Anthropic chosen for long-context, cost efficiency, and specialized reasoning)
  • Hybrid (multi-model) (adds complexity for marginal benefit at Phase 1)
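Routing by complexity can be as simple as a lookup table keyed by the complexity budget a spec declares upfront. A sketch under assumptions: the tier keys are invented here, and the model-name strings are placeholders rather than real Anthropic API identifiers, which change over time.

```python
# Sketch of complexity-based model routing (tiers from the decision
# above). Model-name strings are placeholders, not real API model IDs.
ROUTES = {
    "simple": "haiku",     # classification, routing
    "standard": "sonnet",  # standard spec execution
    "complex": "opus",     # multi-step reasoning
}

def route_model(complexity: str) -> str:
    """Fall back to the standard tier when a spec's budget is unknown."""
    return ROUTES.get(complexity, ROUTES["standard"])

model = route_model("simple")
```

Keeping the mapping in one table makes the cost-per-execution trade-off auditable: changing a route is a one-line diff rather than scattered conditionals.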

D4: SQL (SQLite → PostgreSQL) for Persistence

Accepted
Context

Phase 1 operates Git-only (JSONL). Phase 2 introduces a team/server deployment where agents and humans need to query signals and specs by state, time, owner, trust score, etc. Queries must be fast and composable.

Decision

For Phase 1, query via in-memory dataframes (pandas) or file scan. For Phase 2+, materialize a SQL schema in SQLite (local) or PostgreSQL (team). Schema mirrors the signal, spec, and event structures. SQL is the query language for all dashboards and analytics.

Rationale

SQL is the standard for structured queries. Schema migration is well-understood. Both SQLite (for teams of 1–5) and PostgreSQL (for teams of 5+) are battle-tested. This decision defers infrastructure complexity to Phase 2, letting Phase 1 stay simple and Git-only.

Consequences
  • Phase 2 requires a schema migration (transform JSONL to relational tables) — one-time, one-way operation
  • SQL schemas are explicit and versioned — schema changes are specs themselves
  • Enables fast, complex queries (e.g., "signals by owner, grouped by trust tier, with spec count")
  • Requires ACID discipline — transaction boundaries must be well-defined in specs
  • Limits real-time consistency (materialized view is eventually consistent with Git)
Alternatives Considered
  • NoSQL (MongoDB, DynamoDB) (loses ACID guarantees, schema-less makes validation harder)
  • Stick with Git forever (query performance degrades as event log grows)
  • Kafka/Streaming (requires always-on infrastructure, not suitable for Phase 1 solo practitioners)
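The example query from the consequences — "signals by owner, grouped by trust tier, with spec count" — can be sketched against an in-memory SQLite database. The table and column names below are illustrative assumptions, not the real Intent schema, which would mirror the JSONL event structures.

```python
# Sketch of the Phase 2 materialized view in SQLite. Schema is
# invented for illustration; the real one mirrors the JSONL events.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signals (id TEXT PRIMARY KEY, owner TEXT, trust_tier TEXT);
    CREATE TABLE specs   (id TEXT PRIMARY KEY,
                          signal_id TEXT REFERENCES signals(id));
""")
conn.execute("INSERT INTO signals VALUES ('s1', 'brien', 'high')")
conn.execute("INSERT INTO specs   VALUES ('sp1', 's1')")
conn.execute("INSERT INTO specs   VALUES ('sp2', 's1')")

# "signals by owner, grouped by trust tier, with spec count"
rows = conn.execute("""
    SELECT si.owner, si.trust_tier, COUNT(sp.id) AS spec_count
    FROM signals si LEFT JOIN specs sp ON sp.signal_id = si.id
    GROUP BY si.owner, si.trust_tier
""").fetchall()
```

The same SQL runs unchanged on PostgreSQL, which is what makes the SQLite-to-Postgres growth path in the decision cheap.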

D5: Static Site Dashboard (GitHub Pages + Vanilla HTML/JS)

Accepted
Context

The site (pitch.html, work-system.html, etc.) is the primary communication surface. The Observe dashboard (work-system.html's signal stream, dogfood.html's live events) must work without a server. Zero-ops is a requirement.

Decision

Use GitHub Pages for hosting. Build the dashboard in vanilla HTML, CSS, and ES2020+ JavaScript. Data is sourced from JSON files committed to the repo (auto-generated from events.jsonl). No build step, no server, no Node.js.

Rationale

GitHub Pages is free, fast (CDN-backed), and requires zero ops. Vanilla JS (no framework overhead) keeps bundle size small (~50KB gzipped). Committing JSON snapshots decouples dashboard freshness from request latency. This approach lets evaluators download the repo and run the dashboard locally without npm install.

Consequences
  • Dashboard updates are batch jobs (snapshot frequency) — not real-time
  • No dynamic queries — the dashboard is pre-rendered with all needed data
  • Limited interactivity (no server-side filtering, but client-side JS can handle moderate complexity)
  • Low operational burden — a single GitHub Pages push deploys the entire site and dashboard
  • Snapshot frequency is tunable (every commit, hourly, daily) based on observability needs
Alternatives Considered
  • Next.js/SPA framework (adds build step, npm dependencies, deployment infrastructure)
  • Grafana/Datadog dashboards (requires cloud account, vendor lock-in, not self-hostable)
  • Server-rendered (Python Flask) (requires always-on infrastructure, adds compliance/security scope)
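The "auto-generated from events.jsonl" step is a small batch job: read the event log, emit a JSON file the static dashboard's client-side JS can fetch. A minimal sketch, assuming a flat JSONL input and an invented snapshot shape:

```python
# Sketch of the snapshot job: events.jsonl in, dashboard JSON out.
# The snapshot shape and field names are assumptions for illustration.
import json

def build_snapshot(jsonl_text: str) -> str:
    """Pre-render everything the dashboard needs into one JSON document."""
    events = [json.loads(line)
              for line in jsonl_text.splitlines() if line.strip()]
    snapshot = {
        "event_count": len(events),
        "events": events,  # no server-side queries: data ships pre-rendered
    }
    return json.dumps(snapshot, indent=2)

snapshot = build_snapshot('{"kind": "notice"}\n{"kind": "spec"}\n')
```

Running this on a schedule (every commit, hourly, daily) is the "tunable snapshot frequency" knob from the consequences above.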

D6: Deferred Features (Kubernetes, Kafka, GraphQL, Custom ML, Blockchain)

Accepted
Context

During design, several advanced features were proposed: Kubernetes orchestration, Kafka event streaming, GraphQL federation, custom ML models for trust scoring, and blockchain audit logs. Each adds power but also scope and complexity.

Decision

Hold all five features until Phase 3. Focus Phase 1 on the core loop (notice→spec→execute→observe). Focus Phase 2 on team scaling (SQL database, multi-user). Revisit at Phase 3 after learning from Phase 1–2 production use.

Rationale

Scope discipline accelerates early learning. The core loop works without these features. Adding them now risks shipping nothing. Deferred features will be re-evaluated once we have real execution data to guide prioritization.

Consequences
  • Simpler architecture for Phase 1 and 2 — shorter time to learning
  • Faster iteration on core concepts — less infrastructure to debug
  • Clearer Phase 3 roadmap — informed by real usage patterns, not speculation
  • May revisit decisions if market demands otherwise
Alternatives Considered
  • Build all five (delays MVP by 6–12 months)
  • Build three immediately (still overscopes Phase 1)

D7: Two Products, Not One

Accepted
Context

Intent started as a single system. As the knowledge base grew more capable, it became clear that the compiled knowledge base and the notice→spec→execute→observe loop are independently valuable. A team might want a compiled knowledge base without adopting the full Intent loop. Conflating them creates unnecessary coupling.

Decision

Intent (methodology) and Knowledge Engine (product) are formally separated. Knowledge Engine is separable — it can be used without the Intent loop. Brien's Knowledge Farm (Subaru, ASA, F&G engagement data) is an instance of the Knowledge Engine, not part of Intent itself. The domain overlap between Brien's consulting practice and Intent's methodology is coincidental, not structural.

Rationale

Separation enables independent adoption paths. A team can adopt the Knowledge Engine for compiled domain understanding without committing to the full Intent loop. It also prevents the Knowledge Engine from inheriting Intent-specific assumptions (like the four-phase loop) that don't apply to all knowledge compilation use cases.

Consequences
  • Two product roadmaps. Two adoption stories. Knowledge Engine gets its own AGENTS.md, CLI, and MCP server. Intent's three-layer architecture makes this separation explicit — Layer 1 (Knowledge Engine) is independent of Layer 2 (Intent loop).
Alternatives Considered
  • Keep as one product (creates coupling, limits KE adoption). Extract KE to separate repo (premature — shared patterns still evolving).

D8: Engagement Rollout Order

Accepted
Context

The Knowledge Engine needs to be validated against real engagement data. Five engagements are candidates: Subaru, F&G, ASA, Cargill, Footlocker. Each has different data volumes, complexity, and learning potential. The rollout order determines where we learn fastest.

Decision

Subaru → F&G → ASA → Cargill → Footlocker. Subaru first because it has the most data and the highest learning potential. Each engagement validates federation, redaction, and compilation patterns before the next begins.

Rationale

Subaru's data volume stress-tests the compilation pipeline. F&G tests a different domain (insurance vs automotive manufacturing). ASA tests healthcare. Cargill and Footlocker extend to supply chain and retail. This progression maximizes domain diversity per engagement.

Consequences
  • Subaru becomes the reference implementation. Patterns discovered there become Core templates. Later engagements inherit proven patterns rather than discovering them independently.
Alternatives Considered
  • Alphabetical (no learning optimization). Smallest first (less learning, lower risk but slower maturation). Parallel (too much surface area for one practitioner).

D9: Knowledge Engine as New MCP Server

Accepted
Context

Knowledge operations (ingest, query, lint) need a server interface for agent access. Two options: bolt onto the existing intent-notice server (since Notice consumes knowledge output) or create a new server.

Decision

New `intent-knowledge` server on port 8004 with its own CLI `intent-knowledge` (subcommands: ingest, query, lint). Not bolted onto intent-notice.

Rationale

Knowledge operations are a different concern than signal capture. Notice consumes knowledge output but doesn't own knowledge lifecycle. Separate servers maintain single-responsibility and enable the Knowledge Engine to be deployed independently (supporting Decision D7). A team using only the Knowledge Engine deploys only intent-knowledge — no need for notice, spec, or observe servers.

Consequences
  • Four servers instead of three. Deployment configs need a fourth entry. The architecture diagram gains a layer. But each server has a clear, narrow responsibility.
Alternatives Considered
  • Bolt onto intent-notice (violates SRP, couples KE to Intent loop). Bolt onto all three (even worse coupling). HTTP API instead of MCP (loses agent-native tooling).
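The CLI surface named in the decision can be sketched with argparse. The subcommand names (ingest, query, lint) come from the decision itself; the positional arguments are assumptions, since the decision does not specify them.

```python
# Sketch of the intent-knowledge CLI surface. Subcommands are from the
# decision; the "source" and "question" arguments are assumptions.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="intent-knowledge")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("ingest").add_argument("source")    # raw material in
    sub.add_parser("query").add_argument("question")   # compiled knowledge out
    sub.add_parser("lint")                             # staleness / gap checks
    return parser

args = build_parser().parse_args(["query", "what changed?"])
```

Keeping the CLI and the MCP server in one package means both surfaces expose the same three operations, which is the single-responsibility boundary the rationale argues for.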

D10: Retroactive Enrichment = Suggested

Accepted
Context

When a new raw source is ingested, it may invalidate or enrich existing knowledge artifacts. Should the system automatically cascade updates, or surface opportunities as signals?

Decision

Suggested-first, on-demand second. Lint detects recompilation opportunities and surfaces them as signals. Execution happens on demand, not automatically. No automatic cascades.

Rationale

Automatic cascades are dangerous — a single bad ingest could trigger recompilation across hundreds of artifacts. Suggested-first preserves human oversight for the most impactful updates while keeping the system responsive. The disambiguation signal pattern (already proven in the Notice phase) applies naturally here.

Consequences
  • Some knowledge artifacts may be temporarily stale after a new ingest. The lint cycle catches this and surfaces it. The trade-off is latency for safety — appropriate for a system handling client-confidential data.
Alternatives Considered
  • Automatic cascades (risk of cascading errors). Manual review only (too slow, Brien becomes the bottleneck). Hybrid with blast-radius limits (added complexity for marginal benefit at current scale).
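The suggested-first pattern can be sketched as a lint pass that emits signals and deliberately recompiles nothing. The timestamp comparison and field names below are assumptions for illustration.

```python
# Sketch: lint detects stale artifacts and emits recompile signals.
# Nothing is recompiled here — execution stays on demand, per the
# decision. Field names ("compiled_ts", etc.) are assumptions.
def lint_staleness(artifacts, newest_source_ts):
    """Flag artifacts compiled before the newest raw source arrived."""
    return [
        {"kind": "recompile-suggested", "artifact": a["id"]}
        for a in artifacts
        if a["compiled_ts"] < newest_source_ts
    ]

signals = lint_staleness(
    [{"id": "k1", "compiled_ts": 100}, {"id": "k2", "compiled_ts": 300}],
    newest_source_ts=200,
)
```

Because the output is a signal rather than an action, a single bad ingest can at worst flood the signal stream — it cannot rewrite hundreds of artifacts.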

D11: Redaction at Tool Level

Accepted
Context

The Knowledge Engine handles client-confidential data across multiple engagements. When querying across engagements or promoting from engagement to Core, confidential information must be redacted. Who is responsible for redaction?

Decision

The MCP server applies confidentiality projection automatically based on engagement context. Not a flag Brien has to remember. Every query through intent-knowledge gets filtered by the caller's engagement scope.

Rationale

Redaction as a manual step is a liability — one forgotten flag leaks client data across engagement boundaries. Tool-level enforcement means the system is secure by default. The federation model's "never leak sideways" rule is enforced at the tool boundary, not the human boundary.

Consequences
  • Slightly more complex server implementation. But elimination of a class of confidentiality errors. Queries that cross engagement boundaries return only Core-level knowledge plus the caller's own engagement data.
Alternatives Considered
  • Manual redaction flag (error-prone). Post-processing filter (late, data already exposed in context). Separate servers per engagement (operational overhead, fragmented knowledge).
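Tool-level projection amounts to a filter applied to every result set before it leaves the server. A minimal sketch, with invented scope labels standing in for the real engagement contexts:

```python
# Sketch of confidentiality projection: callers see Core plus their
# own engagement. Sideways visibility is structurally impossible —
# there is no flag to forget. Scope labels are assumptions.
def project(artifacts, caller_scope):
    """Filter a result set down to what the caller's scope may see."""
    return [a for a in artifacts
            if a["scope"] in ("core", caller_scope)]

kb = [
    {"id": "k1", "scope": "core"},
    {"id": "k2", "scope": "subaru"},
    {"id": "k3", "scope": "asa"},
]
visible = project(kb, "subaru")  # the ASA artifact never leaks sideways
```

Putting the filter inside the server (rather than in a post-processing step) matters because a post-processing filter runs after the data has already entered the model's context.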

D12: Spec-Shaping Through Personas

Accepted
Context

The gap between "intent proposed" and "spec ready for agent execution" is where most specs fail. Intents are captured as Problem/Outcome/Evidence/Constraints — enough for a human, not enough for an agent. Who bridges this gap?

Decision

Intents become specs through four-persona interrogation: △ Shape (Architect), ◇ Outcome (Product Leader), ○ Contract (Quality Advocate), ◉ Readiness (Agent). Each persona queries the knowledge base, checks existing decisions, and generates structured assertions. The system self-prompts with each persona. Brien reviews specs, not execution.

Rationale

Most people use 2 of 10 dimensions when prompting (Huryn's observation). The four-persona protocol ensures all dimensions are covered: role, task, goal, audience, context, style, structure, constraints, output format, and contract. Self-prompting with domain-specific personas produces richer specs than manual authoring because each persona has access to the compiled knowledge base.

Consequences
  • Spec quality depends on knowledge base richness + protocol rigor, not Brien's energy. Brien's review point shifts upstream — from reviewing execution output to reviewing spec output (higher leverage). If the readiness assessment (Pass 4) scores below L2, a disambiguation signal is generated instead of executing.
Alternatives Considered
  • Brien writes all specs manually (doesn't scale, quality varies with energy). Single-pass spec generation (misses dimensions). Template-only approach (doesn't query knowledge base, produces shallow specs).
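The four-pass interrogation can be sketched as self-prompting once per persona. The persona names and glyphs come from the decision; the prompt template is an assumption, and a real pass would also query the knowledge base and check existing decisions.

```python
# Sketch of the four-persona interrogation protocol. Personas are from
# the decision; the prompt wording is an invented placeholder.
PERSONAS = [
    ("△ Shape", "Architect"),
    ("◇ Outcome", "Product Leader"),
    ("○ Contract", "Quality Advocate"),
    ("◉ Readiness", "Agent"),
]

def interrogate(intent: str) -> list[str]:
    """Self-prompt once per persona; each pass yields its own assertions."""
    return [
        f"[{pass_name} · {role}] Assess '{intent}' from the {role} perspective."
        for pass_name, role in PERSONAS
    ]

prompts = interrogate("Reduce signal triage latency")
```

The fourth pass (Readiness) is the gate: per the consequences above, a score below L2 produces a disambiguation signal instead of an execution.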

D13: Staged GTM — Thought Leadership First

Accepted
Context

Intent is both a methodology and a potential software product. Go-to-market strategy needs to sequence these correctly to build credibility before asking for money.

Decision

Three-stage GTM: (1) Thought leadership — publish the methodology, build credibility with practitioner-architects. (2) Methodology product — sell consulting/coaching using Intent as the operating model. (3) Tooling — build software only if validated by stages 1-2. Each stage gates the next.

Rationale

Building tools before validating the methodology is the classic "solution looking for a problem" failure mode. Thought leadership is low-cost, high-signal. If the methodology doesn't resonate with senior practitioners, the tooling won't either.

Consequences
  • Slower time-to-revenue but higher confidence in product-market fit. Tooling investment is conditional, not assumed. The site exists as the Stage 1 artifact.

D14: Four-Product Framing

Accepted
Context

Intent's loop has four phases (Notice, Spec, Execute, Observe). Should these be one monolithic product or distinct products with their own roadmaps?

Decision

Each phase is a distinct product with its own maturity assessment, roadmap, and adoption path. Notice is operational, Spec is tooled, Execute is defined, Observe is schema-ready. Teams can adopt one phase without buying the whole system.

Rationale

Monolithic product adoption requires organizational buy-in. Phase-level adoption lets individual teams start with signal capture (Notice) and expand as they see value. This matches how methodologies actually spread — bottom-up, not top-down.

Consequences
  • Each product needs its own MCP server, CLI tools, and documentation. Complexity increases but adoption barrier decreases. The loop still works as a whole — the products are composable, not independent.

D15: Three-Layer Architecture

Accepted
Context

Intent started as just a loop (Layer 2). But the compiled knowledge base and the running code are structurally distinct. Karpathy's "code as source, LLM as compiler, executable as output" pattern matches what we observe. How should the architecture formalize this?

Decision

Three layers, bidirectionally coupled: (1) Compiled Knowledge Base — everything the system knows (raw/, knowledge/). (2) Transformation OS — the Notice→Spec→Execute→Observe engine (.intent/, spec/). (3) Software Spec & Code — specs, contracts, and running code. Six data flows connect them.

Rationale

Independent layers can evolve at different speeds. The knowledge base compiles once and keeps current (not RAG). The loop transforms knowledge into specs. The code executes specs. Without this separation, knowledge and execution are tangled — making it impossible to reason about either.

Consequences
  • Knowledge Engine becomes a separable product (see D7). Double-loop learning becomes possible — observations can update domain understanding (Layer 1), not just execution parameters (Layer 3). Architecture diagrams and documentation must reflect three layers, not just the loop.

D16: Compilation Over Retrieval

Accepted
Context

Most AI systems use RAG (Retrieval-Augmented Generation) — querying raw documents at runtime. Intent needs a knowledge strategy that supports cross-referencing, contradiction detection, and gap analysis.

Decision

The knowledge base compiles understanding once and keeps it current. Not RAG. Cross-references are already resolved. Contradictions are already flagged. Gaps are already detected. The compilation step is the value — turning raw material into structured, verified knowledge artifacts.

Rationale

RAG is stateless — every query re-discovers relationships. Compilation is stateful — relationships are discovered once and maintained. For a consulting practice with deep domain knowledge, compiled understanding is dramatically more valuable than per-query retrieval.

Consequences
  • Higher upfront cost per knowledge artifact but dramatically lower marginal cost per query. Freshening becomes a maintenance concern. The lint operation detects staleness and surfaces recompilation signals.
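The compilation-versus-retrieval distinction can be made concrete: resolve cross-references once at compile time, after which every query is a lookup rather than a rediscovery. The wiki-style [[link]] syntax below is an assumption for illustration, not Intent's actual artifact format.

```python
# Sketch: cross-references resolved once (compilation), not per query
# (retrieval). The [[link]] syntax is illustrative only.
import re

def compile_refs(docs: dict) -> dict:
    """One pass over raw docs; afterwards, queries are dict lookups."""
    return {name: re.findall(r"\[\[(.+?)\]\]", text)
            for name, text in docs.items()}

index = compile_refs({
    "subaru-dossier": "builds on [[lean-manufacturing]] and [[tps]]",
    "lean-manufacturing": "no outbound references",
})
```

The upfront pass is where the cost concentrates, and the lookup is where the marginal cost collapses — the exact trade stated in the consequences above.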

D17: Double-Loop Learning

Accepted
Context

Most feedback loops only optimize execution parameters (single-loop: "did we do the thing right?"). Chris Argyris identified a deeper loop: questioning the assumptions behind the action ("are we doing the right thing?"). Intent's Observe phase needs to support both.

Decision

Observe updates Layer 1 (domain understanding), not just Layer 3 (execution). Observations can question assumptions in the knowledge base — updating personas, revising domain models, flagging contradictions in compiled knowledge. This is the double-loop: observations → knowledge revision → better signals.

Rationale

Without double-loop learning, the system can only optimize within its existing frame. It can get faster at the wrong thing. The Observe→Knowledge path closes the gap between "what we think is true" and "what we observe to be true." This is where the real learning happens.

Consequences
  • The system becomes self-correcting at the domain level, not just the execution level. Knowledge artifacts need version history and change tracking. Observation-triggered knowledge updates need trust scoring to prevent noise from corrupting the knowledge base.

D18: Origin Tracking for Knowledge Artifacts

Accepted
Context

As AI agents generate more knowledge artifacts (dossiers, enrichments, inferences), the provenance of knowledge becomes critical. Which artifacts came from human research? Which were agent-generated? Which are synthetic composites?

Decision

Every knowledge artifact carries origin: human | agent | synthetic metadata for contamination mitigation. Human-originated artifacts have highest trust. Agent-originated artifacts require human review before promotion. Synthetic artifacts (composites, inferences) carry their source chain.

Rationale

Without origin tracking, the knowledge base becomes a black box — you can't distinguish verified human knowledge from speculative agent inference. This matters for consulting where recommendations must be defensible. Origin tracking is the foundation for trust calibration.

Consequences
  • Every ingest operation must classify origin. The lint operation can flag artifacts with missing or suspicious origin metadata. Knowledge queries can filter by origin for high-stakes decisions.
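The lint check named in the consequences can be sketched as a validator over origin metadata. The three origin values come from the decision; the field names and messages are assumptions.

```python
# Sketch of origin validation (the lint check above). Origin values
# are from the decision; field names are illustrative assumptions.
VALID_ORIGINS = {"human", "agent", "synthetic"}

def lint_origin(artifact: dict) -> list[str]:
    """Return a list of origin problems; empty means the artifact passes."""
    problems = []
    origin = artifact.get("origin")
    if origin not in VALID_ORIGINS:
        problems.append("missing or unknown origin")
    if origin == "synthetic" and not artifact.get("source_chain"):
        problems.append("synthetic artifact missing source chain")
    return problems
```

Filtering queries by origin for high-stakes decisions then becomes a one-line predicate over the same field, which is what makes origin the foundation for trust calibration.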

D19: Federated Knowledge Base Architecture

Accepted
Context

Brien works across multiple engagements (Subaru, F&G, ASA, Cargill) with different confidentiality requirements. The knowledge base needs to support both shared (universal) and scoped (engagement-specific) knowledge without leaking across boundaries.

Decision

Core = universal substrate (methodologies, frameworks, public knowledge). Engagements = bounded instances (client-specific personas, dossiers, domain models). Inheritance: inherit down (engagements see Core), promote up (engagement insights can become universal), never leak sideways (Subaru knowledge never visible from ASA context).

Rationale

Mirrors the Workspaces directory topology (Core/ + Work/Consulting/Engagements/[Client]/). Confidentiality is structural, not behavioral — the architecture enforces it. Redaction at tool level (D11) complements this by applying projection automatically at query time.

Consequences
  • Knowledge Engine MCP server (port 8004) must implement scope-aware queries. Promotion from engagement to Core requires explicit review. The federation model scales to any number of engagements without architectural changes.
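The three inheritance rules — inherit down, promote up, never leak sideways — can be sketched as two small functions. The scope labels and the review flag are assumptions standing in for the real engagement metadata.

```python
# Sketch of the federation rules. Scope labels and the "reviewed"
# flag are assumptions for illustration.
def visible_scopes(engagement: str) -> set:
    """Inherit down: an engagement sees Core plus itself — never siblings."""
    return {"core", engagement}

def promote_to_core(artifact: dict) -> dict:
    """Promote up: requires explicit review, per the consequences above."""
    if not artifact.get("reviewed"):
        raise ValueError("promotion to Core requires explicit review")
    return {**artifact, "scope": "core"}

scopes = visible_scopes("subaru")
promoted = promote_to_core({"id": "k2", "scope": "subaru", "reviewed": True})
```

Because scope resolution is a pure function of the caller's engagement, adding a sixth or sixtieth engagement changes no code — which is the "scales without architectural changes" claim above.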