Case studies

These case studies show how the SDD flow was applied across the archived changes of the Belico Ecosystem v1 roadmap. Each one follows the same structure: problem → SDD approach → measurable result. All metrics are taken from each change's archive_report.md.

Why these case studies

They show that the SDD flow (EXPLORE → PROPOSE → SPEC/DESIGN → TASKS → IMPLEMENT → VERIFY → ARCHIVE) is not just theory: it produced production-grade software with auditable metrics and zero regressions.

Case study 1 - Phase B: Tier 2 Harnesses

Change: belico-harnesses-tier2-implementation (Change #5).

Problem

The stack had 5 operational Tier 1 harnesses, but the 10 Tier 2 harnesses in the catalog were missing (anti-AI humanizer, VERITAS citations, stats guardian, local Turnitin, retraction watch, etc.). Without them, catalog coverage stood at 50%.

SDD approach

The change ran as 13 IMPLEMENT batches under strict TDD (test first, code after), reusing the HarnessProtocol validated in Change #2. Dependency injection (DI) was preferred over mocks. Shared helpers reduce coupling: doi_resolver is used by both bibtex and retraction, and perplexity_calc by turnitin.
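To make the DI-over-mocks idea concrete, here is a minimal sketch. The shape of HarnessProtocol (a `name` plus a `run` method), the `HarnessResult` type, and the resolver signature are all assumptions for illustration; the real contract from Change #2 is not shown in this page and may differ.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class HarnessResult:
    """Hypothetical result shape; the real contract may differ."""
    passed: bool
    detail: str


class HarnessProtocol(Protocol):
    """Assumed minimal contract: a name plus a run() method."""
    name: str

    def run(self, text: str) -> HarnessResult: ...


class RetractionHarness:
    """Illustrative harness that receives its DOI resolver by injection."""
    name = "retraction_watch"

    def __init__(self, resolve_doi: Callable[[str], dict]) -> None:
        # The dependency arrives through the constructor, so tests need no patching.
        self._resolve_doi = resolve_doi

    def run(self, text: str) -> HarnessResult:
        doi = text.strip()
        record = self._resolve_doi(doi)
        if record.get("retracted", False):
            return HarnessResult(False, f"{doi} is retracted")
        return HarnessResult(True, "no retraction found")


def fake_resolver(doi: str) -> dict:
    """Test double handed to the harness instead of a mock object."""
    return {"doi": doi, "retracted": doi.endswith("/bad")}


harness: HarnessProtocol = RetractionHarness(resolve_doi=fake_resolver)
ok = harness.run("10.1000/good")
bad = harness.run(" 10.1000/bad ")
```

Because the resolver is a plain callable, the same harness class can take a network-backed resolver in production and a deterministic fake in tests.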

Result

Metric	Value
ACs met	50/50 PASS (5 per harness × 10)
Tests	465/465 PASS in 10.51s
Tier 1 regressions	0
Mutations to inherited `tools/`	0 (Strangler Fig 100%)
ADs applied	3 (T2-1 local Turnitin, T2-2 tenant-isolation inactive, T2-3 stats advisory)

Key lesson: the HarnessProtocol pattern scales. All 10 Tier 2 harnesses reused the contract without redesign. Property-based testing with Hypothesis was key to covering edge cases in the stats, bibtex, and turnitin harnesses.
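The property-based style can be illustrated without extra dependencies. The sketch below uses the standard library's `random` module as a stand-in for Hypothesis (it has no shrinking or strategy library), and `normalize_doi` is a hypothetical helper, not the project's doi_resolver. It checks properties that must hold for every generated input rather than for a handful of hand-picked cases.

```python
import random
import string

PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Hypothetical helper: trim, lowercase, and drop common DOI prefixes."""
    doi = raw.strip().lower()
    changed = True
    while changed:  # repeat until no known prefix remains
        changed = False
        for prefix in PREFIXES:
            if doi.startswith(prefix):
                doi = doi[len(prefix):].strip()
                changed = True
    return doi


def random_doi_like(rng: random.Random) -> str:
    """Generate noisy DOI-like input: optional prefix, mixed case, padding."""
    prefix = rng.choice(["", "doi:", "DOI:", "https://doi.org/", "  "])
    padding = rng.choice(["", " ", "\n"])
    alphabet = string.ascii_letters + string.digits + "./-"
    suffix = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 20)))
    registrant = rng.randint(1000, 99999)
    return f"{prefix}10.{registrant}/{suffix}{padding}"


def check_properties(trials: int = 500, seed: int = 0) -> int:
    """Assert the properties on many random inputs; return the trial count."""
    rng = random.Random(seed)
    for _ in range(trials):
        once = normalize_doi(random_doi_like(rng))
        assert normalize_doi(once) == once   # idempotent
        assert once == once.strip()          # no stray whitespace
        assert once == once.lower()          # case-folded
        assert once.startswith("10.")        # prefix fully removed
    return trials


checked = check_properties()
```

With Hypothesis itself, the generator would be replaced by a strategy and the loop by a decorated test function, which also adds automatic shrinking of failing inputs.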

Case study 2 - Phase C: Agent expansion

Change: belico-agents-expansion-fase-c (Change #6).

Problem

The pipeline had 8 sub-agents (Change #4), but the reviewers (peer, methodology, domain, devils_advocate) and the commercial agents (quotation, venue_router, submission_tracker, client_feedback) were missing. Without them, agent catalog coverage was 33%.

SDD approach

The change ran as 13 batches implementing 9 new sub-agents (5 reviewers + 4 commercial), reusing the AgentProtocol from Change #4. For cost optimization, revision_coach and venue_router run on Haiku (~39% savings) without losing quality. Auto-review runs as a CI workflow rather than as git hooks, and the auto-trigger is skipped for changes under 50 lines to prevent a cost explosion.
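The two cost controls can be sketched as a pair of small pure functions. This is an illustration only: counting "changed lines" as additions plus deletions, the `DiffStat` type, and the tier labels `"haiku"` and `"default"` are assumptions, since the real workflow and model identifiers are not shown here.

```python
from dataclasses import dataclass

MIN_CHANGED_LINES = 50  # threshold named in decision FC-3

# Agents the text says run on Haiku (decision FC-4).
LIGHT_AGENTS = {"revision_coach", "venue_router"}


@dataclass(frozen=True)
class DiffStat:
    """Hypothetical summary of a change set."""
    added: int
    deleted: int

    @property
    def changed(self) -> int:
        # Assumption: "changed lines" means additions plus deletions.
        return self.added + self.deleted


def should_auto_review(stat: DiffStat, threshold: int = MIN_CHANGED_LINES) -> bool:
    """Trigger the auto-review only when the change reaches the threshold."""
    return stat.changed >= threshold


def model_tier_for(agent: str) -> str:
    """Route lightweight agents to the cheaper tier (labels are placeholders)."""
    return "haiku" if agent in LIGHT_AGENTS else "default"


small = DiffStat(added=30, deleted=19)   # 49 changed lines
large = DiffStat(added=30, deleted=20)   # 50 changed lines
```

Keeping both rules as pure functions makes them easy to unit test separately from the CI workflow that calls them.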

Result

Metric	Value
ACs met	45/45 PASS (5 per agent × 9)
Tests	290/290 PASS (272 unit + 10 integration + 8 CI)
Regressions	0 (Tier 1 + Change #4 + Change #5)
Mutations to `tools/`	0 (Strangler Fig 100%)
HMAC chain	Verified real (no mock) in integration tests
ADs applied	4 (FC-1 Engram state, FC-2 GitHub Actions CI, FC-3 skip <50 lines, FC-4 Haiku)
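For readers unfamiliar with HMAC chains, the sketch below shows the general technique using only the standard library. It is not the project's implementation: the genesis value, the record layout, and the function names are assumptions. Each tag covers the previous tag plus the current payload, so altering any record invalidates every later link.

```python
import hashlib
import hmac

GENESIS = b"\x00" * 32  # assumed starting value for the first link


def link_tag(key: bytes, prev_tag: bytes, payload: bytes) -> bytes:
    """Tag one record, binding it to the tag of the record before it."""
    return hmac.new(key, prev_tag + payload, hashlib.sha256).digest()


def build_chain(key: bytes, payloads: list[bytes]) -> list[tuple[bytes, bytes]]:
    """Return (payload, tag) pairs where each tag depends on the previous one."""
    chain = []
    prev = GENESIS
    for payload in payloads:
        tag = link_tag(key, prev, payload)
        chain.append((payload, tag))
        prev = tag
    return chain


def verify_chain(key: bytes, chain: list[tuple[bytes, bytes]]) -> bool:
    """Recompute every tag in order; fail on the first mismatch."""
    prev = GENESIS
    for payload, tag in chain:
        expected = link_tag(key, prev, payload)
        if not hmac.compare_digest(expected, tag):  # constant-time comparison
            return False
        prev = tag
    return True


key = b"demo-key"
chain = build_chain(key, [b"intake", b"review", b"submit"])
tampered = list(chain)
tampered[1] = (b"REVIEW", chain[1][1])  # change a payload, keep the old tag
```

Running the verification against a real key and real records, rather than a mocked verifier, is what lets an integration test catch tampering of this kind.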

Key lesson: the AgentProtocol scales. All 9 agents reused the contract without redesign. The CAS advisory lock prevents race conditions in the submission_tracker state machine. Independence is critical for devils_advocate.
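One way to picture the race-condition guard, assuming CAS here means compare-and-swap, is a versioned state holder: a writer may only apply a transition if the version it read is still current. The state names, the transition table, and the `SubmissionState` class below are hypothetical, and the real submission_tracker may persist its state elsewhere (for example in Engram).

```python
import threading

# Hypothetical transition table; the real states are not listed in this page.
ALLOWED = {
    "draft": {"submitted"},
    "submitted": {"under_review", "withdrawn"},
    "under_review": {"accepted", "rejected"},
}


class SubmissionState:
    """Versioned state: writes succeed only against the version they read."""

    def __init__(self, state: str = "draft") -> None:
        self._state = state
        self._version = 0
        self._lock = threading.Lock()

    def read(self) -> tuple[str, int]:
        with self._lock:
            return self._state, self._version

    def compare_and_set(self, expected_version: int, new_state: str) -> bool:
        """Apply the transition only if nobody wrote since expected_version."""
        with self._lock:
            if self._version != expected_version:
                return False  # stale read: another writer got there first
            if new_state not in ALLOWED.get(self._state, set()):
                return False  # transition not permitted from current state
            self._state = new_state
            self._version += 1
            return True


tracker = SubmissionState()
_, seen_by_a = tracker.read()
_, seen_by_b = tracker.read()          # both writers read version 0
first = tracker.compare_and_set(seen_by_a, "submitted")
second = tracker.compare_and_set(seen_by_b, "submitted")  # stale, rejected
final_state, final_version = tracker.read()
```

The losing writer gets `False` back and can re-read the state before deciding whether its transition still makes sense.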

Case study 3 - Phase D: Public CLI

Change: belico-cli-fase-d (Change #7).

Problem

The ecosystem had no public interface. A bilingual CLI was needed for two reasons: to distribute the stack via pip install, and to give the documentation (Phase E) real commands to document.

SDD approach

The change ran as 9 batches implementing 5 commands (create-paper, smoke-check, verify, catalog, ecosystem) with Click + Rich (not Typer, which transitively pulls in Pydantic). Harnesses and agents are lazy-loaded per command to preserve fast boot. Bilingual output comes from the centralized bilingua.py dict (90 keys, 0 missing). Following the strangler-fig pattern, the CLI invokes tools/harness through adapters and never mutates them.

Result

Metric	Value
ACs met	30/30 PASS
Tests	178/178 PASS
Coverage `src/belico/`	89.16% (gate ≥80%)
Invariants preserved	7/7 (I1..I7)
Cold CLI boot	278 ms (gate <350ms)
Zero-regression suite	1004 passed (agents + harness + heritage)
CI matrix	9 jobs (Ubuntu/Windows/macOS × Py 3.10/3.11/3.12)
ADs applied	5 (CLI-1 Click+Rich, CLI-2 lazy-loading, CLI-3 bilingua dict, CLI-4 new package, CLI-5 PyPI prep)
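A "0 missing" guarantee for a centralized translation dict (decision CLI-3) can be enforced with a small completeness check. The sketch below is an assumption about how such a check might look: the key names, the `es`/`en` language pair, and the helper functions are hypothetical and not taken from bilingua.py.

```python
# Hypothetical entries; the real bilingua.py holds 90 keys.
MESSAGES = {
    "cli.help": {"es": "Muestra esta ayuda", "en": "Show this help"},
    "verify.ok": {"es": "Verificación correcta", "en": "Verification passed"},
}

LANGS = ("es", "en")  # assumed language pair


def missing_translations(messages: dict, langs: tuple = LANGS) -> list:
    """Return (key, lang) pairs whose translation is absent or empty."""
    return sorted(
        (key, lang)
        for key, entry in messages.items()
        for lang in langs
        if not entry.get(lang)
    )


def t(key: str, lang: str, messages: dict = MESSAGES) -> str:
    """Look up one message; a KeyError signals an unknown key or language."""
    return messages[key][lang]


gaps = missing_translations(MESSAGES)
incomplete = missing_translations({"x.only_en": {"en": "hi"}})
```

Asserting that the gap list is empty in the test suite turns a missing translation into a failing build instead of a runtime surprise.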

Key lesson: helpers/__init__.py must stay EMPTY, because eager Rich re-exports introduced a ~150ms boot regression. tools/init_project.py is TTY-interactive, so create-paper emulates the scaffold without running the script (strangler-fig).

Side-by-side summary

Phase	Change	ACs	Tests	Coverage / Note
B	harnesses-tier2	50/50	465	≥90%, 0 regressions
C	agents-fase-c	45/45	290	0 regressions, real HMAC
D	cli-fase-d	30/30	178	89%, 278ms boot

Case study 4 - UC Continental client (pending)

Placeholder — pending production

The first end-to-end commercial paper with the UC Continental client is queued (PRD §21.7). Once produced, this case study will document the full flow (intake → COMPUTE → IMPLEMENT → VERIFY → submission) with the real metrics of a paper that passes Turnitin and sells. Until then, it remains an explicitly marked placeholder.
