Case studies

Real case studies of how the SDD flow was applied across the archived changes of the Belico Ecosystem v1 roadmap. Each one follows the problem → SDD approach → measurable result structure. The metrics are those actually reported in each change's archive_report.md.

Why these case studies

They show the SDD flow (EXPLORE → PROPOSE → SPEC/DESIGN → TASKS → IMPLEMENT → VERIFY → ARCHIVE) is not theory: it produced production-grade software with auditable metrics and zero regressions.

Case study 1 — Phase B: Tier 2 Harnesses

Change: belico-harnesses-tier2-implementation (Change #5).

Problem

The stack had 5 operational Tier 1 harnesses, but the 10 Tier 2 harnesses of the catalog were missing (anti-AI humanizer, VERITAS citations, stats guardian, local Turnitin, retraction watch, etc.). Without them, catalog coverage was 50%.

SDD approach

The work ran as 13 IMPLEMENT batches under strict TDD (test first, code after), reusing the HarnessProtocol validated in Change #2. Dependency injection (DI) was prioritized over mocks. Shared helpers reduced coupling: doi_resolver is used by both bibtex and retraction, and perplexity_calc by turnitin.
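The protocol-plus-injection approach described above can be sketched as follows. This is a minimal illustration, not the project's code: the fields of `HarnessResult`, the `run()` signature, the `RetractionWatchHarness` class, and `fake_resolver` are all hypothetical, and the real HarnessProtocol from Change #2 may look different.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass
class HarnessResult:
    """Hypothetical result shape; the real contract may differ."""
    harness: str
    passed: bool
    detail: str


class HarnessProtocol(Protocol):
    """Assumed minimal contract: a name plus a run() method."""
    name: str

    def run(self, text: str) -> HarnessResult: ...


# A resolver maps a DOI to a status string, or None if the DOI is unknown.
DoiResolver = Callable[[str], Optional[str]]


class RetractionWatchHarness:
    """Illustrative Tier 2 harness that receives its resolver by injection."""
    name = "retraction_watch"

    def __init__(self, resolve_doi: DoiResolver) -> None:
        self._resolve_doi = resolve_doi

    def run(self, text: str) -> HarnessResult:
        # Strip surrounding punctuation, then keep tokens that look like DOIs.
        tokens = [tok.strip(".,;()") for tok in text.split()]
        dois = [tok for tok in tokens if tok.startswith("10.")]
        retracted = [d for d in dois if self._resolve_doi(d) == "retracted"]
        return HarnessResult(self.name, not retracted, f"retracted={retracted}")


def fake_resolver(doi: str) -> Optional[str]:
    """In-memory stand-in passed to the harness in tests instead of a mock."""
    return {"10.1000/bad": "retracted", "10.1000/ok": "active"}.get(doi)


harness: HarnessProtocol = RetractionWatchHarness(fake_resolver)
result = harness.run("See 10.1000/ok and (10.1000/bad).")
print(result.passed, result.detail)
```

Because the resolver is a constructor argument, a test can pass a plain function with known answers, and a second harness (for example a bibtex one) could receive the same resolver without either harness importing the other.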

Result

| Metric | Value |
| --- | --- |
| ACs met | 50/50 PASS (5 per harness × 10) |
| Tests | 465/465 PASS in 10.51s |
| Tier 1 regressions | 0 |
| Mutations to inherited tools/ | 0 (Strangler Fig 100%) |
| ADs applied | 3 (T2-1 local Turnitin, T2-2 tenant-isolation inactive, T2-3 stats advisory) |

Key lesson: the HarnessProtocol pattern scales. All 10 Tier 2 harnesses reused the contract without redesign. Property-based testing (Hypothesis) was key to covering edge cases in the stats, bibtex, and turnitin harnesses.

Case study 2 — Phase C: Agent expansion

Change: belico-agents-expansion-fase-c (Change #6).

Problem

The pipeline had 8 sub-agents (Change #4), but the reviewers (peer, methodology, domain, devils_advocate) and the commercial agents (quotation, venue_router, submission_tracker, client_feedback) were missing. Without them, agent catalog coverage was 33%.

SDD approach

The work ran as 13 batches implementing 9 new sub-agents (5 reviewers + 4 commercial), reusing the AgentProtocol from Change #4. To optimize cost, revision_coach and venue_router run on Haiku (~39% savings) without losing quality. Auto-review runs as a CI workflow rather than through git hooks, and the auto-trigger is skipped on changes under 50 lines to keep review costs from exploding.
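The skip rule for small changes can be illustrated with a short sketch. The function name and the choice to count added plus deleted lines are assumptions. The source only states that the auto-trigger is skipped on changes under 50 lines.

```python
def should_auto_review(lines_added: int, lines_deleted: int, threshold: int = 50) -> bool:
    """Return True when a change is large enough to trigger the auto-review.

    Assumption: change size is measured as added + deleted lines.
    """
    changed = lines_added + lines_deleted
    return changed >= threshold


print(should_auto_review(20, 29))  # 49 changed lines, below the threshold
print(should_auto_review(30, 20))  # 50 changed lines, at the threshold
```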

Result

| Metric | Value |
| --- | --- |
| ACs met | 45/45 PASS (5 per agent × 9) |
| Tests | 290/290 PASS (272 unit + 10 integration + 8 CI) |
| Regressions | 0 (Tier 1 + Change #4 + Change #5) |
| Mutations to tools/ | 0 (Strangler Fig 100%) |
| HMAC chain | Verified real (no mock) in integration tests |
| ADs applied | 4 (FC-1 Engram state, FC-2 GitHub Actions CI, FC-3 skip <50 lines, FC-4 Haiku) |
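To show what verifying an HMAC chain without mocks can look like, here is a minimal sketch using only the standard library. The chain layout (each tag covers the previous tag plus the payload, starting from an all-zero genesis value) and every function name are assumptions. The source does not describe the actual record format.

```python
import hashlib
import hmac
from typing import List, Sequence, Tuple

GENESIS = b"\x00" * 32  # assumed starting value for the first link


def sign_entry(key: bytes, prev_tag: bytes, payload: bytes) -> bytes:
    """Tag one entry; the tag depends on the previous tag and the payload."""
    return hmac.new(key, prev_tag + payload, hashlib.sha256).digest()


def build_chain(key: bytes, payloads: Sequence[bytes]) -> List[Tuple[bytes, bytes]]:
    """Return a list of (payload, tag) pairs linked by their tags."""
    chain: List[Tuple[bytes, bytes]] = []
    prev = GENESIS
    for payload in payloads:
        tag = sign_entry(key, prev, payload)
        chain.append((payload, tag))
        prev = tag
    return chain


def verify_chain(key: bytes, chain: Sequence[Tuple[bytes, bytes]]) -> bool:
    """Recompute every tag with real HMAC and compare in constant time."""
    prev = GENESIS
    for payload, tag in chain:
        expected = sign_entry(key, prev, payload)
        if not hmac.compare_digest(expected, tag):
            return False
        prev = tag
    return True


chain = build_chain(b"secret", [b"intake", b"review", b"submit"])
print(verify_chain(b"secret", chain))
```

An integration test in this style tampers with one payload or uses the wrong key and expects verification to fail, which a mocked HMAC could not demonstrate.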

Key lesson: the AgentProtocol scales. All 9 agents reused the contract without redesign. The CAS advisory lock prevents race conditions in the submission_tracker state machine. Independence is critical for devils_advocate.
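The race-condition point can be illustrated with a small sketch that reads CAS as compare-and-swap, which is an assumption about the term. The state names, the `ALLOWED` transition table, and the `SubmissionState` class are hypothetical and stand in for whatever submission_tracker actually stores.

```python
import threading
from typing import Dict, Set, Tuple

# Hypothetical transition table for a submission state machine.
ALLOWED: Dict[str, Set[str]] = {
    "draft": {"submitted"},
    "submitted": {"under_review"},
    "under_review": {"accepted", "rejected"},
}


class SubmissionState:
    """Versioned state; updates succeed only if the caller saw the latest version."""

    def __init__(self, state: str = "draft") -> None:
        self._state = state
        self._version = 0
        self._lock = threading.Lock()

    def read(self) -> Tuple[str, int]:
        with self._lock:
            return self._state, self._version

    def compare_and_set(self, expected_version: int, new_state: str) -> bool:
        with self._lock:
            if self._version != expected_version:
                return False  # another writer got there first
            if new_state not in ALLOWED.get(self._state, set()):
                return False  # transition not permitted from the current state
            self._state = new_state
            self._version += 1
            return True


tracker = SubmissionState()
_, version = tracker.read()
print(tracker.compare_and_set(version, "submitted"))     # first writer wins
print(tracker.compare_and_set(version, "under_review"))  # stale version is rejected
```

A caller whose update is rejected re-reads the state and decides again, so two concurrent writers cannot both apply a transition computed from the same snapshot.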

Case study 3 — Phase D: Public CLI

Change: belico-cli-fase-d (Change #7).

Problem

The ecosystem had no public interface. To distribute the stack via pip install and so the documentation (Phase E) had real commands to document, a bilingual CLI was needed.

SDD approach

The work ran as 9 batches implementing 5 commands (create-paper, smoke-check, verify, catalog, ecosystem) with Click + Rich (not Typer, which transitively pulls in Pydantic). Harnesses and agents are lazy-loaded per command to keep boot fast. Bilingual output comes from the centralized bilingua.py dict (90 keys, 0 missing). Following the strangler-fig pattern, the CLI invokes tools/harness through adapters and never mutates them.
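A centralized translation dict with a completeness check might look like the sketch below. The keys, the message texts, the `es`/`en` language pair, and both function names are assumptions made for illustration. The source only states that bilingua.py holds 90 keys with none missing.

```python
from typing import Dict, List, Sequence

LANGS = ("es", "en")  # assumed language pair

# Hypothetical entries; the real dict has 90 keys.
MESSAGES: Dict[str, Dict[str, str]] = {
    "boot.ok": {"es": "Sistema listo", "en": "System ready"},
    "verify.fail": {"es": "Verificación fallida", "en": "Verification failed"},
}


def t(key: str, lang: str = "en") -> str:
    """Look up one message; raises KeyError on an unknown key or language."""
    return MESSAGES[key][lang]


def missing_translations(
    messages: Dict[str, Dict[str, str]], langs: Sequence[str] = LANGS
) -> List[str]:
    """List every key:lang pair that is absent or empty."""
    return [
        f"{key}:{lang}"
        for key, entry in sorted(messages.items())
        for lang in langs
        if not entry.get(lang)
    ]


print(t("boot.ok", "es"))
print(missing_translations(MESSAGES))
```

Running the check in the test suite turns a claim like "0 missing" into an assertion that fails as soon as a key is added in only one language.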

Result

| Metric | Value |
| --- | --- |
| ACs met | 30/30 PASS |
| Tests | 178/178 PASS |
| Coverage src/belico/ | 89.16% (gate ≥80%) |
| Invariants preserved | 7/7 (I1..I7) |
| Cold CLI boot | 278 ms (gate <350 ms) |
| Zero-regression suite | 1004 passed (agents + harness + heritage) |
| CI matrix | 9 jobs (Ubuntu/Windows/macOS × Py 3.10/3.11/3.12) |
| ADs applied | 5 (CLI-1 Click+Rich, CLI-2 lazy-loading, CLI-3 bilingua dict, CLI-4 new package, CLI-5 PyPI prep) |

Key lesson: helpers/__init__.py must stay EMPTY, because eager re-exports of Rich introduced a ~150 ms boot regression. tools/init_project.py is TTY-interactive, so create-paper emulates the scaffold without running the script (strangler-fig).

Side-by-side summary

| Phase | Change | ACs | Tests | Coverage / Note |
| --- | --- | --- | --- | --- |
| B | harnesses-tier2 | 50/50 | 465 | ≥90%, 0 regressions |
| C | agents-fase-c | 45/45 | 290 | 0 regressions, real HMAC |
| D | cli-fase-d | 30/30 | 178 | 89%, 278 ms boot |

Case study 4 — UC Continental client (pending)

Placeholder — pending production

The first end-to-end commercial paper with the UC Continental client is queued (PRD §21.7). When produced, this case study will document the full flow: intake → COMPUTE → IMPLEMENT → VERIFY → submission, with the real metrics of a paper that passes Turnitin and sells. For now it is an explicitly marked placeholder.

Source

Metrics extracted from each change's archive_report.md in openspec/changes/.