Case studies
Real case studies of how the SDD flow was applied across the archived changes of the Belico Ecosystem v1 roadmap. Each one follows the problem → SDD approach → measurable result structure. The metrics are those actually reported in each change's archive_report.md.
Why these case studies
They show the SDD flow (EXPLORE → PROPOSE → SPEC/DESIGN → TASKS → IMPLEMENT → VERIFY → ARCHIVE) is not theory: it produced production-grade software with auditable metrics and zero regressions.
Case study 1 — Phase B: Tier 2 Harnesses
Change: belico-harnesses-tier2-implementation (Change #5).
Problem
The stack had 5 operational Tier 1 harnesses, but the 10 Tier 2 harnesses of the catalog were missing (anti-AI humanizer, VERITAS citations, stats guardian, local Turnitin, retraction watch, etc.). Without them, catalog coverage was 50%.
SDD approach
13 IMPLEMENT batches with strict TDD (test first, code after), reusing the HarnessProtocol validated in Change #2. DI (dependency injection) prioritized over mocks. Shared helpers to reduce coupling (doi_resolver used by bibtex + retraction; perplexity_calc by turnitin).
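The combination of a shared contract and dependency injection described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the shape of `HarnessProtocol`, the `HarnessResult` dataclass, the `RetractionWatchHarness` class, and the `fake_resolver` function are all assumptions made for the example. Only the names HarnessProtocol and doi_resolver come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class HarnessResult:
    """Outcome of one harness run (hypothetical shape)."""
    harness: str
    passed: bool
    detail: str


class HarnessProtocol(Protocol):
    """Assumed minimal contract; the real HarnessProtocol may differ."""
    name: str

    def run(self, text: str) -> HarnessResult: ...


# A resolver maps a DOI to a status string such as "ok" or "retracted".
DoiResolver = Callable[[str], str]


class RetractionWatchHarness:
    """Flags DOIs that the injected resolver reports as retracted."""
    name = "retraction_watch"

    def __init__(self, resolve_doi: DoiResolver) -> None:
        # The resolver is injected, so a test can pass a plain function
        # instead of patching a network client with a mock.
        self._resolve_doi = resolve_doi

    def run(self, text: str) -> HarnessResult:
        # Naive DOI extraction, good enough for a sketch.
        dois = [tok.strip(".,;()") for tok in text.split() if tok.startswith("10.")]
        retracted = [d for d in dois if self._resolve_doi(d) == "retracted"]
        detail = f"retracted: {', '.join(retracted)}" if retracted else "no retractions"
        return HarnessResult(self.name, not retracted, detail)


def fake_resolver(doi: str) -> str:
    """Deterministic stand-in for a real DOI lookup (illustrative only)."""
    return "retracted" if doi.endswith("/bad") else "ok"


def run_all(harnesses: list[HarnessProtocol], text: str) -> list[HarnessResult]:
    """Run every harness through the shared contract."""
    return [h.run(text) for h in harnesses]


results = run_all(
    [RetractionWatchHarness(fake_resolver)],
    "See 10.1000/good and 10.1000/bad.",
)
```

Because the resolver arrives as a constructor argument, the same helper could in principle be shared by several harnesses, which is the kind of reuse the text attributes to doi_resolver.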
Result
| Metric | Value |
|---|---|
| ACs met | 50/50 PASS (5 per harness × 10) |
| Tests | 465/465 PASS in 10.51s |
| Tier 1 regressions | 0 |
| Mutations to inherited tools/ | 0 (Strangler Fig 100%) |
| ADs applied | 3 (T2-1 local Turnitin, T2-2 tenant-isolation inactive, T2-3 stats advisory) |
Key lesson: the HarnessProtocol pattern scales — all 10 Tier 2 reused the contract without redesign. Property-based testing (Hypothesis) was key in stats + bibtex + turnitin for edge cases.
Case study 2 — Phase C: Agent expansion
Change: belico-agents-expansion-fase-c (Change #6).
Problem
The pipeline had 8 sub-agents (Change #4), but the reviewers (peer, methodology, domain, devils_advocate) and the commercial agents (quotation, venue_router, submission_tracker, client_feedback) were missing. Without them, agent catalog coverage was 33%.
SDD approach
13 batches implementing 9 new sub-agents (5 reviewers + 4 commercial) reusing the AgentProtocol from Change #4. Cost optimization: Haiku serves revision_coach and venue_router (~39% savings) without losing quality. Auto-review runs as a CI workflow instead of git hooks, and the auto-trigger is skipped on changes under 50 lines to prevent runaway costs.
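The two cost controls mentioned above (a size threshold for auto-review and a cheaper model for selected agents) can be sketched as plain functions. This is an illustration under stated assumptions: counting a change as added plus deleted lines, the `DiffStats` type, and the model labels are hypothetical. Only the 50-line threshold and the two agent names come from the text.

```python
from dataclasses import dataclass

# From the text: changes under 50 lines skip the auto-review trigger.
SKIP_THRESHOLD_LINES = 50

# Hypothetical model labels; the text only says Haiku serves these two agents.
CHEAP_MODEL = "haiku"
DEFAULT_MODEL = "default"
CHEAP_AGENTS = frozenset({"revision_coach", "venue_router"})


@dataclass(frozen=True)
class DiffStats:
    """Size of a change. Counting added + deleted lines is an assumption."""
    added: int
    deleted: int

    @property
    def changed(self) -> int:
        return self.added + self.deleted


def should_auto_review(stats: DiffStats) -> bool:
    """Trigger the review only when the change reaches the threshold."""
    return stats.changed >= SKIP_THRESHOLD_LINES


def pick_model(agent: str) -> str:
    """Route the listed agents to the cheaper model, the rest to the default."""
    return CHEAP_MODEL if agent in CHEAP_AGENTS else DEFAULT_MODEL
```

In a CI workflow, a check like `should_auto_review` would typically run first, so that small changes never reach the step that calls a model.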
Result
| Metric | Value |
|---|---|
| ACs met | 45/45 PASS (5 per agent × 9) |
| Tests | 290/290 PASS (272 unit + 10 integration + 8 CI) |
| Regressions | 0 (Tier 1 + Change #4 + Change #5) |
| Mutations to tools/ | 0 (Strangler Fig 100%) |
| HMAC chain | Verified real (no mock) in integration tests |
| ADs applied | 4 (FC-1 Engram state, FC-2 GitHub Actions CI, FC-3 skip <50 lines, FC-4 Haiku) |
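For readers unfamiliar with the HMAC chain mentioned in the table, the general idea can be sketched with the standard library. The construction below (each MAC covers the previous MAC plus the current payload, starting from an all-zero value) is a common pattern and an assumption here. The source does not describe the project's actual chain format, and `GENESIS`, `link_mac`, `build_chain`, and `verify_chain` are illustrative names.

```python
import hashlib
import hmac

# Assumed starting value for the chain; the real project may seed it differently.
GENESIS = b"\x00" * 32


def link_mac(key: bytes, prev_mac: bytes, payload: bytes) -> bytes:
    """MAC over the previous link plus the current payload."""
    return hmac.new(key, prev_mac + payload, hashlib.sha256).digest()


def build_chain(key: bytes, payloads: list[bytes]) -> list[tuple[bytes, bytes]]:
    """Return (payload, mac) pairs where each mac depends on all earlier entries."""
    chain: list[tuple[bytes, bytes]] = []
    prev = GENESIS
    for payload in payloads:
        prev = link_mac(key, prev, payload)
        chain.append((payload, prev))
    return chain


def verify_chain(key: bytes, chain: list[tuple[bytes, bytes]]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry fails."""
    prev = GENESIS
    for payload, mac in chain:
        expected = link_mac(key, prev, payload)
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, mac):
            return False
        prev = mac
    return True
```

Verifying against real MACs rather than a mock means a test fails if an entry is tampered with, which a stubbed verifier would not detect.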
Key lesson: the AgentProtocol scales. All 9 agents reused the contract without redesign. The CAS advisory lock prevents race conditions in the submission_tracker state machine. Independence is critical in devils_advocate.
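Assuming CAS here stands for compare-and-swap, the way such a check protects a state machine from concurrent writers can be sketched as follows. The state names, the `ALLOWED` transition table, and the version counter are hypothetical; the source does not describe the real submission_tracker internals.

```python
import threading

# Hypothetical transition table; the real states are not given in the source.
ALLOWED = {
    "draft": {"submitted"},
    "submitted": {"under_review"},
    "under_review": {"accepted", "rejected"},
}


class SubmissionTracker:
    """State machine whose writes succeed only against the version they read."""

    def __init__(self, state: str = "draft") -> None:
        self._state = state
        self._version = 0
        self._lock = threading.Lock()

    def read(self) -> tuple[str, int]:
        """Return the current state together with its version."""
        with self._lock:
            return self._state, self._version

    def compare_and_set(self, expected_version: int, new_state: str) -> bool:
        """Apply the transition only if nobody has written since the read."""
        with self._lock:
            if self._version != expected_version:
                return False  # a concurrent writer got there first
            if new_state not in ALLOWED.get(self._state, set()):
                raise ValueError(f"illegal transition {self._state} -> {new_state}")
            self._state = new_state
            self._version += 1
            return True


tracker = SubmissionTracker()
_, seen_version = tracker.read()
first = tracker.compare_and_set(seen_version, "submitted")
# A second writer holding the same stale version is rejected instead of
# silently overwriting the first writer's transition.
second = tracker.compare_and_set(seen_version, "under_review")
```

A rejected writer is expected to re-read the state and decide again, rather than retry blindly with the stale version.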
Case study 3 — Phase D: Public CLI
Change: belico-cli-fase-d (Change #7).
Problem
The ecosystem had no public interface. A bilingual CLI was needed both to distribute the stack via pip install and to give the documentation (Phase E) real commands to document.
SDD approach
9 batches implementing 5 commands (create-paper, smoke-check, verify, catalog, ecosystem) with Click + Rich (NOT Typer, which transitively pulls Pydantic). Harnesses and agents are lazy-loaded per command to keep boot fast. Bilingual output comes from the centralized bilingua.py dict (90 keys, 0 missing). Strangler-fig: the CLI invokes tools/harness via adapters and never mutates them.
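Two of the ideas above, importing a command's dependencies only when that command runs and checking that a bilingual dictionary has no missing keys, can be sketched with the standard library alone. Everything named here is illustrative: `COMMAND_MODULES` maps commands to standard-library modules as stand-ins for the real harness and agent packages, and the `MESSAGES` layout and its language codes are assumptions about how a file like bilingua.py might be organized.

```python
import importlib
from types import ModuleType

# Hypothetical mapping of command name -> module imported on demand.
# Standard-library modules stand in for the real harness/agent packages.
COMMAND_MODULES = {
    "catalog": "json",
    "verify": "hashlib",
}


def load_command_module(command: str) -> ModuleType:
    """Import the module behind a command only when the command is invoked."""
    return importlib.import_module(COMMAND_MODULES[command])


# Assumed layout: one dict of message keys per language.
MESSAGES = {
    "es": {"greeting": "Hola", "done": "Listo"},
    "en": {"greeting": "Hello", "done": "Done"},
}


def missing_keys(messages: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Report, per language, the keys that other languages define but it lacks."""
    all_keys: set[str] = set()
    for table in messages.values():
        all_keys |= set(table)
    report: dict[str, set[str]] = {}
    for lang, table in messages.items():
        gap = all_keys - set(table)
        if gap:
            report[lang] = gap
    return report
```

A check like `missing_keys` fits naturally in a unit test, so that a count such as "0 missing" is enforced on every run instead of being verified by hand.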
Result
| Metric | Value |
|---|---|
| ACs met | 30/30 PASS |
| Tests | 178/178 PASS |
| Coverage src/belico/ | 89.16% (gate ≥80%) |
| Invariants preserved | 7/7 (I1..I7) |
| Cold CLI boot | 278 ms (gate <350ms) |
| Zero-regression suite | 1004 passed (agents + harness + heritage) |
| CI matrix | 9 jobs (Ubuntu/Windows/macOS × Py 3.10/3.11/3.12) |
| ADs applied | 5 (CLI-1 Click+Rich, CLI-2 lazy-loading, CLI-3 bilingua dict, CLI-4 new package, CLI-5 PyPI prep) |
Key lesson: helpers/__init__.py must stay empty, because eager re-exports of Rich from it introduced a ~150 ms boot regression. tools/init_project.py is TTY-interactive, so create-paper emulates the scaffold instead of running the script (strangler-fig).
Side-by-side summary
| Phase | Change | ACs | Tests | Coverage / Note |
|---|---|---|---|---|
| B | harnesses-tier2 | 50/50 | 465 | ≥90%, 0 regressions |
| C | agents-fase-c | 45/45 | 290 | 0 regressions, real HMAC |
| D | cli-fase-d | 30/30 | 178 | 89%, 278ms boot |
Case study 4 — UC Continental client (pending)
Placeholder — pending production
The first end-to-end commercial paper with the UC Continental client is queued (PRD §21.7). When produced, this case study will document the full flow: intake → COMPUTE → IMPLEMENT → VERIFY → submission, with the real metrics of a paper that passes Turnitin and sells. For now it is an explicitly marked placeholder.
See also
- The ecosystem — the 12-week roadmap.
- The pipeline — the applied SDD flow.
- Architecture — the 4 pillars.
Source
Metrics extracted from each change's archive_report.md in openspec/changes/.