The System
How a single classifier grew into a four-repo system, and the discipline I put on the seams as it did.
This did not start as a system. It started as one project, the defense-news classifier. A notes API, a RAG agent, and a knowledge base came later. Once there were four repos that had to talk to each other, I had a choice: let them fuse into one tangled codebase, or impose discipline on the seams between them. I chose discipline and recorded why at each step.
A note up front, because a senior engineer will ask it anyway: at one-person scale this is more rigor than the code strictly needs. That is deliberate. I built it as a practice rep for the seam-level judgment that becomes essential the moment the repos belong to different teams. The point was never that the system is large. It is that the discipline is real and the decisions are written down.
What it grew into
- notes-api: Spring Boot REST service. The knowledge base and data layer.
- defense-news-classifier: LLM classifier with a real eval harness. Labels the content.
- kb-agent: RAG and tool-use agent. The hub that reasons over the system.
- learning-notes: Plain-language notes on the techniques behind it all.
The problem
Once it was four repos, the easy path was to fuse them so they could call each other directly. That buys convenience and pays for it with independence: one release cycle, one blast radius, one thing to break. The goal was the opposite, to let them act as one system while each stays deployable on its own. That problem lives in the contracts between the pieces, not in the pieces themselves.
The decisions
Decision · SYS-004 / SYS-006
Decouple with contracts, not shared code
The repos talk over frozen HTTP and event contracts with explicit versioning rules. Both sides carry contract tests in CI.
Why: decoupling without a contract is just hope. A renamed field breaks the consumer silently at runtime. Tradeoff: more upfront ceremony (freeze the wire shape, write the tests) so that drift fails a build instead of production.
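A minimal sketch of what a consumer-side contract test could look like. The field names (schemaVersion, noteId, title, body) and the rule that additive fields are tolerated are assumptions for illustration, not the real notes-api wire shape or its versioning policy.

```python
# Hypothetical frozen wire shape for a NoteCreated event, version 1.
# Field names are illustrative assumptions, not the real notes-api schema.
FROZEN_NOTE_CREATED_V1 = {
    "schemaVersion": int,
    "noteId": str,
    "title": str,
    "body": str,
}


def contract_violations(event: dict, contract: dict) -> list[str]:
    """Return every way `event` breaks the frozen contract (empty list = OK).

    Assumption: extra fields are tolerated so a producer can evolve additively;
    a missing or retyped field is a breaking change and should fail the build.
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(event[field]).__name__}"
            )
    return problems


def test_producer_sample_matches_contract():
    # In CI the producer would serialize a real event; this sample stands in.
    sample = {"schemaVersion": 1, "noteId": "n-42", "title": "t", "body": "b", "extra": True}
    assert contract_violations(sample, FROZEN_NOTE_CREATED_V1) == []


def test_renamed_field_is_caught():
    # Renaming noteId to id is the silent runtime break the contract prevents.
    renamed = {"schemaVersion": 1, "id": "n-42", "title": "t", "body": "b"}
    assert contract_violations(renamed, FROZEN_NOTE_CREATED_V1) == ["missing field: noteId"]
```

Running the same checks on both sides (the producer against its own serializer, the consumer against what it parses) is what turns drift into a red build.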
Decision · SYS-005
Make the loop event-driven and idempotent by design
Creating a note emits a Kafka event. The classifier consumes it, classifies in-process, and writes labels back as namespaced tags (category: / domain:) with replace semantics.
Why: making the loop event-driven means note creation never waits on classifier uptime. Kafka's at-least-once delivery means an event can arrive more than once, so the writeback has to converge when an event is reprocessed instead of piling up duplicate labels. Tradeoff: idempotency is real design work (independent retry paths, poison-message handling), not naive fire-and-forget.
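A minimal sketch of why replace semantics make the writeback idempotent, assuming the full tag set is computed before the PUT. merge_labels and the example label values are hypothetical; the point is that dropping every classifier-owned tag before adding the new ones turns a duplicate delivery into a no-op.

```python
# Tag namespaces owned by the classifier; anything else belongs to the user.
CLASSIFIER_NAMESPACES = ("category:", "domain:")


def merge_labels(existing_tags: list[str], category: str, domain: str) -> list[str]:
    """Return the full tag set to PUT back for one note.

    Every classifier-owned tag is dropped before the new labels are added,
    so reprocessing the same event converges on the same tag set instead
    of accumulating stale or duplicate labels. User tags pass through.
    """
    user_tags = {t for t in existing_tags if not t.startswith(CLASSIFIER_NAMESPACES)}
    return sorted(user_tags | {f"category:{category}", f"domain:{domain}"})


# Hypothetical labels; a second delivery of the same event changes nothing.
first = merge_labels(["my-tag", "category:old"], "procurement", "air")
second = merge_labels(first, "procurement", "air")
assert first == second == ["category:procurement", "domain:air", "my-tag"]
```

The same property holds whichever side computes the merge; what matters is that the operation is "set the classifier's labels to X", never "append X".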
Decision · SYS-002
Gate model choice on evidence, not vibes
Default to Sonnet across the system. Escalate to Opus only where an eval shows the quality gain pays for itself.
Why: the classifier's accuracy ceiling turned out to be label ambiguity, not model horsepower, so paying roughly 1.7× for a bigger model would have bought nothing. Tradeoff: you have to run the eval to justify the tier instead of defaulting to the biggest model.
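One way to make the gate mechanical, sketched under assumptions: EvalResult, should_escalate, and the required_gain_per_extra_cost threshold are hypothetical, and the eval counts in the example are made up, not results from the classifier's harness. The 1.7 relative cost mirrors the rough price ratio above.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    model: str            # hypothetical label for the model tier
    correct: int          # eval items the model labeled correctly
    total: int            # eval items scored
    relative_cost: float  # cost per call relative to the default tier (1.0)

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def should_escalate(default: EvalResult, candidate: EvalResult,
                    required_gain_per_extra_cost: float = 0.05) -> bool:
    """Escalate only if the measured accuracy gain pays for the extra cost.

    required_gain_per_extra_cost is a hypothetical policy knob: the accuracy
    gain (as a fraction) demanded for each extra 1.0x of relative cost.
    """
    extra_cost = candidate.relative_cost - default.relative_cost
    if extra_cost <= 0:
        raise ValueError("candidate should be the pricier tier")
    gain = candidate.accuracy - default.accuracy
    return gain >= required_gain_per_extra_cost * extra_cost


# Made-up numbers: when the ceiling is label ambiguity, a bigger model
# barely moves accuracy, and the gate says stay on the default tier.
sonnet = EvalResult("sonnet", correct=170, total=200, relative_cost=1.0)
opus = EvalResult("opus", correct=172, total=200, relative_cost=1.7)
print(should_escalate(sonnet, opus))  # False
```

The exact threshold is a policy choice; the design point is that escalation needs a number from an eval run, not a default.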
Decision · SYS-001 / SYS-008
Record decisions, then make the system legible
Cross-repo decisions live as two-tier ADRs (repo-local choices stay local; system-level ones are SYS-NNN in one place). A generated portal aggregates every repo's docs into one browsable view that deep-links to the code. Aggregate, never duplicate.
Why: a system you cannot see in one place is narrated, not legible, and a decision that is not written down gets re-argued.
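A sketch of "aggregate, never duplicate" applied to an ADR index, assuming each repo's ADR file paths are known and system-level files are named with a SYS-NNN prefix. build_index, IndexEntry, and the base URLs passed in are hypothetical; the index stores only titles and deep links, so each ADR's text lives in exactly one place.

```python
import re
from dataclasses import dataclass

# Assumption: system-level ADR files start with "SYS-" plus three digits.
SYS_ADR = re.compile(r"^SYS-\d{3}")


@dataclass(frozen=True)
class IndexEntry:
    tier: str   # "system" or "repo-local"
    repo: str   # repo that owns the file
    title: str  # file name without the .md suffix
    link: str   # deep link to the source; the ADR text is never copied


def build_index(adr_paths: dict[str, list[str]],
                base_urls: dict[str, str]) -> list[IndexEntry]:
    """Build a browsable index of every repo's ADRs from paths alone."""
    entries = []
    for repo, paths in adr_paths.items():
        for path in paths:
            filename = path.rsplit("/", 1)[-1]
            tier = "system" if SYS_ADR.match(filename) else "repo-local"
            entries.append(IndexEntry(
                tier=tier,
                repo=repo,
                title=filename.removesuffix(".md"),
                link=base_urls[repo] + path,
            ))
    # System-level decisions first, then repo-local; sorted for stable output.
    return sorted(entries, key=lambda e: (e.tier != "system", e.repo, e.title))
```

Because the portal is regenerated from the repos, a stale copy cannot drift from the source; the worst case is a broken link, which a build step can check.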
How it flows
- A note is created in notes-api, which publishes a NoteCreated event to the Kafka note-events topic.
- The classifier's consumer reads the event.
- It classifies the text in-process (one Sonnet call, structured output forced via tool-use) into a category and an operational domain.
- It writes the labels back as namespaced tags (PUT /notes/{id}/tags, replace semantics), preserving the user's own tags.
- The kb-agent reads notes (GET /notes) to ground its RAG answers, and can also call the classifier synchronously.
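The consumer side of this flow can be sketched as a single handler. Everything injected is a hypothetical stand-in: classify for the Sonnet call, get_tags and put_tags for the notes-api reads and the tags PUT, dead_letter for wherever poison messages are parked, and noteId/body for the event's field names. It also assumes the surrounding consumer loop commits a message's offset only after the handler returns, so an exception leads to redelivery.

```python
import json
from typing import Callable

# Namespaces the classifier owns; other tags belong to the user.
CLASSIFIER_NAMESPACES = ("category:", "domain:")


def handle_note_created(
    raw: bytes,
    classify: Callable[[str], tuple[str, str]],  # hypothetical: text -> (category, domain)
    get_tags: Callable[[str], list[str]],        # hypothetical: read a note's current tags
    put_tags: Callable[[str, list[str]], None],  # hypothetical: wraps PUT /notes/{id}/tags
    dead_letter: Callable[[bytes, str], None],   # hypothetical: parks unprocessable messages
) -> None:
    """Process one NoteCreated message; safe to call again for the same message."""
    try:
        event = json.loads(raw)
        note_id, text = event["noteId"], event["body"]
    except (ValueError, KeyError, TypeError) as exc:
        # Poison message: redelivery can never fix malformed input, so park it
        # instead of letting it block the partition forever.
        dead_letter(raw, repr(exc))
        return

    # A classifier outage raises here on purpose: with offsets committed only
    # after a clean return, Kafka redelivers and the note is retried later.
    category, domain = classify(text)

    # Replace semantics: drop old classifier tags, keep the user's own tags.
    user_tags = {t for t in get_tags(note_id) if not t.startswith(CLASSIFIER_NAMESPACES)}
    put_tags(note_id, sorted(user_tags | {f"category:{category}", f"domain:{domain}"}))
```

Keeping the two failure modes on separate paths is the design choice: malformed input goes to the dead-letter sink once, while transient failures are left to Kafka's redelivery, and the replace-style writeback makes that redelivery harmless.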
The outcome
- Create a note and it is automatically classified and tagged, asynchronously, with no coupling between note creation and classifier availability.
- The four repos stay independently deployable, and cross-repo drift is caught by a red build rather than a production incident.
- The whole system is browsable in one portal, with links straight to the real code.
- It is articulated, not just built: a skill map (SYS-007) names the AI-era capabilities the system exercises (evals, context engineering, agents, observability) and the honest gap, which is security on the agent's tool seam.
What this shows is not that I designed a perfect system up front. Nobody does. It is that as the system grew, I reached for the seam-level discipline (contracts, idempotency, recorded decisions, evidence over intuition) that keeps a multi-repo system from turning into a tangle. The proof is in the ADRs and the contract tests, not just in this paragraph.