Short notes from shipping AI in production.
Things we learn on real engagements, distilled into something you can read in 3 minutes.
Score retrieval before you score anything else
If you only grade final answers, you can't tell whether a failure is retrieval or generation. Grade retrieval in isolation first.
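A minimal sketch of what grading retrieval in isolation can look like, assuming you have a small gold set of (query, relevant doc IDs) pairs and a `retrieve()` function of your own; all names here are illustrative:

```python
# Grade retrieval alone -- no generation in the loop.

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant docs that appear in the top-k retrieved results."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def grade_retrieval(gold_set, retrieve, k=5):
    """Mean recall@k over a gold set of (query, relevant_doc_ids) pairs."""
    scores = [recall_at_k(retrieve(query), relevant, k) for query, relevant in gold_set]
    return sum(scores) / len(scores)

# Usage with a stubbed retriever:
gold = [("refund policy", ["doc-3"]), ("pricing tiers", ["doc-7", "doc-9"])]
stub = {"refund policy": ["doc-3", "doc-1"], "pricing tiers": ["doc-9", "doc-2"]}
print(grade_retrieval(gold, lambda q: stub[q], k=5))  # 0.75
```

If this number is low, no amount of prompt work on the generation side will save the final answer.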
Design your MCP server surface before you write code
MCP has three surfaces: resources, tools, and prompts. Pick which you need and for whom before you touch an SDK.
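One way to force that decision before any SDK code: write the surface down as data. This sketch is not any MCP SDK's API — the class and field names are illustrative — but it captures who each surface serves:

```python
# Declare the server surface up front. Resources serve the client application
# (context it can attach), tools serve the model (actions it may invoke), and
# prompts serve the user (canned workflows they can trigger).
from dataclasses import dataclass, field

@dataclass
class ServerSurface:
    resources: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    prompts: list = field(default_factory=list)

surface = ServerSurface(
    resources=["ticket://{id}"],                # context for the client app
    tools=["search_tickets", "close_ticket"],   # actions for the model
    prompts=["triage-ticket"],                  # workflows for the user
)
```

If a surface list is empty, that's a signal you may not need it at all — many useful servers ship tools only.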
The three cost controls every LLM system should ship with
Token-level accounting, per-tenant budgets, and caching. Without these, cost is something accounting discovers next month.
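A sketch of the three controls wired together, assuming a flat blended price per thousand tokens and an exact-match cache; the class, prices, and helper signatures are all assumptions for illustration:

```python
from collections import defaultdict

PRICE_PER_1K_TOKENS = 0.01  # assumed blended price

class CostGuard:
    def __init__(self, tenant_budgets):
        self.budgets = dict(tenant_budgets)  # per-tenant budget in dollars
        self.spend = defaultdict(float)      # token-level accounting rolls up here
        self.cache = {}                      # exact-match response cache

    def complete(self, tenant, prompt, call_model, count_tokens):
        if prompt in self.cache:             # caching: repeat answers are free
            return self.cache[prompt]
        cost = count_tokens(prompt) / 1000 * PRICE_PER_1K_TOKENS
        if self.spend[tenant] + cost > self.budgets[tenant]:
            raise RuntimeError(f"budget exceeded for {tenant}")  # hard stop now,
        self.spend[tenant] += cost           # not a surprise invoice next month
        self.cache[prompt] = call_model(prompt)
        return self.cache[prompt]
```

A real version would also meter output tokens and use a smarter cache key, but the shape — meter, check, then call — is the point.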
Reliable agents are mostly code
Agents that work in production are 80% deterministic code. The LLM is a small, scoped component — not the whole workflow.
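A sketch of that shape, assuming a support-email router as the example: the LLM does exactly one scoped job (classification), its output is validated, and everything else is plain testable code. The function names and labels are illustrative:

```python
def classify(email_text, call_llm):
    """The only LLM call: map free text to one of three known labels."""
    label = call_llm(f"Classify as bug/billing/other: {email_text}").strip().lower()
    return label if label in {"bug", "billing", "other"} else "other"  # validate output

def handle_email(email_text, call_llm):
    """Deterministic routing -- runs identically with a stub or a real model."""
    label = classify(email_text, call_llm)
    if label == "bug":
        return "route:engineering"
    if label == "billing":
        return "route:finance"
    return "route:support"

print(handle_email("I was charged twice", lambda p: "billing"))  # route:finance
```

Because the model's output is constrained and validated, a garbage completion degrades to the safe default instead of derailing the workflow.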
Fine-tuning is worth it later than you think
Exhaust prompting, retrieval, and model selection first. Fine-tune when the eval still fails — and when a smaller tuned model is materially cheaper than a prompted frontier model.
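The "materially cheaper" test is back-of-envelope arithmetic. A sketch, where every price and volume below is an assumption you should replace with your own numbers:

```python
def breakeven_requests(tuning_cost, frontier_price, tuned_price):
    """Requests needed before per-request savings repay the one-off tuning cost."""
    savings = frontier_price - tuned_price  # dollars saved per request
    return tuning_cost / savings

# Assumed numbers: $500 tuning run, $0.02/request on a prompted frontier
# model, $0.002/request on the tuned small model.
n = breakeven_requests(500, 0.02, 0.002)
print(round(n))  # 27778 requests to break even
```

If your traffic won't cross that line before the next retrain, the fine-tune doesn't pay for itself.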
