Short notes from shipping AI in production.
Things we learn on real engagements, distilled into something you can read in 3 minutes.
Score retrieval before you score anything else
If you only grade final answers, you can't tell whether a failure is retrieval or generation. Grade retrieval in isolation first.
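A minimal sketch of what grading retrieval in isolation can look like, assuming you have a small gold set of (query, relevant doc IDs) pairs and a `retrieve()` function of your own; all names here are illustrative:

```python
# Grade retrieval alone -- no generation in the loop.

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant docs that appear in the top-k retrieved results."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def grade_retrieval(gold_set, retrieve, k=5):
    """Mean recall@k over a gold set of (query, relevant_doc_ids) pairs."""
    scores = [recall_at_k(retrieve(query), relevant, k) for query, relevant in gold_set]
    return sum(scores) / len(scores)

# Usage with a stubbed retriever:
gold = [("refund policy", ["doc-3"]), ("pricing tiers", ["doc-7", "doc-9"])]
stub = {"refund policy": ["doc-3", "doc-1"], "pricing tiers": ["doc-9", "doc-2"]}
print(grade_retrieval(gold, lambda q: stub[q], k=5))  # 0.75
```

If this number is low, no amount of prompt work on the generation side will save the final answer.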
Design your MCP server surface before you write code
MCP has three surfaces: resources, tools, and prompts. Pick which you need and for whom before you touch an SDK.
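One way to force that decision before any SDK code: write the surface down as data. This sketch is not any MCP SDK's API — the class and field names are illustrative — but it captures who each surface serves:

```python
# Declare the server surface up front. Resources serve the client application
# (context it can attach), tools serve the model (actions it may invoke), and
# prompts serve the user (canned workflows they can trigger).
from dataclasses import dataclass, field

@dataclass
class ServerSurface:
    resources: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    prompts: list = field(default_factory=list)

surface = ServerSurface(
    resources=["ticket://{id}"],                # context for the client app
    tools=["search_tickets", "close_ticket"],   # actions for the model
    prompts=["triage-ticket"],                  # workflows for the user
)
```

If a surface list is empty, that's a signal you may not need it at all — many useful servers ship tools only.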
The three cost controls every LLM system should ship with
Token-level accounting, per-tenant budgets, and caching. Without these, cost is something accounting discovers next month.
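A sketch of the three controls wired together, assuming a flat blended price per thousand tokens and an exact-match cache; the class, prices, and helper signatures are all assumptions for illustration:

```python
from collections import defaultdict

PRICE_PER_1K_TOKENS = 0.01  # assumed blended price

class CostGuard:
    def __init__(self, tenant_budgets):
        self.budgets = dict(tenant_budgets)  # per-tenant budget in dollars
        self.spend = defaultdict(float)      # token-level accounting rolls up here
        self.cache = {}                      # exact-match response cache

    def complete(self, tenant, prompt, call_model, count_tokens):
        if prompt in self.cache:             # caching: repeat answers are free
            return self.cache[prompt]
        cost = count_tokens(prompt) / 1000 * PRICE_PER_1K_TOKENS
        if self.spend[tenant] + cost > self.budgets[tenant]:
            raise RuntimeError(f"budget exceeded for {tenant}")  # hard stop now,
        self.spend[tenant] += cost           # not a surprise invoice next month
        self.cache[prompt] = call_model(prompt)
        return self.cache[prompt]
```

A real version would also meter output tokens and use a smarter cache key, but the shape — meter, check, then call — is the point.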
Reliable agents are mostly code
Agents that work in production are 80% deterministic code. The LLM is a small, scoped component — not the whole workflow.
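A sketch of that shape, assuming a support-email router as the example: the LLM does exactly one scoped job (classification), its output is validated, and everything else is plain testable code. The function names and labels are illustrative:

```python
def classify(email_text, call_llm):
    """The only LLM call: map free text to one of three known labels."""
    label = call_llm(f"Classify as bug/billing/other: {email_text}").strip().lower()
    return label if label in {"bug", "billing", "other"} else "other"  # validate output

def handle_email(email_text, call_llm):
    """Deterministic routing -- runs identically with a stub or a real model."""
    label = classify(email_text, call_llm)
    if label == "bug":
        return "route:engineering"
    if label == "billing":
        return "route:finance"
    return "route:support"

print(handle_email("I was charged twice", lambda p: "billing"))  # route:finance
```

Because the model's output is constrained and validated, a garbage completion degrades to the safe default instead of derailing the workflow.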
Fine-tuning is worth it later than you think
Exhaust prompting, retrieval, and model selection first. Fine-tune when the eval still fails — and when a smaller tuned model is materially cheaper than a prompted frontier model.
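The "materially cheaper" test is back-of-envelope arithmetic. A sketch, where every price and volume below is an assumption you should replace with your own numbers:

```python
def breakeven_requests(tuning_cost, frontier_price, tuned_price):
    """Requests needed before per-request savings repay the one-off tuning cost."""
    savings = frontier_price - tuned_price  # dollars saved per request
    return tuning_cost / savings

# Assumed numbers: $500 tuning run, $0.02/request on a prompted frontier
# model, $0.002/request on the tuned small model.
n = breakeven_requests(500, 0.02, 0.002)
print(round(n))  # 27778 requests to break even
```

If your traffic won't cross that line before the next retrain, the fine-tune doesn't pay for itself.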
