The three cost controls every LLM system should ship with
Token-level accounting, per-tenant budgets, and caching. Without these, cost is something accounting discovers next month.
The first control is token and dollar accounting at the request level. Every request logs input tokens, output tokens, model, dollar cost, and tenant ID. It sounds obvious; teams still ship without it. Without this data, every later control is guessing.
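A minimal sketch of request-level accounting, assuming a flat per-token price table; the model names and prices here are made up, and a real system would stream these records to a database or metrics pipeline rather than an in-memory list.

```python
from dataclasses import dataclass

# Hypothetical price table: (input, output) dollars per 1M tokens.
PRICES = {"small-model": (0.15, 0.60), "large-model": (3.00, 15.00)}

@dataclass
class RequestRecord:
    tenant_id: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def dollar_cost(self) -> float:
        price_in, price_out = PRICES[self.model]
        return (self.input_tokens * price_in
                + self.output_tokens * price_out) / 1_000_000

LEDGER: list[RequestRecord] = []

def log_request(rec: RequestRecord) -> None:
    # Stand-in for a write to a durable store.
    LEDGER.append(rec)

log_request(RequestRecord("acme", "small-model", 1200, 300))
print(f"${LEDGER[-1].dollar_cost:.6f}")
```

With every record carrying tenant, model, and token counts, the later controls reduce to simple queries over this ledger.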
The second is per-tenant budgets. A hard monthly cap per customer, with alerts at 50%, 80%, and 100%. Past the cap, the system returns a clear error instead of silently burning money. A single misbehaving prompt loop can otherwise cost more in a weekend than the customer pays in a year.
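The cap-and-alert logic can be sketched as below; the names (`BudgetExceeded`, `check_budget`) are illustrative, and in practice `spent` would come from the accounting ledger and alerts would go to a pager or email rather than stdout.

```python
class BudgetExceeded(Exception):
    """Raised past the monthly cap instead of silently spending."""

ALERT_THRESHOLDS = (0.5, 0.8, 1.0)

def check_budget(spent: float, cap: float, alerted: set[float]) -> None:
    # Fire each threshold alert once as spend crosses it.
    for t in ALERT_THRESHOLDS:
        if spent >= cap * t and t not in alerted:
            alerted.add(t)
            print(f"alert: tenant at {int(t * 100)}% of budget")
    if spent >= cap:
        raise BudgetExceeded(f"monthly cap ${cap:.2f} reached")

alerted: set[float] = set()
check_budget(spent=420.0, cap=500.0, alerted=alerted)  # fires 50% and 80%
```

Calling this before each request turns a runaway prompt loop into a handful of capped errors instead of a weekend-long bill.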
The third is caching. Embeddings are deterministic, so caching them is free money — 40–80% savings on embedding spend is typical. Retrieval results cache well when queries repeat. Full generations cache when the traffic is FAQ-shaped. Each layer is optional; the combined effect is dramatic.
Add audit on top: periodic review of the longest prompts and oversized retrievals. Shortening verbose prompts and tightening retrieval top-k is the cheapest cost win available, and it usually improves quality too.
