Leasey
Tool

RAG cost estimator.

Back-of-envelope monthly cost and latency for a production RAG system. Adjust model, traffic, tokens, cache hit rate, and retrieval top-k.

Requests per day: 10,000
Avg input tokens (query + retrieved context): 3,500 tok
Avg output tokens (answer): 450 tok
Cache hit rate: 30%
Retrieval top-k: 40 chunks

Monthly estimate: $3.7k
300,000 requests · $0.01 per request
Input tokens: $2.2k
Output tokens: $1.4k
Embedding (re-embed on miss): $34
Est. p95 latency: 1,200 ms

Rough estimate using public list prices at time of writing. Real costs depend on routing, caching layers, rerankers, batching, and negotiated rates.
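
The arithmetic behind an estimate like this can be sketched in a few lines of Python. The page does not state which model or prices it uses, so everything marked as an assumption below is hypothetical: $3 and $15 per million input and output tokens, $0.02 per million embedding tokens, 200 tokens per retrieved chunk, a 30-day month, and a cache hit that skips the model call entirely. These values were picked only because they land close to the displayed figures, and the names `RagCostInputs` and `estimate_monthly_cost` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RagCostInputs:
    # Values shown in the estimator above.
    requests_per_day: int = 10_000
    input_tokens: int = 3_500        # query + retrieved context
    output_tokens: int = 450         # answer
    cache_hit_rate: float = 0.30
    top_k: int = 40                  # retrieved chunks per request

    # ASSUMPTIONS: not stated on the page, chosen for illustration only.
    days_per_month: int = 30
    input_price_per_mtok: float = 3.00    # USD per 1M input tokens
    output_price_per_mtok: float = 15.00  # USD per 1M output tokens
    embed_price_per_mtok: float = 0.02    # USD per 1M embedding tokens
    tokens_per_chunk: int = 200


def estimate_monthly_cost(p: RagCostInputs) -> dict:
    """Return a rough monthly cost breakdown in USD."""
    requests = p.requests_per_day * p.days_per_month
    # Assumption: a cache hit costs nothing; only misses reach the model.
    misses = requests * (1 - p.cache_hit_rate)

    input_cost = misses * p.input_tokens / 1e6 * p.input_price_per_mtok
    output_cost = misses * p.output_tokens / 1e6 * p.output_price_per_mtok
    # Assumption: every miss re-embeds all top_k retrieved chunks.
    embed_cost = (
        misses * p.top_k * p.tokens_per_chunk / 1e6 * p.embed_price_per_mtok
    )

    total = input_cost + output_cost + embed_cost
    return {
        "requests": requests,
        "input": input_cost,
        "output": output_cost,
        "embedding": embed_cost,
        "total": total,
        "per_request": total / requests,
    }


estimate = estimate_monthly_cost(RagCostInputs())
```

Under these assumed defaults the breakdown comes to roughly $2.2k for input tokens, $1.4k for output tokens, and $34 for embedding, or about $3.7k per month and $0.01 per request. Latency is not modeled here.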

A few things this doesn't capture.

  • Rerankers. Cross-encoder rerankers add their own token cost (small) and latency (modest). This estimate omits them; typical impact is 5-15% of total cost.
  • Model routing. Production systems often send each query to either a cheap model or a frontier model. This estimate assumes a single model; real cost is usually 30-60% lower with routing.
  • Enterprise discounts. Bedrock, Azure OpenAI, and Vertex negotiate rates below public list prices for committed spend. Treat the list-price numbers as ceiling estimates.
  • Observability. Tracing, eval runs, and log storage add meaningful cost at scale, typically 5-10% of inference spend.
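
To see how these caveats might move the headline number, the percentage ranges above can be applied to a base estimate. This is a rough sketch, not the estimator's own method: it assumes the adjustments compound multiplicatively, treats the reranker and observability figures as overhead on top of the routed inference cost, and leaves out enterprise discounts because no figure is given for them. The function name `adjusted_range` is hypothetical.

```python
def adjusted_range(base_monthly_usd: float) -> tuple[float, float]:
    """Return (low, high) monthly cost after applying the caveat ranges.

    ASSUMPTION: adjustments compound multiplicatively; the source only
    gives the ranges, not how they combine.
    """
    routing_savings = (0.30, 0.60)         # cost reduction with routing
    reranker_overhead = (0.05, 0.15)       # added cost from rerankers
    observability_overhead = (0.05, 0.10)  # added cost from tracing, evals, logs

    # Low end: largest routing savings, smallest overheads.
    low = (
        base_monthly_usd
        * (1 - routing_savings[1])
        * (1 + reranker_overhead[0])
        * (1 + observability_overhead[0])
    )
    # High end: smallest routing savings, largest overheads.
    high = (
        base_monthly_usd
        * (1 - routing_savings[0])
        * (1 + reranker_overhead[1])
        * (1 + observability_overhead[1])
    )
    return low, high


low, high = adjusted_range(3_700.0)
```

For the $3.7k estimate this gives a range of roughly $1.6k to $3.3k per month, which suggests routing savings tend to outweigh the reranker and observability overheads under these assumptions.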

Want a tighter estimate specific to your system? We'll run one in a 30-minute discovery call.
