Fine-tuning is worth it later than you think
Exhaust prompting, retrieval, and model selection first. Fine-tune when the eval still fails — and when a smaller tuned model is materially cheaper than a prompted frontier model.
Fine-tuning is the most frequently requested and least often justified AI investment. In most projects, prompting (with careful few-shot examples), retrieval (hybrid + reranking), and model selection (cheap model for easy queries, frontier for hard ones) reach the quality goal — at a fraction of the cost and time of a training run.
The decision becomes real when either of two things happens: the eval plateau sits above your quality bar even with the best prompt and retrieval, or a smaller tuned model is materially cheaper than a prompted frontier model at equivalent quality. Those are the two honest reasons to fine-tune.
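The two-condition rule above can be written down directly. A minimal sketch — the function name, parameters, and the 50%-cheaper threshold are illustrative assumptions, not a prescription:

```python
def should_fine_tune(best_eval_score: float,
                     quality_bar: float,
                     tuned_cost_per_1k: float,
                     frontier_cost_per_1k: float,
                     equivalent_quality: bool,
                     cost_advantage: float = 0.5) -> bool:
    """Return True only for the two honest reasons to fine-tune.

    All names and the default cost_advantage threshold are
    hypothetical; tune them to your own economics.
    """
    # Reason 1: the eval plateau with your best prompt + retrieval
    # still falls below the quality bar.
    quality_gap = best_eval_score < quality_bar
    # Reason 2: a smaller tuned model is materially cheaper
    # (here: at least 50% cheaper) at equivalent quality.
    materially_cheaper = (equivalent_quality and
                          tuned_cost_per_1k <= cost_advantage * frontier_cost_per_1k)
    return quality_gap or materially_cheaper
```

Anything that returns False here is a signal to keep iterating on prompts, retrieval, and model routing instead.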
When you do tune, most of the win comes from data quality, not training tricks. Curate, label, and dedupe a dataset you'd be happy to defend. Use LoRA first — cheap, fast, reversible. Measure against the same eval suite as the baseline, including regression checks on base skills the model used to have.
Deploy as a private endpoint with the same infra as the rest of your stack: quota, cost attribution, rollback, version tags. A tuned model is an operational commitment, not a one-time experiment.
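To make the rollback-and-version-tags point concrete, here is a toy registry — purely illustrative, standing in for whatever your serving platform provides:

```python
from dataclasses import dataclass, field

@dataclass
class ModelRegistry:
    """Toy sketch of the operational wrapper around a tuned model:
    ordered version tags plus one-step rollback. Hypothetical API."""
    versions: list[str] = field(default_factory=list)

    def deploy(self, tag: str) -> str:
        self.versions.append(tag)
        return tag

    def current(self) -> str:
        return self.versions[-1]

    def rollback(self) -> str:
        # Roll back by retiring the newest tag; refuse if there is
        # no earlier version to serve.
        if len(self.versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.versions.pop()
        return self.versions[-1]
```

The point is the contract, not the code: every tuned model gets a tag, and rolling back is a one-step operation, same as the rest of your stack.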
