Architecture

Reliable agents are mostly code

Agents that work in production are roughly 80% deterministic code. The LLM is a small, scoped component, not the whole workflow.


The failure mode of most agent projects is handing too much of the workflow to the model. 'Let the agent figure it out' sounds good in a demo and falls apart in production, where the same edge cases recur and reliability matters more than cleverness.

Decompose the task as a graph first. Most nodes are deterministic code — API calls, validators, state updates. A few nodes are genuinely LLM-shaped: classification, routing, summarization, authoring. Keep the LLM scope narrow, and give each LLM node a typed output so downstream code can trust it.
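
As a rough illustration of that decomposition, here is a minimal Python sketch of a three-node workflow in which only the middle node touches a model. Everything named here is hypothetical and not from any particular framework: the support-ticket scenario, `Route`, `Classification`, `fake_llm`, and the 0.7 confidence threshold are all assumptions chosen for the example. The point is that the model's raw text is parsed into a typed value before any downstream code sees it.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Route(str, Enum):
    REFUND = "refund"
    SHIPPING = "shipping"
    OTHER = "other"


@dataclass(frozen=True)
class Classification:
    """Typed output of the single LLM node (hypothetical schema)."""
    route: Route
    confidence: float


def parse_classification(raw: str) -> Classification:
    """Validate raw model text; raise ValueError on anything off-schema."""
    try:
        data = json.loads(raw)
        route = Route(data["route"])
        confidence = float(data["confidence"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"unusable model output: {raw!r}") from exc
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Classification(route, confidence)


def fake_llm(ticket_text: str) -> str:
    """Stand-in for a real model call; returns JSON text as a model might."""
    route = "refund" if "refund" in ticket_text.lower() else "other"
    return json.dumps({"route": route, "confidence": 0.9})


def run_ticket(ticket_text: str, llm: Callable[[str], str] = fake_llm) -> str:
    # Node 1 (deterministic): normalize and validate the input.
    text = ticket_text.strip()
    if not text:
        return "rejected: empty ticket"
    # Node 2 (LLM): classification only, parsed into a typed value.
    try:
        result = parse_classification(llm(text))
    except ValueError:
        return "escalated: unparseable classification"
    # Node 3 (deterministic): routing is plain code over the typed value.
    if result.confidence < 0.7 or result.route is Route.OTHER:
        return "escalated: needs human triage"
    return f"queued: {result.route.value}"
```

Swapping `fake_llm` for a real client changes one argument; the validation and routing code stay the same, which is what keeps the LLM's scope narrow.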

Add guardrails where risk is high: dry-run modes for destructive actions, human-in-the-loop approval gates on writes, and scoped credentials so a bug can't exceed the user's actual permissions. Make tool calls idempotent so retries are safe.
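
One way to combine three of these guardrails is a thin wrapper around each side-effecting tool. The sketch below is an assumption-laden illustration, not a library API: `GuardedTool`, its `approve` callback, and the caller-supplied `idempotency_key` are hypothetical names, and the in-memory result cache stands in for what would need to be durable storage in a real system. Scoped credentials are not shown, since they live in the identity layer rather than in application code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class GuardedTool:
    """Wraps a side-effecting action with dry-run, approval, and idempotency."""
    action: Callable[[str], str]      # the real write or delete
    approve: Callable[[str], bool]    # human-in-the-loop gate (hypothetical)
    dry_run: bool = True              # safe by default
    _results: Dict[str, str] = field(default_factory=dict, init=False)

    def call(self, idempotency_key: str, payload: str) -> str:
        # A retry with the same key returns the stored result
        # instead of repeating the side effect.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        # Dry-run reports what would happen and changes nothing.
        if self.dry_run:
            return f"dry-run: would execute {payload!r}"
        # Writes only proceed once the approval gate says yes.
        if not self.approve(payload):
            return "denied: approval not granted"
        result = self.action(payload)
        self._results[idempotency_key] = result
        return result


deleted = []


def delete_record(record_id: str) -> str:
    """Example destructive action; records what it deleted."""
    deleted.append(record_id)
    return f"deleted {record_id}"


tool = GuardedTool(action=delete_record,
                   approve=lambda payload: payload != "prod-db",
                   dry_run=False)
tool.call("run-1/step-3", "tmp-42")
tool.call("run-1/step-3", "tmp-42")  # retry: the delete does not run twice
```

Only successful results are cached here, so a denied or dry-run call can be attempted again later under the same key; whether that is the right policy depends on the tool.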

Trace every run end-to-end: input, steps, tool calls, retries, final state, and cost. When an agent misbehaves, you need to replay the run and diff it against working runs, not guess from logs. Turn every failure into a test case so it can't recur unnoticed.
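
A structured trace can be quite small. The following sketch assumes a hypothetical `Trace` and `Step` record and a `first_divergence` helper that compares a failing run against a known-good one; the field names and the choice to compare only name, inputs, and output are illustrative assumptions, not a standard format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Step:
    name: str
    inputs: Dict[str, Any]
    output: Any
    attempt: int = 1        # greater than 1 means this step was retried
    cost_usd: float = 0.0   # zero for deterministic steps


@dataclass
class Trace:
    run_id: str
    steps: List[Step] = field(default_factory=list)

    def record(self, name: str, inputs: Dict[str, Any], output: Any,
               attempt: int = 1, cost_usd: float = 0.0) -> None:
        self.steps.append(Step(name, inputs, output, attempt, cost_usd))

    def total_cost(self) -> float:
        return sum(step.cost_usd for step in self.steps)

    def to_json(self) -> str:
        # Serialized traces can be stored and later replayed as test fixtures.
        return json.dumps(asdict(self), sort_keys=True)


def first_divergence(good: Trace, bad: Trace) -> Optional[int]:
    """Index of the first step where two runs differ, or None if they match."""
    for i, (g, b) in enumerate(zip(good.steps, bad.steps)):
        if (g.name, g.inputs, g.output) != (b.name, b.inputs, b.output):
            return i
    if len(good.steps) != len(bad.steps):
        # One run stopped early or took extra steps.
        return min(len(good.steps), len(bad.steps))
    return None
```

Given a working trace and a failing one for similar input, `first_divergence` points at the step to inspect first, and the failing trace's JSON can be checked in as the fixture for a new regression test.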
