Agent drift is when an AI agent gradually stops behaving the way it was designed or evaluated to behave. It may still “work,” but its decisions, tool use, tone, accuracy, or goals shift away from the intended behavior.
In LLM agents this often shows up as goal drift: the agent starts optimizing for a nearby but wrong objective, especially across long, multi-step tasks. Recent research describes goal drift as a practical deployment risk for agents used in software engineering, ML tasks, and autonomous web browsing. (arXiv)
Why it happens
Agent drift usually comes from several interacting causes:
- Model changes: the underlying LLM is updated, changing outputs even if your prompt stays the same.
- Prompt/context drift: long context, accumulated conversation history, or small prompt edits change how the agent interprets its job.
- Tool/API changes: tools return different schemas, errors, latency, or partial data.
- User distribution shift: real users ask different things than your test set covered.
- Memory/RAG drift: retrieved documents, embeddings, or agent memory become stale, noisy, or contradictory.
- Multi-agent influence: one agent’s bad intermediate output can pull other agents off-course.
- Weak evals: the system was only tested on happy paths, not adversarial, ambiguous, or long-running tasks.
How to solve it
You usually do not “fix” agent drift once; you manage it continuously.
A practical approach:
- Define the invariant behavior
Write down what must never drift: goal, allowed tools, refusal rules, output schema, quality bar, escalation conditions. - Add evals before deployment
Use regression tests for common tasks, edge cases, long-context cases, tool failures, and adversarial instructions. - Track production traces
Log prompts, tool calls, retrieved docs, model version, outputs, latency, errors, and human corrections. - Measure drift
Monitor task success rate, tool-call patterns, schema violations, hallucination rate, user corrections, escalation rate, and output similarity against golden examples. - Constrain the agent
Use typed tool schemas, validators, planning checkpoints, max-step limits, explicit stop conditions, and “ask human” fallbacks. - Version everything
Version prompts, tools, RAG indexes, models, policies, and eval datasets. Drift is hard to debug when you cannot tell what changed. - Use guardrails and self-checks
Add final verification steps: “Does this answer satisfy the original user goal?” “Were all tool results used correctly?” “Is the output schema valid?” - Retrain or re-prompt from observed failures
Turn real drift cases into evals, then update prompts, policies, retrieval, or fine-tuning data.
A simple mental model: agent drift = behavior changes without an intentional product decision. The cure is observability, evals, constraints, and version control.

Leave a Reply