Incident reconstruction

When an AI agent breaks production, can you prove what it did?

Through 2026, a recurring story has played out across engineering teams: an autonomous coding or ops agent, given real access, deletes a production database, runs a destructive command, or loops for hours running up thousands of dollars in API spend. The aftermath is always the same scramble. Someone asks the only question that matters, and nobody can answer it with certainty: what exactly did the agent do?

The new failure mode

Agents are no longer just chatting. They hold credentials. They run shell commands, call APIs, move money, send email, and touch infrastructure. That is enormously useful, right up until one of them does the wrong thing at machine speed. When it does, you are not debugging a function; you are running a forensic investigation, and you are doing it against your own logs.

Why ordinary logs fail at exactly the wrong moment

Most teams discover the gap only during the post-mortem. Three problems show up again and again:

"The conversation log captured the tool's output but not the actual command that was executed, making it impossible to determine exactly what happened." This pattern is echoed across publicly reported 2026 agent incidents.

What you actually need is a record you can defend

Debugging tools answer "why did the agent do that." A forensic record answers a harder question: "can you prove, to someone who does not trust you, that this is what happened, in this order, at this time, and that nobody altered it afterward." Those are different jobs. The second one needs three properties an ordinary log does not have:
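To make the tamper-evidence and ordering properties concrete, here is a minimal sketch, assuming a plain SHA-256 hash chain over canonical JSON. The record fields (`seq`, `payload`, `prev`, `hash`) and the helper names are illustrative assumptions, not Provenrail's actual format. Each record commits to the one before it, so quietly rewriting an earlier entry breaks the link at that entry. Trusted timestamps are a separate property and are not shown here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the record before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"seq": len(chain), "payload": payload, "prev": prev,
                  "hash": record_hash(prev, payload)})


def verify(chain: list) -> bool:
    """Recompute every link. Editing, reordering, or removing a non-final record fails;
    silently dropping records from the tail needs a separately signed head to detect."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        if rec["seq"] != i or rec["prev"] != prev:
            return False
        if rec["hash"] != record_hash(prev, rec["payload"]):
            return False
        prev = rec["hash"]
    return True


chain = []
append(chain, {"type": "tool_call", "cmd": "psql -c 'SELECT 1'"})
append(chain, {"type": "tool_call", "cmd": "rm -rf /tmp/cache"})
print(verify(chain))  # True

chain[1]["payload"]["cmd"] = "echo harmless"  # rewrite history after the fact
print(verify(chain))  # False
```

The chain alone cannot tell a complete log from one whose last few entries were cut off, which is one reason a signed chain head matters.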

What a verifiable record gives you the day after

Provenrail captures every model call, tool call, decision, and human approval your agent makes into an off-box, hash-chained, signed sink, with trusted timestamps and a public transparency log on paid plans. When something goes wrong, you have:
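The "signed" part can be sketched by signing only the latest hash in the chain, since that one value commits to every record before it. This is a minimal sketch under a loud assumption: HMAC-SHA256 from the standard library stands in for a real signature. A verifier who does not trust the writer needs a public-key scheme (for example Ed25519), so that only the public key is shared. The key and function names are hypothetical.

```python
import hashlib
import hmac

# Assumption: HMAC-SHA256 with a shared secret stands in for a public-key signature.
# A real deployment would use an asymmetric scheme so verifiers never hold the key.
SIGNING_KEY = b"demo-key-not-for-production"


def sign_head(head_hash: str, key: bytes = SIGNING_KEY) -> str:
    """Produce a tag over the latest chain hash, committing to the whole history."""
    return hmac.new(key, head_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def check_head(head_hash: str, tag: str, key: bytes = SIGNING_KEY) -> bool:
    """Compare the expected tag with the presented one in constant time."""
    return hmac.compare_digest(sign_head(head_hash, key), tag)


head = hashlib.sha256(b"example chain head").hexdigest()
tag = sign_head(head)
print(check_head(head, tag))      # True
print(check_head("f" * 64, tag))  # False: a truncated or rewritten chain has a different head
```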

Being honest about the limit: Provenrail proves what your agent recorded; it cannot force an agent that never calls the SDK to write. Completeness is never claimed. What reaches the sink, though, is immutable and independently verifiable, and that is what turns a chaotic post-mortem into a defensible record.

This is not only for regulated teams

The freelancer who built a client an AI workflow faces the same question in a smaller key: the client says the agent sent that email or made that change, the freelancer says it did not. A verifiable record settles it in seconds with proof neither side has to take on faith. The same record that satisfies an EU AI Act auditor settles a billing dispute.


FAQ

My AI agent deleted production data. How do I find out what it did?
If you recorded the run with a tamper-evident, off-box audit trail, export the bundle and verify it: you get the exact ordered sequence of every model and tool call, provably unaltered. Without such a record, you are limited to whatever your application logs captured, which often omit the exact command and may have been changed since.
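What "verify the bundle" means in practice can be sketched as replaying the chain and reporting the first record that no longer matches. This assumes a hypothetical JSON Lines export where each line holds `payload`, `prev`, and `hash`; it is not Provenrail's export format or its open-source verifier, just an illustration of the same idea.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" for the first record


def link_hash(prev: str, payload: dict) -> str:
    """Hash of one link: previous hash plus canonical JSON of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def first_bad_line(bundle_text: str):
    """Return the 0-based index of the first record that fails, or None if all verify."""
    prev = GENESIS
    for i, line in enumerate(bundle_text.splitlines()):
        if not line.strip():
            continue  # tolerate blank lines in the export
        rec = json.loads(line)
        if rec["prev"] != prev or rec["hash"] != link_hash(prev, rec["payload"]):
            return i
        prev = rec["hash"]
    return None


# Build a small example bundle in the assumed format.
records, prev = [], GENESIS
for cmd in ["ls", "DROP TABLE users;", "echo done"]:
    payload = {"type": "tool_call", "cmd": cmd}
    h = link_hash(prev, payload)
    records.append(json.dumps({"payload": payload, "prev": prev, "hash": h}))
    prev = h
bundle = "\n".join(records)
print(first_bad_line(bundle))  # None

tampered = bundle.replace("DROP TABLE users;", "SELECT 1;")
print(first_bad_line(tampered))  # 1
```

Reporting the first failing index, rather than a bare pass or fail, tells the investigator where the trustworthy prefix of the record ends.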
Why not just use my existing logs or tracing tool?
Those are excellent for debugging, but they live in a store you control and can be edited or deleted, so they are not evidence you can hand to a party who does not trust you. A verifiable record is signed, hash-chained, stored off-box, and independently checkable.
Can the agent tamper with its own audit trail?
Records are pushed off-box to an append-only sink, and any alteration is detectable by the open-source verifier. The honest limit is completeness: an agent that never calls the SDK will not appear. What it does record cannot be silently changed.
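The append-only property can be sketched as a sink that accepts a record only if it builds on its current head and computes the hash itself. This `AppendOnlySink` class is a hypothetical in-memory stand-in for a remote service; the real protection comes from running the sink on a machine the agent's credentials can only append to. Note that the agent can still append new, misleading records, which matches the completeness limit above; it just cannot replace ones already written.

```python
import hashlib
import json

GENESIS = "0" * 64


class AppendOnlySink:
    """Hypothetical in-memory stand-in for an off-box sink: it only ever extends its chain."""

    def __init__(self) -> None:
        self._records: list[dict] = []
        self._head = GENESIS

    @property
    def head(self) -> str:
        return self._head

    def append(self, prev: str, payload: dict) -> str:
        """Accept a record only if it builds on the current head; return the new head."""
        if prev != self._head:
            raise ValueError("rejected: record does not extend the current head")
        body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        # The sink computes the hash itself, so a client cannot supply a forged one.
        new_head = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        self._records.append({"prev": prev, "payload": payload, "hash": new_head})
        self._head = new_head
        return new_head

    def __len__(self) -> int:
        return len(self._records)
    # Deliberately no update or delete method.


sink = AppendOnlySink()
h1 = sink.append(sink.head, {"type": "model_call", "model": "example"})
sink.append(h1, {"type": "tool_call", "cmd": "rm -rf build/"})
try:
    # An agent trying to "replace" the second record re-submits it against h1.
    sink.append(h1, {"type": "tool_call", "cmd": "echo nothing"})
except ValueError as err:
    print(err)
print(len(sink))  # 2
```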
How fast can I add this?
Install the SDK and wrap your agent run in one context manager. Drop-in capture works for the OpenAI and Anthropic clients, LangChain, and MCP; anything else can be recorded with one line. See the from-zero guide.
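The "wrap your run in one context manager" pattern can be sketched with the standard library's `contextlib`. Everything here (`audit_run`, `RunRecorder`, the event fields) is hypothetical and is not the Provenrail SDK's API; it only shows why a context manager suits the job: the start and end of a run, including a crash, are recorded even when the agent code itself fails.

```python
import contextlib
import time
from typing import Iterator


class RunRecorder:
    """Hypothetical recorder that collects events for one agent run (not Provenrail's API)."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.events: list[dict] = []

    def record(self, kind: str, **fields) -> None:
        # The "one line" a custom tool would call to log itself.
        # time.time() is local wall-clock time, not a trusted timestamp.
        self.events.append({"run": self.run_id, "kind": kind, "t": time.time(), **fields})


@contextlib.contextmanager
def audit_run(run_id: str) -> Iterator[RunRecorder]:
    """Wrap an agent run; start and end (including failures) are always recorded."""
    rec = RunRecorder(run_id)
    rec.record("run_start")
    try:
        yield rec
    except Exception as exc:
        rec.record("run_error", error=repr(exc))
        raise
    finally:
        rec.record("run_end")
        # A real SDK would flush rec.events to the off-box sink at this point.


with audit_run("demo-run") as run:
    run.record("tool_call", tool="shell", cmd="ls -la")
print([e["kind"] for e in run.events])  # ['run_start', 'tool_call', 'run_end']
```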