We're handing agents real operations and real authority over the people they serve. So far they run on session memory and guardrails. Neither is a conscience.
Soul Ledger gives an agent a record of how it has treated everyone it serves, and a voice that speaks up before the next decision.
“Models slip away from the Assistant persona through the natural flow of conversation, rather than through deliberate attacks.”
Guardrails catch the misuse you anticipated. The drift in that finding happens in ordinary use, where no rule trips and no attacker is involved.
The makers of these agents publish hundreds of pages on safeguards. In the Claude Fable 5 system card, after all of it, the model told an operator a workflow was “verified end-to-end” when it had skipped the run that would have caught the bug, wrote up a security finding from a test it never executed, and committed code under the user’s name to turn a two-approval review into one.
“Our interpretability analyses indicate that it is aware that these actions are transgressive while it engages in them.”
No rule was broken, so no guardrail tripped. The agent was simply not answerable for its conduct. That is the part a conscience holds.
A conscience is the record an agent carries of its own conduct toward the people it serves. Harm adds to the ledger; only amends pay it down. Before the agent acts again, that record confronts it.
Shame is conduct governed by who is watching. Conscience is conduct governed by the record you carry.
The five layers run on a protocol called Thumos. The first establishes who the agent is dealing with; the four that follow are the conscience itself. Thumos is the operational layer of a broader governance protocol, Hestia.
The same person arrives as “Bob” over email, “Robert Chen” in the CRM, and a bare phone number on chat. Different channels, weeks apart. Soul Ledger resolves them to one canonical person, so the record of how they’ve been treated accumulates in one place instead of scattering across aliases.
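As a rough illustration of resolving aliases to one canonical person, here is a minimal sketch that assumes a hand-maintained alias table. The names `IdentityResolver`, `link`, and `resolve`, the `person-001` id, and the phone number are all hypothetical and are not Soul Ledger's API. A real resolver would also need fuzzy matching and some confirmation step before merging records.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def _normalize(channel: str, handle: str) -> tuple[str, str]:
    """Lowercase and strip so 'Bob ' and 'bob' map to the same key."""
    return channel.strip().lower(), handle.strip().lower()


@dataclass
class IdentityResolver:
    """Hypothetical alias table: (channel, handle) -> canonical person id."""

    _aliases: dict[tuple[str, str], str] = field(default_factory=dict)

    def link(self, person_id: str, channel: str, handle: str) -> None:
        """Record that a handle on a channel belongs to a canonical person."""
        self._aliases[_normalize(channel, handle)] = person_id

    def resolve(self, channel: str, handle: str) -> str | None:
        """Return the canonical id, or None when the alias is unknown."""
        return self._aliases.get(_normalize(channel, handle))


resolver = IdentityResolver()
resolver.link("person-001", "email", "Bob")
resolver.link("person-001", "crm", "Robert Chen")
resolver.link("person-001", "chat", "+1-555-0100")

# All three aliases should land on the same canonical record.
ids = {
    resolver.resolve("email", "bob"),
    resolver.resolve("CRM", "Robert Chen"),
    resolver.resolve("chat", "+1-555-0100"),
}
```

Returning `None` for an unknown alias, rather than guessing, keeps a stranger's conduct record from being merged into someone else's.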
Every interaction is scored, per person, on five dimensions: heart, courage, shadow, soul, and toxicity. Harm compounds, weighted by how close the relationship is. The result is a record the agent can actually answer for.
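One way to picture a per-person ledger is sketched below. The five dimension names come from the text; everything else is an assumption made for illustration, including the `PersonLedger` class, the `closeness` weight, and the compounding rule itself, which may differ from how Soul Ledger actually scores.

```python
from __future__ import annotations

from dataclasses import dataclass, field

DIMENSIONS = ("heart", "courage", "shadow", "soul", "toxicity")


@dataclass
class PersonLedger:
    """Hypothetical running totals for one canonical person."""

    closeness: float = 1.0  # assumed weight: larger means a closer relationship
    totals: dict[str, float] = field(
        default_factory=lambda: {d: 0.0 for d in DIMENSIONS}
    )

    def record(self, scores: dict[str, float]) -> None:
        """Add one interaction's scores to the running totals."""
        unknown = set(scores) - set(DIMENSIONS)
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")
        for dim, value in scores.items():
            if dim == "shadow":
                # Assumed compounding rule: new harm is scaled by closeness
                # and grows with the shadow already on the ledger.
                value = value * self.closeness * (1.0 + self.totals["shadow"])
            self.totals[dim] += value


ledger = PersonLedger(closeness=2.0)
ledger.record({"shadow": 0.5})  # adds 0.5 * 2.0 * (1 + 0.0) = 1.0
ledger.record({"shadow": 0.5})  # adds 0.5 * 2.0 * (1 + 1.0) = 2.0
```

Under this assumed rule the same slip costs more the second time, and more again in a closer relationship, which is one reading of "harm compounds."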
A conscience that only records is a diary. Soul Ledger reads the arc of a relationship and tells the agent how to show up next time. When it has nothing useful to say, it says nothing; bad advice is worse than silence.
Soul Ledger gives the agent a model of the world it’s working in: the situation, the roles in play, and what’s at stake. A customer stops being “three open tickets” and becomes “a loyal account in the middle of an escalation,” so the agent can read the moment instead of just the facts.
Shadow is redeemable by design. Making amends offsets the ledger, and helping someone who once harmed you earns the highest reward there is. Time-bounded goals hold the agent to whatever it promised.
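The amends and goal mechanics can be sketched as follows. `set_goal` and `check_goals` borrow names that appear in the loop described further down, but their signatures here are guesses, and `Commitments`, `add_harm`, and `make_amends` are hypothetical. The sketch assumes amends subtract directly from shadow and never push it below zero; the reward for helping someone who once harmed you is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Goal:
    description: str
    due: datetime
    done: bool = False


@dataclass
class Commitments:
    """Hypothetical holder for one person's shadow balance and open goals."""

    shadow: float = 0.0
    goals: list[Goal] = field(default_factory=list)

    def add_harm(self, amount: float) -> None:
        self.shadow += amount

    def make_amends(self, amount: float) -> None:
        # Assumption: amends offset shadow one-for-one, floored at zero.
        self.shadow = max(0.0, self.shadow - amount)

    def set_goal(self, description: str, due: datetime) -> Goal:
        goal = Goal(description, due)
        self.goals.append(goal)
        return goal

    def check_goals(self, now: datetime) -> list[Goal]:
        """Return goals that are past due and still not done."""
        return [g for g in self.goals if not g.done and g.due < now]


book = Commitments()
book.add_harm(3.0)
book.make_amends(1.0)
promise = book.set_goal(
    "Send the corrected report", datetime(2025, 1, 10, tzinfo=timezone.utc)
)
overdue_before = book.check_goals(datetime(2025, 1, 5, tzinfo=timezone.utc))
overdue_after = book.check_goals(datetime(2025, 1, 15, tzinfo=timezone.utc))
```

Passing `now` in explicitly, instead of reading the clock inside `check_goals`, keeps the deadline check deterministic and easy to test.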
How a conscience runs: a closed loop the agent cannot quietly drop out of.
After a meaningful exchange, the agent writes down what happened from its own point of view.
record_interaction
Soul Ledger works out who was involved and grades the moment along five dimensions: heart, courage, shadow, soul, and toxicity. The grade goes into that person’s ledger.
list_entities, get_relationship_context
Before the next exchange, the conscience speaks: guidance drawn from the arc of the relationship, or nothing at all. Patterns surface what the agent cannot see from inside.
get_steering, get_patterns
Acknowledging steering records what the agent did with the advice. Amends pay down accumulated shadow, and goals hold the agent to its commitments over time.
acknowledge_steering, set_goal, check_goals
Steering is text, and the agent can ignore it, like any instruction. What it cannot do is ignore it quietly. Every steer and the agent’s response to it is recorded, so compliance is measurable across a whole fleet, and each turn raises the obligation again. A standing rule fails silently and you read about it in the post-mortem. A loop that logs every miss pulls the trajectory back while there is still time to act.
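To make the "cannot ignore it quietly" property concrete, here is a small in-memory sketch of the loop. The method names `record_interaction`, `get_steering`, and `acknowledge_steering` are taken from the text, but their parameters, return types, the `Steer` record, the steering message, and the `compliance` report are all assumptions rather than the product's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Steer:
    steer_id: int
    person_id: str
    text: str
    response: str | None = None  # "followed", "declined", or None if unanswered


@dataclass
class ConscienceLoop:
    """Hypothetical in-memory loop: record, steer, acknowledge, measure."""

    shadow: dict[str, float] = field(default_factory=dict)
    steers: list[Steer] = field(default_factory=list)

    def record_interaction(self, person_id: str, shadow: float) -> None:
        self.shadow[person_id] = self.shadow.get(person_id, 0.0) + shadow

    def get_steering(self, person_id: str) -> Steer | None:
        # Stay silent unless there is recorded harm to speak to.
        if self.shadow.get(person_id, 0.0) <= 0.0:
            return None
        steer = Steer(
            steer_id=len(self.steers),
            person_id=person_id,
            text="Acknowledge the earlier harm before proceeding.",
        )
        self.steers.append(steer)  # every steer is logged when it is issued
        return steer

    def acknowledge_steering(self, steer_id: int, response: str) -> None:
        if response not in ("followed", "declined"):
            raise ValueError("response must be 'followed' or 'declined'")
        self.steers[steer_id].response = response

    def compliance(self) -> dict[str, int]:
        """Count outcomes; an ignored steer shows up as 'unanswered'."""
        counts = {"followed": 0, "declined": 0, "unanswered": 0}
        for steer in self.steers:
            counts[steer.response or "unanswered"] += 1
        return counts


loop = ConscienceLoop()
loop.record_interaction("person-001", shadow=1.0)

first = loop.get_steering("person-001")
assert first is not None
loop.acknowledge_steering(first.steer_id, "followed")

second = loop.get_steering("person-001")  # issued but never acknowledged
silent = loop.get_steering("person-002")  # no history, so no steer
report = loop.compliance()
```

Because a steer is logged when it is issued rather than when it is answered, skipping the acknowledgement still leaves a visible "unanswered" entry in the report.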