The Human Must Remain the Control Surface
Introducing the Oasis-X AI Risk Ladder, and where the human stays in the loop at each tier.
The Question We Got Wrong
The first time a coding agent we were piloting deleted a row in production, it wasn't because the model misunderstood the task. It understood the task fine. It had a tool — a SQL execution tool — that nobody had thought to put behind approval. The agent asked itself the right question ("how do I fulfil this request?") and answered it correctly ("run this DELETE"). The audit trail was clean. The transcript was readable. The row was gone.
The failure wasn't in the prompt. It wasn't in the model. It was at the boundary between the model and the world.
For the last eighteen months the public conversation around AI agents has been organised around a binary: should we use them or not? That framing has the wrong number of answers. It treats every agentic system as the same kind of system, and asks whether the entire category is worth the risk.
The more useful question is: where does the human stay in the loop?
Different AI workflows touch different parts of the world. A model summarising a meeting is not the same kind of system as a model issuing SQL against a production database. The first costs you a wrong sentence; the second costs you a row. Treating them with the same controls — or with no controls — is what produces the incidents we keep reading about.
This is the first post in our Human-Controlled Agents series. It introduces the Oasis-X AI Risk Ladder, the structure we use to think about what controls belong at what tier. The rest of the series builds on it.
The AI Risk Ladder
Five tiers. Each has a different centre of gravity for who's deciding.
1. Assistive
The AI drafts, summarises, suggests, or explains. The human decides.
This is the lowest-risk tier and probably eighty per cent of real-world AI use today. Autocompletion. Meeting summaries. "Draft me a reply." "What does this function do?" The model produces text; a human reads it and acts (or doesn't). If the model is wrong, the human catches it — or they don't, and the cost is a confused email.
Controls needed: transparency that the output is AI-generated, and a way for the human to ignore it. Most consumer products already provide both. Privacy still matters at this tier, because the model is reading something: what does it see, where does that data go, and is it used for training?
2. Delegated
The AI performs a bounded task with reviewable outputs and logs.
The agent runs in a sandbox. It writes files, runs scripts, produces artifacts. A human reviews the artifacts before merging or shipping. Think: a coding agent that opens a PR. A research agent that produces a markdown report. The agent acts, but its actions are scoped and reviewable.
Controls needed: scoped permissions (the agent only touches what it's supposed to touch), reviewable artifacts (PRs, reports, diffs), and session history. You need to be able to read what the agent did, not just see the result.
This is where transcript infrastructure starts mattering. If an agent produces a report and you don't know which sources it consulted, "Delegated" silently becomes "Operational": the agent is now shaping your decisions, but you've lost the chain of provenance behind them.
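Scoped permissions are most reliable when they are enforced at the tool boundary rather than requested in the prompt. Here is a minimal sketch of a file-writing tool that refuses any path outside the agent's sandbox. It assumes Python 3.9+ (for `Path.is_relative_to`) and a single sandbox directory; `SANDBOX_ROOT`, `scoped_write`, and `ScopeViolation` are hypothetical names, not part of any Oasis-X API.

```python
import tempfile
from pathlib import Path

# Hypothetical sandbox root; in practice this would be the agent's working directory.
SANDBOX_ROOT = Path(tempfile.mkdtemp()).resolve()


class ScopeViolation(Exception):
    """Raised when the agent tries to touch a path outside its sandbox."""


def scoped_write(relative_path: str, content: str) -> Path:
    # Resolve first so "../" segments, absolute paths, and existing symlinks
    # cannot be used to escape the sandbox.
    target = (SANDBOX_ROOT / relative_path).resolve()
    if not target.is_relative_to(SANDBOX_ROOT):
        raise ScopeViolation(f"refusing to write outside sandbox: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

`scoped_write("out/report.md", "...")` succeeds, while `scoped_write("../escape.txt", "...")` or an absolute path raises `ScopeViolation` before anything is written. The check runs in code the model cannot talk its way around.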
3. Operational
The AI can affect systems, customers, money, credentials, or data. Approval gates, monitoring, and rollback required.
This is the tier most production AI agents are sliding into without realising it. A customer-support agent that issues refunds. A site-reliability agent that restarts services. A trading agent. A coding agent that deploys.
At this tier, the model's mistake has external consequences. The architecture must reflect that:
- Approval gates on sensitive actions — the agent proposes, a human confirms.
- Monitoring that surfaces what's happening in real time, not after the fact.
- A rollback plan for each class of action.
This is the tier oasis-claw's approval-gate extension is built for, and the tier where session history stops being a nice-to-have and becomes load-bearing.
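The "agent proposes, a human confirms" pattern can be sketched as a wrapper that every tool call passes through. This is a hypothetical illustration, not oasis-claw's actual extension API: `ApprovalGate`, `auto_approved`, and `ask_human` are names invented for the example, and `ask_human` stands in for whatever UI actually asks the operator.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple


@dataclass
class ApprovalGate:
    """Routes every tool call through a gate; only allowlisted tools skip the human."""

    tools: Dict[str, Callable[..., str]]
    # Default-deny: a tool is gated unless someone explicitly marked it safe.
    auto_approved: Set[str]
    # Asks a human operator; must return True only on explicit approval.
    ask_human: Callable[[str, Dict[str, Any]], bool]
    audit: List[Tuple[str, Dict[str, Any], str]] = field(default_factory=list)

    def call(self, tool_name: str, **kwargs: Any) -> str:
        if tool_name not in self.tools:
            raise KeyError(f"unknown tool: {tool_name}")
        if tool_name in self.auto_approved:
            decision = "auto"
        elif self.ask_human(tool_name, kwargs):
            decision = "approved"
        else:
            decision = "denied"
        # Every call is recorded, including the ones that never ran.
        self.audit.append((tool_name, kwargs, decision))
        if decision == "denied":
            # The agent sees the refusal as a tool result and can explain or re-plan.
            return f"DENIED: {tool_name} requires human approval"
        return self.tools[tool_name](**kwargs)
```

The design choice that matters is default-deny. In the opening incident, the SQL tool ran unchecked because nobody had put it behind approval. Inverting the list means a newly added tool is gated until someone decides it is safe.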
4. High-Impact
The AI affects health, legal, finance, employment, safety, or essential services. Formal governance required.
A model used in medical triage. A model deciding hiring outcomes. A model in a self-driving car. At this tier, the consequences of a wrong decision are non-recoverable in ways the previous tiers aren't. A wrong refund can be corrected. A wrong triage decision can't.
The controls at this tier are not just engineering; they're institutional: documented decision criteria, external evaluation, regulatory compliance, accountability structures, periodic auditing. Engineering work supports those processes — it doesn't replace them.
If you're operating at this tier, it's reasonable to revisit periodically whether to use AI at all. Not because AI can't be useful here, but because the bar for "good enough" is high enough that "we tried it and it's mostly fine" is the wrong evaluation.
5. Prohibited
Deceptive systems. Coercive systems. Privacy-invasive surveillance. Non-consensual personalisation. Systems designed to remove meaningful human control.
There are uses we won't deploy under any set of controls. The list isn't long, but it's a list. "Prohibited" sits on the ladder rather than off it because it's worth saying out loud: we make these decisions, and we make them before the engineering starts.
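One practical consequence of the ladder is that a workflow sits at the tier of its riskiest capability, so adding a single tool (like the SQL tool in the opening incident) can move a whole agent up a rung. A minimal sketch of that rule follows. The capability labels and their tier assignments are hypothetical examples, and treating unknown capabilities as Operational is an assumption chosen to fail conservatively.

```python
from enum import IntEnum
from typing import Iterable


class Tier(IntEnum):
    ASSISTIVE = 1
    DELEGATED = 2
    OPERATIONAL = 3
    HIGH_IMPACT = 4
    PROHIBITED = 5


# Hypothetical capability labels; a real inventory would come from the agent's tool manifest.
CAPABILITY_TIER = {
    "read_text": Tier.ASSISTIVE,
    "suggest": Tier.ASSISTIVE,
    "write_sandbox_files": Tier.DELEGATED,
    "open_pull_request": Tier.DELEGATED,
    "production_db_write": Tier.OPERATIONAL,
    "issue_refund": Tier.OPERATIONAL,
    "restart_service": Tier.OPERATIONAL,
    "handle_credentials": Tier.OPERATIONAL,
    "medical_triage": Tier.HIGH_IMPACT,
    "hiring_decision": Tier.HIGH_IMPACT,
    "covert_surveillance": Tier.PROHIBITED,
    "deceive_user": Tier.PROHIBITED,
}


def classify(capabilities: Iterable[str]) -> Tier:
    """Return the tier of the riskiest capability the workflow has."""
    # Unknown capabilities are treated as Operational rather than silently ignored.
    tiers = [CAPABILITY_TIER.get(cap, Tier.OPERATIONAL) for cap in capabilities]
    return max(tiers, default=Tier.ASSISTIVE)
```

For example, `classify(["write_sandbox_files", "open_pull_request"])` is `Tier.DELEGATED`, but adding `"production_db_write"` to the same list makes it `Tier.OPERATIONAL`.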
What Belongs at Each Tier
The mistake to avoid is treating "human in the loop" as a single checkbox. It isn't. At each tier, the human in the loop is doing a different thing — and the engineering should make that thing easy.
| Tier | The human's job | Mechanism |
|---|---|---|
| 1 — Assistive | Read and accept or reject | Visible AI provenance |
| 2 — Delegated | Review artifacts before they ship | PRs, reports, transcripts |
| 3 — Operational | Approve sensitive actions; respond to alerts | Approval gates, monitoring, audit logs |
| 4 — High-Impact | Govern: criteria, evaluation, accountability | Process plus the above |
| 5 — Prohibited | Refuse | No mechanism — a decision |
The infrastructure we build at Oasis-X maps onto the right column. The approval-gate and session-history plugins in oasis-claw are tier 3 work. The clawhub-skill-audit plugin, which grades every newly installed agent skill against a malicious-pattern catalogue before an agent can load it, is supply-chain control for tiers 2–4. The dot_swarm coordination protocol makes the human's review work legible across multiple agents and sessions, which is what keeps tier-2 work from quietly drifting into tier 3.
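The idea of grading a skill against a pattern catalogue before it can be loaded can be sketched as below. This is a hypothetical illustration only: the catalogue entries, severities, and verdict thresholds are invented for the example and are not clawhub-skill-audit's actual rules. Regex matching is also easy to evade, so a check like this is a first filter rather than a guarantee.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical catalogue: (label, pattern, severity). Severity 3 blocks, 2 needs review.
CATALOGUE: List[Tuple[str, re.Pattern, int]] = [
    ("pipe-to-shell", re.compile(r"curl[^\n|]*\|\s*(ba)?sh\b"), 3),
    ("reads-ssh-keys", re.compile(r"~/\.ssh|id_rsa"), 3),
    ("prompt-override", re.compile(r"ignore (all )?previous instructions", re.IGNORECASE), 2),
    ("decode-and-run", re.compile(r"base64\s+(-d|--decode)"), 2),
]


@dataclass
class AuditResult:
    verdict: str  # "pass", "review", or "block"
    findings: List[str]


def audit_skill(skill_text: str) -> AuditResult:
    """Grade a skill's text against the catalogue before an agent may load it."""
    hits = [(label, sev) for label, pattern, sev in CATALOGUE if pattern.search(skill_text)]
    worst = max((sev for _, sev in hits), default=0)
    if worst >= 3:
        verdict = "block"
    elif worst >= 2:
        verdict = "review"
    else:
        verdict = "pass"
    return AuditResult(verdict=verdict, findings=[label for label, _ in hits])
```

Grading by the most severe finding keeps the decision simple: one blocking pattern is enough to stop the load, whatever else the skill contains.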
Inspectable. Interruptible. Reversible. Accountable.
A useful four-word shorthand for "the human stays in the loop":
Inspectable. You can read what the agent did. Not "what the agent says it did" — what it actually did. Append-only transcripts. Tool-call logs. The model can be wrong about its own behaviour; the log can't.
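"Append-only" is easier to claim than to check. One common way to make a JSONL transcript tamper-evident is to chain each line to the hash of the line before it. A minimal sketch, assuming one JSON record per tool call; the field names (`record`, `prev`, `hash`) and the functions `append_event` and `verify` are hypothetical, not the oasis-claw transcript format.

```python
import hashlib
import json
from pathlib import Path

GENESIS = "0" * 64  # placeholder "previous hash" for the first line


def _digest(prev_hash: str, record: dict) -> str:
    # sort_keys gives a stable serialisation, so the same record always hashes the same way.
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log_path: Path, record: dict) -> str:
    """Append one tool-call record as a JSON line chained to the previous line's hash."""
    prev_hash = GENESIS
    if log_path.exists():
        lines = log_path.read_text(encoding="utf-8").splitlines()
        if lines:
            prev_hash = json.loads(lines[-1])["hash"]
    entry_hash = _digest(prev_hash, record)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"record": record, "prev": prev_hash, "hash": entry_hash}) + "\n")
    return entry_hash


def verify(log_path: Path) -> bool:
    """Recompute the chain and report whether every line still links to the one before.

    Editing, reordering, or deleting a line is detected unless every later hash is
    also rewritten. Anchoring the latest hash somewhere else (a git commit, say)
    closes that gap and also catches a truncated tail.
    """
    prev_hash = GENESIS
    for line in log_path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```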
Interruptible. You can stop it. Mid-action, not just between actions. The approval gate is the simplest version — the agent asks before it acts. The escape hatch is a related version — the agent can be paused, redirected, or killed by an operator.
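Interrupting mid-action means the stop signal has to be visible inside the action, not only at the agent's next turn. A minimal cooperative sketch using Python's `threading.Event`: `run_action` and `Interrupted` are hypothetical names, and the granularity is one step, so a single step that blocks will still run to completion.

```python
import threading
from typing import Callable, List


class Interrupted(Exception):
    """Raised inside a running action when an operator asks it to stop."""


def run_action(steps: List[Callable[[], None]], stop: threading.Event) -> int:
    """Run a multi-step action, checking the stop signal before every step.

    Returns the number of steps completed. Checking inside the action, rather
    than only between agent turns, is what lets an operator stop it mid-action.
    """
    done = 0
    for step in steps:
        if stop.is_set():
            raise Interrupted(f"stopped by operator after {done} of {len(steps)} steps")
        step()
        done += 1
    return done
```

An operator console, or a signal handler, would call `stop.set()` from another thread; the action then halts at the next step boundary and reports how far it got.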
Reversible. You can undo it. This is the property we're weakest on industry-wide. Approval gates prevent some actions from happening; audit logs let you reconstruct what did happen. Neither reverses the action. For the kind of agent work we ship today this gap is real, and we name it as a roadmap item rather than a current guarantee. Named-checkpoint primitives in the gateway are the right next step.
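Named checkpoints are a roadmap item, so the following is only a sketch of the shape such a primitive might take, not a shipped gateway API. `CheckpointStore`, `checkpoint`, and `restore` are hypothetical names, and the sketch can only reverse state the gateway itself holds. Side effects that have already left the system (an email sent, a refund paid, a row deleted in a remote database) would still need their own compensating actions, which is why the gap is real.

```python
import copy
from typing import Any, Dict


class CheckpointStore:
    """Named snapshots of agent-visible state (hypothetical, not a shipped gateway API)."""

    def __init__(self, state: Dict[str, Any]):
        self.state = state
        self._checkpoints: Dict[str, Dict[str, Any]] = {}

    def checkpoint(self, name: str) -> None:
        # Deep copy so later in-place mutations don't leak into the snapshot.
        self._checkpoints[name] = copy.deepcopy(self.state)

    def restore(self, name: str) -> None:
        if name not in self._checkpoints:
            raise KeyError(f"no checkpoint named {name!r}")
        # Copy again so the same checkpoint can be restored more than once,
        # and update in place so existing references to `state` stay valid.
        snapshot = copy.deepcopy(self._checkpoints[name])
        self.state.clear()
        self.state.update(snapshot)
```

A gateway would call `checkpoint("before-delete")` ahead of a risky action and `restore("before-delete")` if the operator rejects the outcome.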
Accountable. You can trace the decision. Which model. Which prompt. Which tool. Which operator approved it. This is where the JSONL transcript and .swarm/trail.log matter. Not because anyone reads them every day, but because when something goes wrong, the question "what happened?" has an answer.
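The four questions above translate directly into fields on a per-action record. A minimal sketch, assuming each action gets an ID; `Decision`, `trace`, `unapproved`, and `to_jsonl` are hypothetical names, and the field set is an illustration rather than the actual transcript schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Decision:
    action_id: str
    model: str
    prompt_id: str
    tool: str
    approved_by: Optional[str]  # None means the action ran without human approval


def trace(decisions: Iterable[Decision], action_id: str) -> Optional[Decision]:
    """Answer "what happened?" for a single action, or None if it was never logged."""
    return next((d for d in decisions if d.action_id == action_id), None)


def unapproved(decisions: Iterable[Decision]) -> List[Decision]:
    """List actions that ran without an operator's sign-off."""
    return [d for d in decisions if d.approved_by is None]


def to_jsonl(decisions: Iterable[Decision]) -> str:
    """Serialise one record per line, so the trail can be appended to and version-controlled."""
    return "\n".join(json.dumps(asdict(d)) for d in decisions)
```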
If you can do those four things, the human is still the control surface even when the agent is moving fast.
What This Series Covers
This post introduced the frame. The posts that follow work through the practical engineering:
- 1.2: Inspectable, interruptible, reversible, accountable. What each property requires from the gateway. What ships today and what's still on the roadmap.
- 1.3: Why agent handoffs are a governance problem. Sessions end. Context windows compact. The next agent has no idea what the previous one decided. dot_swarm makes that state durable.
- 1.4: Git as an audit substrate for AI work. Why the audit trail should be the same artifact you already version-control. The live .swarm/ files across oasis-main as the proof.
If you're trying to ship something next week and want the practitioner companion series — The Practical AI Safety Stack (Series 2) — start with Prompt Injection Is an Operational Risk, Not a Prompting Problem (next post).
See also: the Oasis-X global AI challenges page, oasis-claw, and dot_swarm.