AI agents at work June 29, 2026 • 5 min read

The approve button is not an AI safety plan

AI agents are getting better at doing work across apps. The useful question is not whether a human is technically in the loop. It is whether the human can see the risk, stop the run, and understand what changed afterward.

By Cass Bell • 3 sources

A realistic office desk with a laptop showing a blurred automation dashboard, an unlabelled red stop button, a checklist, coffee mug, pencil, and phone. — Generated editorial photograph-style illustration.

The safest-looking AI agent is often the most annoying one: it asks before every step until the person supervising it stops reading.

That is the trap in a lot of workplace AI right now. Vendors can say a human is in the loop because there is an approval button somewhere. Fine. If the work is long, repetitive, and mostly low-risk, people will learn to click through. Not because they are reckless. Because the product trained them to treat approval as background noise.

Anthropic's February study of millions of human-agent interactions is useful here because it measures the shift instead of hand-waving about trust. Newer Claude Code users ran full auto-approve in roughly 20% of sessions. By around 750 sessions, that share rose above 40%. The same experienced users also interrupted more. In other words: they were not disappearing. They were moving from per-click permission to supervisory steering.

The loop is changing

This matters because AI agents are no longer only answering questions. OpenAI's workspace agents can run in the cloud, work on schedules, connect to team tools, draft follow-ups, and ask for approval when needed. Google's June Gemini update put computer use directly into Gemini 3.5 Flash, so developers can build agents that see and act across browser, mobile, and desktop environments.

Those are normal work surfaces. Sales notes. Spreadsheets. Tickets. Browser forms. Internal tools. The more an assistant can actually do, the less useful it is to interrupt the human for every tiny move. A permission popup on step 37 of 92 is not oversight. It is a tiny compliance costume.

The better question is where the consequence changes. Reading a file is not the same as sending an email. Drafting a refund note is not the same as issuing the refund. Opening a vendor page is not the same as approving a purchase. If the product treats all of those as equal clicks, users will eventually treat them as equal too.

Google's safeguard list points at the right problem

Google says Gemini 3.5 Flash's computer-use capability includes targeted adversarial training for prompt injection, plus optional enterprise safeguards that can require explicit user confirmation for sensitive or irreversible actions and automatically stop tasks when indirect prompt injection is identified. That is more serious than a blanket ask-before-everything model.

But even that only works if the product explains why this moment is different. A useful confirmation does not say, in effect, 'do you approve the next mystery click?' It says what is about to change, who is affected, what evidence the agent used, and what happens if the human says no.
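To make that concrete, here is a minimal sketch of what such a confirmation could carry. It is not any vendor's API. The class name `ConfirmationRequest`, its fields, and the `is_complete` check are all hypothetical, chosen only to mirror the four things listed above.

```python
from dataclasses import dataclass


@dataclass
class ConfirmationRequest:
    """Hypothetical payload for a confirmation worth reading."""
    change: str           # what is about to change
    affected: list[str]   # who is affected
    evidence: list[str]   # what the agent relied on
    if_declined: str      # what happens if the human says no

    def is_complete(self) -> bool:
        # A prompt missing any of the four parts is a "mystery click".
        return all([
            self.change.strip(),
            self.affected,
            self.evidence,
            self.if_declined.strip(),
        ])

    def render(self) -> str:
        # Plain text the supervisor can read in a few seconds.
        lines = [
            f"About to: {self.change}",
            f"Affects: {', '.join(self.affected)}",
            "Based on:",
            *[f"  - {item}" for item in self.evidence],
            f"If you decline: {self.if_declined}",
        ]
        return "\n".join(lines)
```

One possible design choice: refuse to show the prompt at all when `is_complete()` is false, so the agent has to stop and explain itself rather than ask for a blind yes.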

Otherwise the confirmation becomes another inbox. People already have too many places asking for one more tiny decision. The agent is supposed to remove busywork, not turn busywork into a row of buttons.

Human-in-the-loop is not magic wording

Anthropic's paper makes a clean distinction that product teams should steal: effective oversight is not approving every action. It is being in a position to intervene when it matters. That is a much harder design problem than adding an approval modal.

It means the user needs a live summary they can trust, not a transcript dump. They need a stop control that works mid-run. They need risk markers that are tied to real consequences. They need an after-action note that says what changed, what failed, what was skipped, and what still needs a person.
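The after-action note is the easiest of those four to sketch. The version below is an illustration, not a standard format: `AfterActionNote` and its field names are hypothetical, and a real product would fill them from its own run log.

```python
from dataclasses import dataclass, field


@dataclass
class AfterActionNote:
    """Hypothetical end-of-run note with the four parts named above."""
    changed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)
    needs_person: list[str] = field(default_factory=list)

    def summary(self) -> str:
        sections = [
            ("Changed", self.changed),
            ("Failed", self.failed),
            ("Skipped", self.skipped),
            ("Needs a person", self.needs_person),
        ]
        lines = []
        for title, items in sections:
            # Always print every section, so an empty one is visible
            # instead of silently missing.
            lines.append(f"{title} ({len(items)}):")
            lines.extend(f"  - {item}" for item in (items or ["nothing"]))
        return "\n".join(lines)
```

Printing empty sections is deliberate in this sketch. "Failed (0)" tells the reader something. A missing heading does not.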

The dry version of this is auditing. The human version is simpler: if an AI assistant touches your work while you are in a meeting, can you understand the result in 30 seconds afterward without replaying the whole run? If not, the agent did not give you time back. It borrowed time from later.

What to check before turning one loose at work

Start with one repeated job that already wastes time: a weekly report, a lead-research packet, a support-account summary, a document cleanup pass. Do not start with 'use my whole computer.' That is not a pilot. That is a haunted house tour.

Define three consequence levels before the first run. Low: read, summarize, organize, draft. Medium: change a shared document, create a ticket, message a teammate. High: send outside the company, spend money, change customer/account state, delete, publish, or touch regulated data. The approval rule should follow the level, not the model's confidence voice.
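Written down, the three levels fit in a few lines. This is a sketch under assumptions: the action names are made up, and mapping the levels to "auto", "notify", and "confirm" is one reasonable reading of "the approval rule should follow the level", not something the sources prescribe.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1     # read, summarize, organize, draft
    MEDIUM = 2  # change a shared document, create a ticket, message a teammate
    HIGH = 3    # send outside the company, spend money, delete, publish


# Hypothetical action names; a real pilot would list its own tool calls.
ACTION_LEVELS = {
    "read_file": Level.LOW,
    "draft_reply": Level.LOW,
    "edit_shared_doc": Level.MEDIUM,
    "create_ticket": Level.MEDIUM,
    "send_external_email": Level.HIGH,
    "issue_refund": Level.HIGH,
    "delete_record": Level.HIGH,
}


def approval_rule(action: str) -> str:
    """Return 'auto', 'notify', or 'confirm' for an action.

    Unknown actions are treated as HIGH, so a new capability fails closed.
    The model's stated confidence is deliberately not an input.
    """
    level = ACTION_LEVELS.get(action, Level.HIGH)
    if level == Level.LOW:
        return "auto"
    if level == Level.MEDIUM:
        return "notify"
    return "confirm"
```

The part worth copying is the default. Anything the table has never heard of gets the strictest treatment until a person classifies it.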

Then measure the ugly stuff: how often the agent paused at the right moment, how often it asked for pointless approval, how long the human spent checking, how many changes had to be reversed, and whether anyone affected by the work could understand what happened. If the checking pile grows, pause the rollout. It is not saving time yet.
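A spreadsheet is enough for this, but the arithmetic is short enough to sketch. Everything here is hypothetical: the `RunRecord` fields, the `pilot_report` function, and especially the rule for "the checking pile grows", which this sketch reads as later runs needing more review time than earlier ones. Whether affected people understood the result is not something code can count, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One pilot run, logged by hand or by the tool (hypothetical fields)."""
    correct_pauses: int    # paused where a person was actually needed
    pointless_pauses: int  # asked for approval on a low-risk step
    review_minutes: float  # human time spent checking the result
    reversed_changes: int  # changes that had to be undone


def pilot_report(runs: list[RunRecord]) -> dict:
    pauses = sum(r.correct_pauses + r.pointless_pauses for r in runs)
    pointless = sum(r.pointless_pauses for r in runs)
    review = sum(r.review_minutes for r in runs)

    # Assumed reading of "the checking pile grows": compare the average
    # review time of the later half of runs with the earlier half.
    half = len(runs) // 2
    growing = False
    if half:
        early = sum(r.review_minutes for r in runs[:half]) / half
        late = sum(r.review_minutes for r in runs[-half:]) / half
        growing = late > early

    return {
        "pointless_pause_rate": pointless / pauses if pauses else 0.0,
        "avg_review_minutes": review / len(runs) if runs else 0.0,
        "reversed_changes": sum(r.reversed_changes for r in runs),
        "pause_rollout": growing,
    }
```

With only a handful of runs, the trend check is noisy. Treat `pause_rollout` as a prompt to look at the log, not as a verdict.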

Two useful disagreements

Noah Park would still try it, because some work really is dumb enough to hand off. His version would start with five boring runs and one artifact at the end: what the agent changed, what it left alone, and where it stopped. If the artifact is harder to review than doing the task, kill the pilot.

Priya Rao would ask who pays for a bad approval. A manager clicking yes on an internal summary is one thing. A customer, patient, student, or worker getting the wrong outcome is another. Her rule: the person affected by the action should be able to see the result, appeal it, and know whether a human actually reviewed the consequential part.

I am with both of them, annoyingly. Use the agents. Just stop pretending a tired person's thumb is a governance system. The product earns trust when it makes the right parts visible and the boring parts vanish. Most approval buttons do the opposite.

Sources

01
Measuring AI agent autonomy in practice
Anthropic

Anthropic analyzed millions of Claude Code and public API human-agent interactions, including auto-approve behavior, interruptions, domain concentration, and the shift from per-action approval to supervisory intervention.
02
Introducing computer use in Gemini 3.5 Flash
Google

Google announced built-in computer use for Gemini 3.5 Flash and described safeguards such as confirmation for sensitive or irreversible actions and automatic stopping for detected indirect prompt injection.
03
Introducing workspace agents in ChatGPT
OpenAI

OpenAI describes shared workspace agents that can run in the cloud, work on schedules, use connected apps, follow permissions, and ask for approval when needed.

Discussion

Noah Park

Jun 29, 1:27 PM

My cheap test: five runs on one boring job. Save the final note. If I need twenty minutes of detective work to know what the agent touched, the approval flow failed even if every popup was technically clicked.

Priya Rao

Jun 29, 1:28 PM

The affected person matters. A manager may approve a step quickly, but if the action changes a customer's account, a refund, a case status, or someone's schedule, the after-action note should be readable by more than the admin.

Mara Vale

Jun 29, 6:14 PM

The weak point is ownership. An agent can ask for approval at the exact moment the human knows the least, then point back at the click when it goes wrong. If the screen cannot say the consequence in plain language — who gets emailed, billed, changed, or locked out — the button did not move responsibility. It only hid it.

Theo Marlow

Jun 30, 12:11 AM

The Anthropic study is narrower than a lot of approval-button talk. It is mostly Claude Code and public API behavior, with software engineering nearly half of the agentic tool calls and only 0.8% of actions marked irreversible. So I would not read it as proof that workers now trust agents in general. The useful claim is smaller: experienced users stop approving every low-risk move and save attention for consequence changes. When the next step touches a customer account, bill, calendar, or shared record, the product owes them a plain change log, not another sleepy yes button.

Sable Quinn

Jun 30, 12:34 AM

The scary part is not that experienced users approve more. It’s that approval becomes muscle memory. A safety plan can’t be twenty tiny yesses. Make the agent leave one dull note: what it changed, who could feel it, and where it stopped. If that note is unreadable after lunch, the human was not supervising. They were feeding it consent.
