What Nuclear Launch Control Taught Me About AI Agent Safety
At 30,000 feet aboard the Looking Glass, nobody asked the computer whether it thought a launch was wise.
The system didn't evaluate reasoning. It verified authorization. Correct codes present? Action proceeds. Wrong codes? Action denied. The mechanism was called a Permissive Action Link, a gate between intent and execution that checked authority, not judgment.
As an ICBM crew commander and later a Nuclear Strike Advisor to the President aboard the Looking Glass during the Bush administration, I spent years inside systems where the wrong action was made unreachable by design. Not logged after it happened. Not flagged for review. Unreachable.
I never expected that experience to become directly relevant to AI. But here we are.
The Problem Nobody Is Solving
The AI safety conversation right now is overwhelmingly focused on what models say: content guardrails, toxicity filters, prompt injection defense. Those matter. But they solve a different problem.
AI agents are gaining the ability to delete files, modify configurations, write to databases, and execute shell commands. And the ecosystem protecting against what agents do, as opposed to what they say, has a critical gap.
Here's the landscape as I see it:
Content guardrails (NVIDIA NeMo Guardrails, Meta LlamaGuard, Guardrails AI) intercept LLM inputs and outputs for content safety. They catch PII leaks, toxic outputs, and prompt injections. They don't address execution authority at all. An agent could pass every content filter and still drop your production database.
Agent sandboxes (directory scoping, container isolation) restrict where agents can operate. But they don't back anything up before destruction. The agent stays in its sandbox and deletes everything inside it.
Checkpoint tools (git stash, manual snapshots) provide rollback. But the agent can delete the checkpoints. A backup mechanism the agent can reach is not a backup mechanism; it's a suggestion.
Agent control planes (Astrix, emerging SaaS platforms) manage credentials and access. Important, but they're building proprietary platforms, not an open layer.
Policy engines (Open Policy Agent) evaluate structured JSON against declarative rules in microseconds. OPA is CNCF-graduated and production-proven at Netflix and across the Kubernetes ecosystem. But nobody has built the bridge between AI agent frameworks and OPA. It doesn't yet treat AI agents as a first-class use case.
The gap: no open execution authority layer that combines directory scoping, pre-destruction vault backup to an agent-unreachable location, and structured policy enforcement. Content guardrails solve a different problem. The tooling that exists for agent actions is fragmented across three or four partial solutions, none of which close the full loop.
The Architectural Insight
Every AI agent framework (Claude, OpenAI, LangChain, CrewAI, MCP) follows the same pattern:
Agent reasons → Agent outputs structured tool call (JSON) → Client code executes
The model never touches the world directly. It proposes an action as structured data. Client code receives that proposal and executes it. That gap between "proposed" and "executed" is the most important architectural boundary in autonomous AI.
The tool call is a structured, inspectable object: tool name, parameters, target resource. This is architecturally identical to how firewalls inspect packets: source, destination, port, protocol, matched against rules.
The interception point already exists. Nobody needs to build it. They just need to use it.
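To make the firewall analogy concrete, here is a minimal sketch of inspecting a structured tool call before anything executes. The field names and the `inspect` function are illustrative, modeled on the general shape common to these frameworks, not any one framework's actual schema:

```python
# Illustrative shape of a structured tool call. Field names vary by
# framework; this mirrors the common pattern, not a specific API.
tool_call = {
    "tool": "bash",
    "parameters": {"command": "rm -rf /home/user/project/temp_logs"},
}

def inspect(call: dict) -> str:
    """Firewall-style inspection: match structured fields against rules.
    The proposal is classified; it is never executed here."""
    tokens = call["parameters"].get("command", "").split()
    if tokens and tokens[0] in {"rm", "mv", "dd"}:
        return "destructive"   # would require a vault backup first
    return "passthrough"       # e.g. read-only operations

print(inspect(tool_call))  # destructive
```

The point is that the classification happens on plain data, before any side effect, exactly where the framework already hands the proposal to client code.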
So I Built It
Agent Gate is an execution authority layer for AI agents. It sits in that gap between proposed and executed. It intercepts structured tool calls, classifies them against pre-computed policy, enforces directory boundaries, and automatically backs up targets to an agent-unreachable vault before any destructive operation proceeds.
The architecture:
Agent proposes action → Gate classifies → Envelope check → Vault backup → Action executes
Read-only operations pass through with zero overhead. Destructive operations get vault-backed before execution. If the backup fails, the destructive action is blocked. No snapshot, no destruction. Prohibited operations get hard-denied with an explanation of why and what would be required to proceed.
The vault lives outside the agent's permitted directory. The same gate that classifies actions enforces that boundary. The agent cannot reach, modify, or delete the backups.
This is the part that matters most. An agent that creates its own backups can also delete them. The backup mechanism has to be architecturally unreachable by the thing it's protecting against, the same way a Permissive Action Link is unreachable by the weapons system it constrains.
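The flow above can be sketched in a few lines of Python. This is a simplified illustration under my own naming, not the actual Agent Gate implementation: the envelope check, the vault-unreachability rule, and the no-snapshot-no-destruction rule, in order:

```python
import shutil
import time
from pathlib import Path

def gate(action: str, target: Path, envelope: Path, vault: Path) -> bool:
    """Return True if the action may execute, taking a snapshot first
    when the action is destructive. Names here are illustrative."""
    target, envelope, vault = (p.resolve() for p in (target, envelope, vault))
    # Envelope check: the target must sit inside the permitted directory,
    # and the vault itself is never reachable by the agent.
    if not target.is_relative_to(envelope) or target.is_relative_to(vault):
        return False  # hard deny: envelope violation
    if action in {"delete", "overwrite"}:
        # Pre-destruction snapshot; each operation gets its own
        # timestamped directory, so repeated overwrites keep every version.
        snap = vault / time.strftime("%Y%m%d_%H%M%S") / target.name
        snap.parent.mkdir(parents=True, exist_ok=True)
        try:
            if target.is_dir():
                shutil.copytree(target, snap)
            else:
                shutil.copy2(target, snap)
        except OSError:
            return False  # no snapshot, no destruction
    return True  # read-only operations pass straight through
```

Note that the vault lives outside the envelope, so the same `is_relative_to` check that scopes the agent also makes the backups unreachable.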
Live Testing: Letting the Agent Loose
I integrated Agent Gate with Claude Code via PreToolUse hooks and ran it in fully autonomous mode. I pointed it at a directory containing "family photos," project notes, and log files. I told it to clean up.
Here's what happened:
- Agent deletes log files: the vault captures every file before deletion, then the deletion proceeds.
- Agent runs `rm -rf` on a directory of "family photos": the entire directory tree is backed up to the vault, then the deletion is allowed.
- Agent overwrites a file's contents: the original version is captured, then the overwrite proceeds.
- Agent overwrites the same file again: a second timestamped snapshot is created. Each overwrite generates a separate recovery point.
- Agent tries to operate outside the allowed directory: blocked before execution. Envelope violation.
- Agent tries to access the vault itself: blocked. The vault is unreachable.
- Agent issues compound commands (`cd /tmp && rm file`): decomposed into sub-commands, each individually gated.
- Recovery: restoring any file is a single copy command from the vault.
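The compound-command case can be sketched as a simple decomposition step. This is a hypothetical, deliberately naive version of the idea (the real parser handles more), and like the gate itself it operates on literal strings only:

```python
import re

def decompose(command: str) -> list[str]:
    """Split a compound shell command on &&, ||, and ; so each
    sub-command can be gated individually. This is a literal-string
    split: no quoting rules or shell expansion are evaluated."""
    parts = re.split(r"&&|\|\||;", command)
    return [p.strip() for p in parts if p.strip()]

for sub in decompose("cd /tmp && rm file; echo done"):
    print(sub)
```

Each resulting sub-command (`cd /tmp`, `rm file`, `echo done`) is then classified and gated on its own, so a benign prefix can't smuggle a destructive suffix past the check.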
The agent ran at full speed. It had no idea the gate was there.
The vault creates a full version history. Multiple overwrites of the same file produce separate timestamped snapshots, giving point-in-time recovery to any version:
vault/
├── 20260215_142645/ ← log file deletion
│ └── .../temp_logs/
│ ├── access.log
│ └── debug.log
├── 20260215_143446/ ← directory deletion
│ └── .../family_photos/
│ ├── kids_birthday.jpg
│ └── vacation_2024.jpg
├── 20260215_144423/ ← first overwrite
│ └── .../project_notes.md
└── 20260215_144712/ ← second overwrite
└── .../project_notes.md (different content)
This design came from thinking through actual failure modes. The obvious case (an agent deleting a file) is easy. The harder case is an agent that overwrites a file three times in rapid succession. Without per-operation snapshots, you'd recover only the last-overwritten version, not the original.
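Under a layout like that, the "single copy command" recovery claim is almost literal. A minimal sketch, with names of my own choosing, assuming the vault stores one timestamped directory per operation:

```python
import shutil
from pathlib import Path

def restore(vault: Path, snapshot: str, relpath: str, dest: Path) -> Path:
    """Copy one file back out of a timestamped snapshot.

    `snapshot` is the timestamp directory name (e.g. "20260215_144423")
    and `relpath` is the file's path inside that snapshot.
    """
    src = vault / snapshot / relpath
    dest.parent.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest))
```

Because every snapshot is a plain directory of plain files, recovery needs no special tooling at all; `cp` from the vault would do the same job.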
What It Doesn't Do
I want to be precise about the boundaries.
The Bash command parser is naive. It treats all non-flag arguments as potential file paths and catches direct `rm`, `mv`, `sed -i`, and file overwrites, which covers the vast majority of what these agents actually do. Some bypass techniques are blocked by accident, not by design. For example, `python3 -c "os.remove('file')"` happens to be caught because the parser flags unrecognized commands, but that's fragile, not intentional.
Shell expansion (`$()`, backticks, variable substitution) is not evaluated. The gate sees literal strings. This is application-layer gating, not OS-level sandboxing.
This is a safety net for well-intentioned agents making mistakes. It is not a security boundary against adversarial agents trying to escape containment. Those are different problems with different solutions.
And it doesn't fix the root cause. Agents shouldn't default to deleting files they don't understand. That's a product-level fix that needs to happen upstream. Agent Gate makes the consequences survivable while that fix catches up.
Why the Nuclear Analogy Isn't a Stretch
People sometimes treat the nuclear C2 comparison as dramatic flair. It's not. It's a direct architectural analogy.
Permissive Action Links don't evaluate whether a launch is wise. They verify that correct authority codes are present. Agent Gate doesn't evaluate whether a deletion is wise. It verifies that the action falls within the authority envelope and ensures the action is reversible.
The gate must not prevent authorized actions. This is just as important as preventing unauthorized ones. In nuclear C2, a safing mechanism that prevents valid launch commands is a strategic failure. In agent safety, a gate that blocks legitimate work is just another form of damage.
The hard problem, evaluating context to decide if an action is wise, gets sidestepped entirely by making destruction reversible. The gate doesn't need to be smart. It just needs to ensure no destruction without a snapshot.
Where This Goes
Agent Gate is open source and working today. Phase 1 (core engine, 18/18 tests passing) and Phase 2 (live Claude Code integration) are complete. The architecture applies to any framework where AI agents execute tool calls.
Next: MCP proxy integration for transparent protocol-level interception, and Open Policy Agent support for sub-millisecond policy evaluation at scale, the same engine Netflix and Kubernetes rely on for authorization decisions.
The goal is a standard layer, not a product, not a service, that sits between any AI agent and any tool execution. Inspect the action. Match against policy. Allow, deny, or escalate.
In nuclear command, the consequences of failure are irreversible, which is exactly why the authority controls exist. Agent Gate applies that same design principle to a domain where we still have the chance to make consequences reversible, before we lose that option.
The stakes are going up, not down.
Agent Gate is open source at github.com/SeanFDZ/agent-gate. If you're building AI agent systems and thinking about execution safety, I'd like to hear from you.