
Are We Ready for AGI?

Morven

Starting in 2024, AI Agents began springing up everywhere. Architecturally, most of them aren’t that different from one another: an LLM process loaded with dozens of tool definitions, plugged into a user’s various digital assets to participate in real work. Security policies rely on a few restrictive instructions in the system prompt. Cost control relies on the model being “self-disciplined” enough to minimize Token consumption.

Everyone knows LLMs aren’t secure enough yet. But all of this behavior implicitly points to a single assumption: the LLM is trustworthy — or at least controllable.

That assumption held until OpenClaw blew up this year. When I first came across OpenClaw, I excitedly deployed a local instance. After running it for a while, I was a little disappointed — it didn’t deliver the JARVIS-like magic I’d heard about. Maybe my expectations were too high. But the letdown wasn’t about security issues. It was that a few simple commands burned through my entire test API balance. When I broke down the Token consumption, a significant portion of the cost came from repeatedly injecting tool definitions that were never actually used. Tool schemas alone consumed roughly 8,000 Tokens per message. Add in the repeated injection of configuration files like SOUL.md and AGENTS.md, and the fixed overhead per message could exceed 40,000 Tokens — before any actual conversation content. Dozens of tools, fully injected every turn, regardless of whether the current task needed any of them.

This made me rethink the problem from the beginning. So I started over with an empty project folder and tried to solve these problems structurally, rather than with prompts.

The first thing I did was classify assets. SSH servers became one category, GitHub repositories another, databases another. Each category has its own credential format and access rules. If an Agent needs to use a particular category of asset, there must be an explicit binding; if there’s no binding, that category of tools simply doesn’t appear in its context. An Agent that hasn’t been authorized to use SSH doesn’t get an “insufficient permissions” error when it tries to call it — it never knows SSH exists in the first place.

The key here isn’t “access denied.” It’s “invisible.” You’re not telling it “you don’t have permission” — the object simply doesn’t exist in its world. At this point, both the security problem and the Token waste seemed to be mitigated at once — not two goals solved separately, but two natural consequences of the same structural choice.

The choice itself wasn’t complicated. What was interesting was what happened next.

Once “who can see what” was settled, the next question surfaced immediately: what can they do with what they see?

An Agent is authorized to access GitHub repositories, but it should only be able to read a few specific repos under a particular organization — not all of them. An Agent is authorized to use SSH, but it should only connect to development machines and only run commands to view logs, not execute deployment scripts.

Tool visibility is only the first layer. The second layer is parameter-level constraints. This means controlling not just which tools an Agent can call, but whether the arguments it passes fall within the allowed range.
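A binding that carries such constraints might look something like this; every field name here is invented for illustration:

```python
# Purely illustrative: one possible shape for a binding that carries
# parameter-level scopes. Every field name here is an assumption.
bindings = {
    "github": {
        "allowed_repos": ["acme/api", "acme/docs"],   # specific repos, not the whole org
    },
    "ssh": {
        "allowed_hosts": ["10.0.1.*"],                # development machines only
        "allowed_commands": ["cat", "tail"],          # view logs, never deploy
    },
}
```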

My initial approach was to write dedicated permission-checking code for each resource type. SSH had its own checking logic, GitHub had its own, MCP had its own. But by the fifth resource type, I realized this path was unsustainable. Every new integration required a new set of checking code that had to stay consistent with the core logic. Security logic was scattered everywhere, and any single oversight was a vulnerability.

But I noticed something: all of these checks, abstracted to their core, were doing the same thing — extracting a value from the tool call’s arguments and matching it against an authorized scope. SSH checks the hostname. GitHub checks the repository name. MCP checks the tool name. Different in form, identical in structure.

So the declaration and enforcement of permissions were separated. Plugin developers no longer need to write permission-checking code — they just declare “which parameter needs to be checked, and how to match it.” A generic permission engine handles enforcement: extract the argument, match against the constraint, allow or deny. Every integration — whether GitLab, Notion, or Slack — only needs a few extra lines in a declaration, without touching any core code.
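A minimal sketch of that split, assuming a declaration table keyed by tool name and a single generic check function (all names are illustrative):

```python
import fnmatch

# Sketch of the declaration/enforcement split. Plugin authors declare which
# argument to check and how to match it; one generic engine does the rest.
# The table contents and function signature are illustrative assumptions.
PERMISSION_DECLARATIONS = {
    "ssh_exec":         {"argument": "host", "scope_key": "allowed_hosts", "match": "glob"},
    "github_read_file": {"argument": "repo", "scope_key": "allowed_repos", "match": "exact"},
    "mcp_call":         {"argument": "tool", "scope_key": "allowed_tools", "match": "exact"},
}

def check_tool_call(tool_name: str, args: dict, scopes: dict) -> bool:
    """Generic enforcement: extract the declared argument, match it against the scope."""
    decl = PERMISSION_DECLARATIONS.get(tool_name)
    if decl is None:
        return False                              # unknown tool: fail closed
    value = args.get(decl["argument"])
    allowed = scopes.get(decl["scope_key"], [])
    if value is None or not allowed:
        return False                              # missing argument or empty scope: fail closed
    if decl["match"] == "glob":
        return any(fnmatch.fnmatch(value, pattern) for pattern in allowed)
    return value in allowed

# A new integration only adds one declaration entry; the engine never changes.
print(check_tool_call("ssh_exec", {"host": "10.0.1.5", "command": "tail app.log"},
                      {"allowed_hosts": ["10.0.1.*"], "allowed_commands": ["cat", "tail"]}))  # True
```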

Tool visibility and parameter validation became two independent lines of defense in the architecture: the first layer filters the tool list before the LLM call, the second layer checks arguments after the LLM returns a tool call. Either layer alone is sufficient to prevent privilege escalation — even if one is bypassed, the other remains intact. In theory, both layers failing simultaneously is unlikely, but the whole point of security design is not relying on “unlikely.”

Both layers follow the same principle: fail-closed, not fail-open. If tool resolution fails, return an empty list. If parameter validation fails, reject the request. If a pre-delivery permission recheck fails, block delivery. The system’s default behavior under any abnormal condition is to stop.
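In code, fail-closed is mostly the shape of the defaults. The tiny sketch below reuses the hypothetical visible_tools and check_tool_call helpers from the earlier sketches:

```python
# Fail-closed defaults in both layers, reusing the visible_tools and
# check_tool_call sketches above. Any failure resolves to "stop", never "proceed".
def tools_for_prompt(bindings) -> list[dict]:
    try:
        return visible_tools(bindings)
    except Exception:
        return []                                    # resolution failed: the model sees no tools

def guarded_execute(tool_name, args, scopes, runner):
    if not check_tool_call(tool_name, args, scopes):
        raise PermissionError("tool call rejected")  # validation failed: do not execute
    return runner(tool_name, args)
```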

LLMs hallucinate. They’re susceptible to prompt injection. They can make unauthorized calls in complex contexts. Writing “do not leak the API Key” in a system prompt is structurally no different from reminding a person “don’t peek at the safe combination.” The problem was never whether the reminder is sincere — it’s whether the structure permits the action.

If privilege escalation is structurally impossible, it’s no longer a trust problem. It’s a physics problem.

The logic of managing how Agents access resources, at this point, had become nearly isomorphic to the logic of managing how employees access company assets — identity, authorization, credentials, audit. All present and accounted for.

And when I started reasoning through inter-Agent collaboration, a classic security problem appeared in almost exactly its original form.

When Agent A delegates a task to Agent B, and Agent B happens to have a permission that Agent A lacks — say, access to the production database — can Agent A indirectly acquire that capability through delegation?

This is a variant of the Confused Deputy problem. It’s been studied for decades in operating system permission models, and now it’s showing up in almost identical form in Agent collaboration scenarios.

The solution follows the same logic: when Agent A delegates to Agent B, it must declare a set of constraints — which tools Agent B is allowed to use, within what parameter ranges. Agent B’s effective permissions are the intersection of these constraints and Agent B’s own permissions. Permissions can only narrow through delegation, never expand.

These constraints also operate at both layers: before the LLM call, the tool list is filtered so Agent B only sees tools within the delegation scope; after the tool call, parameter validation takes the intersection of the delegation constraints and the Agent’s own permissions. No matter how long the call chain — A delegates to B, B delegates to C — permissions only narrow at each node. There is no possibility of privilege leakage through a delegation chain.
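The intersection itself is almost trivially small. Here is a toy version with scopes modelled as sets; the example values are made up:

```python
# Sketch of delegation as intersection. Scopes are modelled as sets of allowed
# values; the example values are invented for illustration.
def intersect_scopes(own: dict[str, set], delegated: dict[str, set]) -> dict[str, set]:
    """Keep only keys present in both, with their values intersected."""
    return {key: own[key] & delegated[key] for key in own.keys() & delegated.keys()}

agent_b_own = {"allowed_hosts": {"10.0.1.5", "10.0.2.7"},
               "allowed_commands": {"cat", "tail", "bash"}}
delegation  = {"allowed_hosts": {"10.0.1.5"},
               "allowed_commands": {"cat", "tail"}}

effective = intersect_scopes(agent_b_own, delegation)
# {'allowed_hosts': {'10.0.1.5'}, 'allowed_commands': {'cat', 'tail'}}
# A longer chain (A delegates to B, B to C) applies the intersection again at
# each hop, so the effective scope can only shrink.
```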

While working through tool permissions, I had another realization that took some time to crystallize: the tools Agents use actually fall into two categories, and their governance logic is fundamentally different.

One category operates on external systems — connecting to servers via SSH, manipulating repositories via the GitHub API, calling third-party tools via MCP. These tools involve external resources. They require credentials, permission checks, and audit logs. These are what we’ve been discussing so far.

But there’s another category: tools where the Agent operates on its own data — writing to its own memory, setting its own scheduled tasks, reporting its own progress.

The governance needs of these two categories are fundamentally different. An employee using a company account to log into GitHub needs company authorization. But an employee writing a memo in their own notebook doesn’t need anyone’s approval.

Without this distinction, you’re stuck in a dilemma: either all tools go through permission checks — meaning even an Agent writing to its own memory requires approval, which is both slow and unreasonable — or you relax the checks, and external resource operations lose their protection.

The criterion is actually simple: who owns the data? When an Agent operates on its own data, ownership is the permission — no additional checks needed. When it operates on external systems or other Agents’ resources, it goes through the full permission pipeline.

The two categories follow different architectural paths. External resource tools go through the complete permission validation and execution pipeline. Internal capability tools route directly to internal handling logic, bypassing the entire permission check.
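A sketch of that two-path dispatch, with invented internal tool names, handlers passed in as plain callables, and the check_tool_call sketch from earlier standing in for the permission pipeline:

```python
# Sketch of the two-path dispatch. The internal tool names are invented, the
# handlers are passed in as plain callables, and check_tool_call is the
# enforcement sketch from earlier.
INTERNAL_TOOLS = {"memory_write", "schedule_task", "report_progress"}

def dispatch(tool_name: str, args: dict, agent_scopes: dict,
             internal_handler, external_pipeline):
    if tool_name in INTERNAL_TOOLS:
        # The Agent owns this data; ownership is the permission.
        return internal_handler(tool_name, args)
    # External resource: the full pipeline (visibility, validation, credentials, audit).
    if not check_tool_call(tool_name, args, agent_scopes):
        raise PermissionError("blocked by parameter validation")
    return external_pipeline(tool_name, args)
```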

This distinction might look like an implementation detail, but it reflects a more fundamental judgment: governance isn’t a case of “more is better.” Over-governance is just as harmful as under-governance — it makes the system slower, more brittle, and eventually makes people want to bypass it. Good governance is precise: strict where it needs to be, unintrusive where it doesn’t.

Memory was another problem that surfaced through practice. A standard LLM Agent starts every conversation from scratch. Everything it learned in the previous interaction vanishes when the session ends. If an Agent is a long-running entity, it needs memory — but not undifferentiated, total recall.

An Agent learns in an ops channel that “this server has scheduled maintenance every Thursday at 2 AM.” That information shouldn’t appear in a session where it’s helping a different team write code. An Agent that picks up environment variables during a deployment task shouldn’t leak them into the context of other tasks.

Memory is therefore partitioned. Global memory is visible in all contexts — it’s general experience. Channel memory is only visible in the corresponding IM channel — it’s the collaboration context of a specific team. Task memory is only visible during that task’s execution. Resource memory is bound to a specific external system, recording “things to keep in mind when using this tool.”

Partitioning isn’t just an efficiency optimization — it’s an extension of security. The boundaries of memory are the boundaries of information. An Agent shouldn’t automatically gain access to everything it has accumulated in other contexts just because it’s been assigned a new task.

When an Agent genuinely needs to access memory across partitions — say, for a cross-team collaboration task — it can dynamically load a read-only copy, which is automatically unloaded when the task ends. The principle of narrowing permissions applies at the memory layer as well: cross-partition loads are read-only, not writable; unloading is automatic, not dependent on the Agent “remembering” to clean up.
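A rough model of how partition resolution could work, with an invented data shape; the partition names follow the ones above:

```python
from dataclasses import dataclass

# Sketch of partitioned memory resolution. The partition names follow the text
# above; the data model and field names are assumptions.
@dataclass
class MemoryEntry:
    scope: str            # "global" | "channel" | "task" | "resource"
    scope_id: str         # channel id, task id, or resource id ("" for global)
    text: str
    read_only: bool = False

def visible_memory(entries, channel_id=None, task_id=None, resource_ids=(),
                   cross_partition_loads=()):
    """Return only the entries visible in the current context.

    `cross_partition_loads` models a temporary load for cross-team work: those
    entries are surfaced as read-only copies and dropped when the task ends.
    """
    visible = []
    for e in entries:
        in_scope = (
            e.scope == "global"
            or (e.scope == "channel" and e.scope_id == channel_id)
            or (e.scope == "task" and e.scope_id == task_id)
            or (e.scope == "resource" and e.scope_id in resource_ids)
        )
        if in_scope:
            visible.append(e)
        elif (e.scope, e.scope_id) in cross_partition_loads:
            visible.append(MemoryEntry(e.scope, e.scope_id, e.text, read_only=True))
    return visible
```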

When a session ends, the Agent’s conversation history is analyzed by an LLM, which automatically extracts memory entries worth retaining. This extraction is atomic — automatically extracted memories replace previous auto-extracted memories as a whole, while manually written memories are never affected.
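As a toy model, with memory reduced to two buckets whose names I have made up:

```python
# Sketch of the atomic replacement rule, with memory modelled as two buckets.
def apply_extraction(store: dict[str, list[str]], new_auto: list[str]) -> dict[str, list[str]]:
    """Swap the auto-extracted bucket as a whole; never touch manual entries."""
    return {"manual": list(store.get("manual", [])), "auto": list(new_auto)}
```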

When multiple Agents work within the same organization, they need to communicate. One Agent finishes a deployment task; the ops Agent needs to know. A scheduled trigger fires; the corresponding Agent needs to be woken up. A step in a workflow completes; the next step needs to start.

An event system emerges from this need. But the event system needs governance too.

An Agent subscribes to an event. Later, one of its resource bindings is revoked by an administrator. The next time that event fires, should delivery proceed?

The answer is no. Binding permissions are rechecked before every delivery. The permission state at delivery time is what matters — not the state at subscription time. Events are only delivered to subscribers within the same workspace. If the recheck fails, delivery is blocked. Better to miss a delivery than to make a wrong one. The fail-closed principle holds in asynchronous communication just as it does everywhere else.
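A sketch of that pre-delivery recheck; the subscriber and store shapes are invented, and the point is only where and when the check runs:

```python
# Sketch of the pre-delivery recheck. Subscriber and store shapes are invented;
# what matters is that the check runs against the bindings as they are *now*.
def deliver(event, subscribers, bindings_store, workspace_id):
    for sub in subscribers:
        if sub.workspace_id != workspace_id:
            continue                                    # never deliver across workspaces
        try:
            current = bindings_store.get(sub.agent_id)  # current state, not subscription-time state
            if not current or event.resource_category not in current:
                continue                                # binding revoked: skip this delivery
            sub.notify(event)
        except Exception:
            continue                                    # recheck failed: block, don't guess
```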

If you look at all of this together, you’ll find these aren’t a scattered collection of features: asset classification, tool visibility, parameter validation, declarative permissions, delegation constraints, tool categorization, memory partitioning, event governance — each one was naturally derived from the same starting point.

That starting point is a single judgment: security should not depend on the AI’s own restraint. It must be structurally enforced.

From that judgment, tool visibility and parameter validation form two lines of defense. Declarative permissions let them scale. Delegation constraints prevent privilege leakage in multi-Agent collaboration. Tool categorization prevents over-governance. Memory partitioning extends information isolation to the context layer. Event governance ensures asynchronous communication doesn’t bypass security boundaries. Each problem’s solution naturally exposed the next problem, and the next solution always traced back to the same principle.

Credentials are an example I haven’t mentioned yet. In most Agent implementations, API Keys and Tokens are placed directly in the system prompt or environment variables, where the LLM can access them in plaintext during conversation. I made a different choice from the start: credentials are stored encrypted, decrypted and injected at the execution layer at runtime. The LLM never touches any plaintext throughout its entire lifecycle. The model only emits intent; actual authentication happens where the model can’t see it. Even if the model is fully compromised, it has no keys to leak, because the keys never entered its world.
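A sketch of what that separation could look like at the execution layer; vault, decrypt, and ssh_exec are placeholders for whatever secret store and transport are actually used:

```python
# Sketch of out-of-band credential injection. `vault`, `decrypt`, and `ssh_exec`
# stand in for whatever secret store and transport are actually used; none of
# this is visible to the model.
def run_ssh_tool(args: dict, binding: dict, vault) -> str:
    # The model produced only intent, e.g. {"host": "10.0.1.5", "command": "tail -n 100 app.log"}.
    # No key material ever appears in the prompt, the tool schema, or the model output.
    private_key = decrypt(vault.fetch(binding["credential_id"]))   # execution layer only
    return ssh_exec(host=args["host"], command=args["command"], key=private_key)
```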

From the user’s perspective, things are simple: I own these resources, I give the Agent these permissions, and the Agent can do exactly that.

But if you shift perspective and stand in the Agent’s position: What role do I serve? Where are my boundaries? How much budget do I have left? Which memories can I access?

| Human | Agent |
| --- | --- |
| ID card | Agent ID + name + description |
| Legal constraints | Permission policies |
| Work authorization | Resource bindings |
| Keys / keycards | Encrypted credentials (can't see the plaintext themselves) |
| Salary budget | Token budget |
| Behavioral records | Audit logs |
| Experience / memory | Partitioned memory |

At this moment, two abstractions from entirely different planes converge like two sides of the same mirror.

When a user says, “Give the ops Bot read-only access to the production servers,” the Agent sees: “I have an SSH tool. I can only connect to machines matching 10.0.1.*. I can only run cat and tail.”

The same data, unfolded from two directions, both internally consistent.
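In data terms, both views can come from a single record; the field names below are purely illustrative:

```python
# One possible shape for the underlying binding record (field names are illustrative).
binding = {
    "grant": "read-only access to the production servers",   # the owner's view
    "resource": "ssh",
    "scopes": {                                               # the Bot's view
        "allowed_hosts": ["10.0.1.*"],
        "allowed_commands": ["cat", "tail"],
    },
}
```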

So I gradually stopped using the word “Agent” and started calling them Bots. Not as a deliberate distinction, but because the two words genuinely point to different things. Agent emphasizes capability — autonomous decision-making, tool calling, iterative reasoning. Bot emphasizes identity — a managed, bounded, auditable entity. The LLM is the Bot’s brain. The Agent is the way the Bot executes tasks. But a Bot is not the same as an LLM, just as a person is not the same as their brain.

This doesn’t mean I think AGI has been conceptually achieved. I’m well aware that current technology is still far from that.

Today’s Agents don’t learn. A person who does the same job ten times gets better at it, but an Agent still depends on context injection every conversation. Memory systems partially mitigate this, but it’s still a long way from real “learning.” The deeper tension is this: if an Agent truly started changing its behavior patterns through experience, would its permission boundaries still hold? Would the “experience” it has accumulated contain information that shouldn’t cross certain boundaries?

The governance complexity of multi-Agent collaboration is also increasing. As call chains grow longer and collaboration relationships grow more complex, dynamic permission revocation, real-time audit of call chains, and resource contention between multiple Bots still lack elegant solutions. The intersection model for delegation constraints is a starting point, not an endpoint. The same goes for the tool ecosystem — there are currently eight built-in resource types and a plugin mechanism. When the number of integrations grows from a dozen to hundreds, I’m not yet sure whether the declarative permission design can maintain consistency and maintainability.

But one thing is fairly certain: the solutions to these problems will all trace back to the same starting point, the same question.

If AGI were born tomorrow, or the day after — are we truly prepared for its arrival?

At the end of 2025, OWASP published the Agentic Applications Top 10, opening with: “Once AI began taking actions, the nature of security changed forever.” But reality is more complex than “security.” When you actually put a fleet of Bots into production systems, you discover that security is only one piece — cost, memory, collaboration, observability, all equally urgent.

AGI is not a prerequisite for governance. The moment something can make its own decisions, operate external systems, and delegate work to others — you have to govern it.

ID cards, contracts, access badges, approval workflows, operation logs — these don’t exist to restrict people. They exist so organizations can operate with confidence.

Bots are no different.

A Bot without clear permissions — you wouldn’t dare let it touch production. A Bot without budget boundaries — you wouldn’t dare look at the bill at the end of the month.

But flip it around: when all of these are in place, you can finally let go.

Security and efficiency have never been in conflict. They share only one enemy — excessive privilege. And the purpose of governance was never control.

It’s earning the right to let go.