Governance Compliance in the Age of Agentic AI Engineering

Most conversations about AI in software development tend to center on what these systems can do.

How many lines of code can they write in a day? How quickly can they resolve tickets? How much faster can a sprint move when agents handle the routine work? These are all practical considerations worth tackling.

What gets less attention is the question that follows every one of those gains: who is responsible when something goes wrong?

With traditional software, that question has a clear answer. A developer writes code, a reviewer checks it, a manager approves the deployment. A human is present at every decision point, and accountability traces back through the chain without much ambiguity.

Agentic engineering disrupts that chain. When an AI agent reads a codebase, writes changes across multiple files, runs its own tests, and opens a pull request without a human directing each step, accountability becomes harder to follow.

The agent acted on a goal someone set, but the specific decisions it made along the way belong to no one in particular.

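This gap is a compliance problem, and for many organizations it remains unresolved. This piece looks at where the gap comes from, what regulators and auditors expect, and what building governance into agentic engineering looks like.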


The Accountability Gap Is the Real Problem

Industry coverage tends to treat AI governance as a bundle of separate problems: risk, compliance, security, audit readiness.

All of them matter, but they are downstream of accountability. Without a clear answer to who owns the outcome, every other governance question becomes harder to resolve.

McKinsey partner Rich Isenberg captured the underlying shift:

“Agency isn’t a feature. It’s a transfer of decision rights. The question shifts from ‘Is the model accurate?’ to ‘Who’s accountable when the system acts?’”

— Rich Isenberg, McKinsey

The frameworks most organizations rely on were written on the assumption that the answer is a specific, identifiable person. When that assumption holds, the frameworks function as intended. When it doesn’t, the gaps begin to show.

The cost of the gap is measurable. McKinsey’s 2026 AI Trust Maturity Survey of roughly 500 organizations found that only about a third report maturity levels of three or higher in strategy, governance, and agentic AI governance.

Figure: McKinsey 2026 AI Trust Maturity Survey bar chart showing responsible AI maturity scores across strategy, risk management, data and technology, governance, and AI agent governance. Source: McKinsey

Within that data, organizations with explicit ownership for responsible AI score an average maturity of 2.6, compared with 1.8 for those without it.

Another paper drawing on industry surveys put the picture more starkly: only 21% of enterprises have mature governance models for autonomous agents, and 40% of agentic AI projects are projected to fail by 2027 because of inadequate governance and risk controls.

The harder problem underneath all of this is that governance work tends to get done after the fact, once a system is already in production and an incident has surfaced the gap.

By then, retrofitting accountability is more expensive than building it in at the start.


The Compliance Cost of Autonomous Action

The cleanest way to think about this is that automation does not transfer liability; it concentrates it.

The more decisions an agent makes on behalf of an organization, the more those decisions accumulate against the organization that deployed it.

For example, Spain’s data protection authority, the AEPD, treats autonomy as a design choice the controller makes, not a fixed property of the technology. It maps that choice across four levels:

  • The agent proposes, the human operates
  • The agent and the human collaborate
  • The agent operates, the human is consulted or approves
  • The agent operates, the human observes

Whichever level a team picks for a given task has to be justified, evidence-based, and documented, calibrated to context and risk.
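One way to make that documentation requirement concrete is to record the chosen level as data alongside its justification. The sketch below is illustrative only: the class names, field names, and the `problems` check are assumptions made for this example, not anything the AEPD guidance prescribes.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """The four levels listed above, ordered from least to most autonomous."""
    AGENT_PROPOSES = 1   # the agent proposes, the human operates
    COLLABORATIVE = 2    # the agent and the human collaborate
    HUMAN_APPROVES = 3   # the agent operates, the human is consulted or approves
    HUMAN_OBSERVES = 4   # the agent operates, the human observes


@dataclass(frozen=True)
class AutonomyDecision:
    """Hypothetical record of why a task runs at a given autonomy level."""
    task: str
    level: AutonomyLevel
    justification: str         # why this level fits the context and risk
    evidence: tuple[str, ...]  # assumed: IDs or links to supporting material
    decided_by: str            # the accountable human, not the agent

    def problems(self) -> list[str]:
        """Return the documentation gaps that would make this choice hard to defend."""
        issues = []
        if not self.justification.strip():
            issues.append("missing justification")
        if not self.evidence:
            issues.append("missing evidence")
        if not self.decided_by.strip():
            issues.append("missing accountable owner")
        return issues
```

A team could refuse to enable an agent for a task until `problems()` returns an empty list, which keeps the justification from becoming an afterthought.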

The data controller remains fully responsible regardless of where on that spectrum the agent sits, and the AEPD names the failure mode teams fall into when they forget that, calling it the temptation to “shift all responsibility to the user or human supervision.”

A reviewer in the loop has clearly assigned, and therefore limited, responsibilities, and that review cannot replace the controller’s own diligence in designing the system. Designing a poorly governed agent and leaning on the reviewer when something breaks is not a defensible position.

The same guidance offers a minimum threshold engineering teams can actually apply, called the Rule of 2.

Figure: AEPD Rule of 2 Venn diagram illustrating the three conditions that create unacceptable agentic AI risk: automatically processing uncontrolled information, accessing sensitive data, and performing automatic actions. Source: AEPD

The rule says: an agent should never simultaneously process untrusted inputs, access sensitive data, and take automated actions without human supervision. Any two of those are workable. All three together produce a configuration that should not be allowed at all.

For anyone scoping an agent that touches a codebase, a ticketing system, or customer data, that rule cuts off many architectures before you even start building.
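Because the rule is a simple conjunction of three conditions, it can be checked mechanically when an agent is being scoped. The following is a minimal sketch; the `AgentCapabilities` type and its field names are hypothetical and chosen only to mirror the three conditions as stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what a proposed agent is allowed to do."""
    processes_untrusted_input: bool
    accesses_sensitive_data: bool
    acts_without_human_supervision: bool


def violates_rule_of_two(caps: AgentCapabilities) -> bool:
    """True only when all three risk conditions hold at the same time."""
    return (
        caps.processes_untrusted_input
        and caps.accesses_sensitive_data
        and caps.acts_without_human_supervision
    )


# Example: an agent that reads inbound tickets and customer records,
# but whose actions wait for human approval, has two of three and passes.
support_agent = AgentCapabilities(
    processes_untrusted_input=True,
    accesses_sensitive_data=True,
    acts_without_human_supervision=False,
)
print(violates_rule_of_two(support_agent))
```

A check like this is only as honest as the capability description fed into it, so the flags should come from the agent's real permissions rather than from what the design document intended.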

The same logic shows up in consumer protection.

The UK CMA has been clear that businesses are responsible for how they engage with consumers, whether through people or AI systems, and that consumer law cares about effects rather than mechanism.

An agent that misleads or pressures a consumer in ways that harm their economic interests is likely unlawful, and the novelty of the technology does not change that.

The same theme runs through the EU AI Act, which classifies AI systems used in areas such as recruitment, credit decisions, or critical infrastructure as high-risk. Agents deployed in those contexts fall under that classification.

The provider then carries concrete obligations: documented risk management, technical documentation, event logging, instructions for downstream deployers, and design choices that enable human oversight. None of these are optional, and none of them are easy to bolt on after deployment.

The thread running through all of this is simple. Regulators are not trying to work out which decision the agent made. They are trying to work out who decided to put an agent in a position to make decisions in the first place, and that answer is always the organization.

Documented risk assessment, traceable decisions, meaningful human oversight, and the ability to explain what the system did and why are the evidence every framework expects.

SOC 2 in an Agentic Engineering Environment

For many technology companies, particularly those selling software to enterprise businesses, SOC 2 is a commercial requirement rather than a nice-to-have. Customers expect it before signing contracts, and procurement teams rely on it before granting access.

The framework is built on five Trust Services Criteria: security, availability, processing integrity, confidentiality, and privacy. A SOC 2 Type II report, which evaluates how well an organization’s controls operate over time, signals that security, availability, and data protection are not only defined but independently verified.

Agentic systems strain that model. As Teleport notes, the absence of a clearly attributable human request is treated as a gap. If agent activity is logged under shared accounts or without enough detail, the evidence required for SOC 2 is missing.

This becomes more acute because SOC 2 change management expects changes to be authorized, documented, tested, and approved before deployment.

Agent-generated code that executes at runtime without prior human authorization can bypass every one of those stages, and retrofitting the framework onto agentic behavior requires intentional design rather than just documentation.
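As one illustration of what intentional design could mean here, the four change management stages can be enforced as a gate that an agent-authored change must pass before deployment. This is a sketch under stated assumptions: the `ChangeRecord` fields are hypothetical, and the rule that the approver must differ from the authoring agent is an assumption of this example rather than a quotation from the SOC 2 criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRecord:
    """Hypothetical record attached to an agent-authored change."""
    change_id: str
    authored_by: str                          # agent identity that produced the change
    authorized_by: Optional[str] = None       # human who requested the work
    documentation_ref: Optional[str] = None   # ticket or design note describing it
    tests_passed: bool = False
    approved_by: Optional[str] = None         # human who approved deployment


def missing_controls(change: ChangeRecord) -> list[str]:
    """List which of the four stages lack evidence for this change."""
    missing = []
    if not change.authorized_by:
        missing.append("authorized")
    if not change.documentation_ref:
        missing.append("documented")
    if not change.tests_passed:
        missing.append("tested")
    # Assumption: an agent approving its own change does not count as approval.
    if not change.approved_by or change.approved_by == change.authored_by:
        missing.append("approved")
    return missing


def may_deploy(change: ChangeRecord) -> bool:
    """Allow deployment only when every stage has evidence."""
    return not missing_controls(change)
```

The useful property is that the gate fails closed: a change with no human request behind it is blocked by default, and the list of missing stages doubles as audit evidence.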


What Building Governance Into Agentic Engineering Looks Like

Define Ownership Before Deployment

Every agentic system needs a designated human owner, not as a formality but as the person who can answer the basic accountability questions: what does this agent do, what access does it have, who approved the change, and what is the escalation path when it behaves unexpectedly?

If ownership is not assigned explicitly, those questions do not have clear answers, and the accountability gap gets baked into the system from the start.

Treat Agents as Identities, Not as Tools

Agents that act on production systems need to be treated with the same rigor as human identities. This means assigning unique identities, limiting permissions, conducting regular access reviews, and being able to revoke access when needed.

The category often referred to as non-human identities is growing rapidly across enterprise environments. Palo Alto’s analysis puts the average enterprise machine-to-human identity ratio at 82:1.

Without this structure, agents are granted access without the level of oversight applied to people, even though they can act with comparable or greater operational impact at machine speed.
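The four practices named above (unique identity, limited permissions, regular review, revocation) map onto a small data structure. The sketch below is hypothetical throughout: the field names, the permission strings, and the 90-day review interval are assumptions for illustration, and a real deployment would rely on its identity provider rather than an in-memory object.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AgentIdentity:
    """Hypothetical identity record for a single agent (never shared)."""
    agent_id: str                 # unique per agent
    owner: str                    # designated human owner
    permissions: frozenset[str]   # explicit allow-list, nothing implied
    last_reviewed: date           # date of the most recent access review
    revoked: bool = False

    def can(self, permission: str) -> bool:
        """Permit an action only if access is not revoked and was explicitly granted."""
        return not self.revoked and permission in self.permissions

    def review_overdue(self, today: date, max_age_days: int = 90) -> bool:
        """Flag identities whose access has not been reviewed recently (assumed 90 days)."""
        return (today - self.last_reviewed) > timedelta(days=max_age_days)

    def revoke(self) -> None:
        """Cut off all access immediately."""
        self.revoked = True


agent = AgentIdentity(
    agent_id="agent-ci-01",
    owner="jane.doe",
    permissions=frozenset({"repo:read", "pr:open"}),
    last_reviewed=date(2025, 1, 15),
)
```

With this shape, `agent.can("repo:merge")` is false because the permission was never granted, and every permitted action traces back to one agent and one owner.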

Build the Audit Trail Into the Architecture

Governance cannot depend on manual documentation when agents operate at the speed they do. Asking teams to record what an agent did after the fact introduces gaps, inconsistencies, and convenient omissions.

Logging needs to happen automatically, capturing enough context to reconstruct what the agent did, why, and under what authority, without requiring human annotation.
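A common way to get logging without human annotation is to wrap every action an agent can take so that a record is written whether the action succeeds or fails. The following is a minimal sketch, not a production design: the `audited` decorator, the `AUDIT_LOG` list (a stand-in for an append-only store), and the example identities are all hypothetical.

```python
import functools
import json
from datetime import datetime, timezone

AUDIT_LOG: list[str] = []  # stand-in for an append-only, tamper-evident store


def audited(agent_id: str, authorized_by: str):
    """Wrap an agent action so each call records what, why, and under what authority."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, reason: str, **kwargs):
            entry = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "agent_id": agent_id,            # which agent acted
                "authorized_by": authorized_by,  # the human authority behind it
                "action": func.__name__,         # what it did
                "arguments": [repr(a) for a in args],
                "reason": reason,                # why, supplied by the caller
            }
            try:
                result = func(*args, **kwargs)
                entry["outcome"] = "ok"
                return result
            except Exception as exc:
                entry["outcome"] = f"error: {exc!r}"
                raise
            finally:
                # Runs on success and failure, so no call goes unrecorded.
                AUDIT_LOG.append(json.dumps(entry, sort_keys=True))
        return wrapper
    return decorator


@audited(agent_id="agent-ci-01", authorized_by="jane.doe")
def open_pull_request(branch: str) -> str:
    return f"PR opened from {branch}"
```

Calling `open_pull_request("fix/typo", reason="TICKET-42")` performs the action and appends one JSON line to the log. Making `reason` a required keyword argument means an action cannot be invoked without stating why.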


Concluding Thoughts

The organizations that get this right are the ones that treat governance as part of the engineering work from the beginning, alongside the agentic adoption itself rather than as a compliance exercise to complete afterwards.

GAP’s AI Acceleration Workshops give engineering and executive teams a structured framework for identifying:

  • where accountability gaps exist in their current agentic deployments
  • what regulatory obligations apply to their specific use cases
  • what governance infrastructure needs to be in place before adoption scales further

For organizations that need a more structured path from proof of concept to compliant production deployment, Validate:AI provides a framework designed specifically for that transition.

The window to build this infrastructure before it becomes urgently necessary is narrowing. If you want to understand where your organization currently sits, reach out to a GAP expert.
