Know Your Agent 🔭 Before an Examiner 🔬 Does (pt. 1)

Something shifted in a packed room at Deloitte’s San Francisco office last night. A panel on agentic AI in financial services surfaced a tension that most compliance frameworks still haven’t absorbed: the accountability question isn’t theoretical anymore. Autonomous AI systems are already approving loans, collecting debts, and executing transactions. Architecture decisions being made right now will determine which institutions survive their first regulatory exam involving an agent.

The panel brought together voices rarely in the same room. Christina Tetreault, Deputy Commissioner at the California Department of Financial Protection and Innovation. Philip Rathle, CTO of Neo4j. Beena Ammanath, leader of Deloitte’s Global AI Institute. Janak Sevak from Anthropic’s Applied AI team. Hosted by Erika Bahr, Founder & CEO of Daxe, alongside Deloitte and the MLOps Community, the conversation went somewhere most conferences avoid.

Accountability Collapse

When asked who bears responsibility when a regulated institution’s AI agent makes a bad decision, every panelist gave a different answer. Model provider? Integrator? Bank? Compliance officer? Chief risk officer?

Here is what made the exchange remarkable. Nobody said “nobody,” but nobody said the same somebody either.

Philip Rathle offered the sharpest framing:

“In the consumer’s mind, whoever was interacting with the bank, it’s a hundred percent the bank.” Whether technical liability traces back to a third-party vendor is irrelevant because, as Rathle put it, “they’re gonna bear the reputational front end.”

Always.

Beena Ammanath grounded it in existing corporate structure:

“It’s either the chief risk officer or compliance officer. What’s happening in very established industries like banks is that there are already governance models.”

Not perfect for the agentic world, she noted, but checks and balances already exist that institutions are adapting. She pointed to an emerging role gaining traction: the chief AI risk officer, someone focused specifically on AI-related risks rather than inheriting them as a side responsibility.

Christina Tetreault reinforced this with something Fortune 1000 boards should note carefully. When examiners walk into a bank, third-party oversight is already a core part of the examination process. Most agentic AI deployments are, by definition, third-party service provider relationships.

“This could be potentially their license, depending on how great the mistake was,” Tetreault said.

The gap between where accountability technically lives and where it lands reputationally is the real design problem. No governance framework built for static software handles it.

Auditability Is Not Logging

Janak Sevak made a distinction that should change how technical teams think about compliance infrastructure. When asked to choose one non-negotiable requirement for AI in financial systems (auditability, reversibility, or compliance), his answer was immediate: auditability. But with a caveat that redefines the term.

“Auditability is not just about logging like this happened,” Sevak said. “It’s more about how did it happen, how did we get to the decision, how did they make the decision.”

Then came the analogy that crystallized the panel’s sharpest insight:

“If there is an explosion happening, if you can trace the source and the principles, that helps.”

This reframes the entire audit trail conversation. Most compliance architectures log outputs. The emerging standard requires logging the reasoning topology: which context was retrieved, which rules fired, which access controls filtered the data, and whether a human checkpoint existed in the chain.
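To make the difference between logging an output and logging the reasoning path concrete, here is a minimal sketch of what a single decision record might hold. Nothing here comes from the panel: the `DecisionTrace` class, its field names, and the sample values are all hypothetical, chosen only to mirror the four elements named above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class DecisionTrace:
    """One agent decision plus the path that produced it (hypothetical schema)."""

    agent_id: str
    action: str
    outcome: str
    retrieved_context: Tuple[str, ...]  # records the agent pulled in
    rules_fired: Tuple[str, ...]  # deterministic rules that evaluated
    access_filters: Tuple[str, ...]  # access controls applied to the data
    human_checkpoint: Optional[str] = None  # reviewer ID, or None if no human
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def explain(self) -> str:
        """Render the trace as a plain-language explanation for a reviewer."""
        reviewer = self.human_checkpoint or "no human checkpoint"
        return (
            f"Agent {self.agent_id} performed '{self.action}' "
            f"with outcome '{self.outcome}'. "
            f"Context used: {', '.join(self.retrieved_context) or 'none'}. "
            f"Rules fired: {', '.join(self.rules_fired) or 'none'}. "
            f"Access filters: {', '.join(self.access_filters) or 'none'}. "
            f"Human checkpoint: {reviewer}."
        )


# Sample values are illustrative only.
trace = DecisionTrace(
    agent_id="agent-0042",
    action="flag_application_for_review",
    outcome="flagged",
    retrieved_context=("application-881", "customer-profile-17"),
    rules_fired=("kyc-rule-7",),
    access_filters=("pii-masking",),
    human_checkpoint="reviewer-114",
)
print(trace.explain())
```

A record shaped like this answers "how did we get to the decision," not only "this happened." A real schema would depend on the institution's own systems and examiner expectations.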

Sevak pushed further on what this looks like operationally: “Any auditable decision that you are making as a group, make sure that your humans in your organization are able to explain it within 24 hours.” Any decision made by the system should be traceable back to a human-comprehensible explanation on that timeline. Not eventually. Not after a forensic review. Within a day.

Philip Rathle added a critical architectural layer. Two forms of decision capture are needed simultaneously. One is the immutable audit log, an unmodifiable record for examiners. But there’s a second requirement most institutions miss: “You also want to capture that same information in a form that can be a feedback loop so that future decisions can get better.” These serve different masters, but both require the same underlying architecture: a system that represents relationships, causality, and context natively rather than flattening everything into rows and columns.
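One way to picture the two forms of capture side by side is a hash-chained, append-only log with a separate read-only view for learning. This is a sketch under stated assumptions, not what Rathle or Neo4j described: the `HashChainedLog` class, the `overrides_by_rule` helper, and the record fields are hypothetical. A hash chain is also only tamper-evident, so real immutability would need write-once storage behind it.

```python
import hashlib
import json
from collections import Counter

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


class HashChainedLog:
    """Append-only log where each entry commits to the one before it."""

    def __init__(self):
        self._entries = []

    def append(self, record: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev_hash, "payload": payload, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev_hash + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True

    def records(self) -> list:
        return [json.loads(entry["payload"]) for entry in self._entries]


def overrides_by_rule(log: HashChainedLog) -> Counter:
    """Feedback view: count how often a human overrode decisions per rule."""
    counts = Counter()
    for record in log.records():
        if record.get("human_override"):
            counts.update(record.get("rules_fired", []))
    return counts


log = HashChainedLog()
log.append({"decision": "d1", "rules_fired": ["rule-a"], "human_override": True})
log.append({"decision": "d2", "rules_fired": ["rule-a", "rule-b"], "human_override": False})
print(log.verify())
print(overrides_by_rule(log))
```

The examiner-facing record and the feedback view read the same entries, which is the point of capturing the information once in a form that serves both.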

Determinism Gradient

Perhaps the most operationally useful insight from the panel was the recognition that not all AI-driven financial actions require the same level of deterministic control.

Rathle drew the line with characteristic precision. A product recommendation that gets it wrong? “The customer won’t buy your product. That’s not regulated in the same way that loan approval” is, or KYC, or moving money from the right accounts. These are fundamentally different accountability surfaces. Architecture must reflect that gradient.

The emerging pattern among top-10 US banks (most of which, Rathle noted, are already Neo4j customers) involves splitting the problem. Fuzzy decisions, where probabilistic output is acceptable, get enriched context from a knowledge graph. Deterministic decisions, where 100% accuracy and full auditability are required, get converted into structured queries that execute through a rules engine, with the LLM generating the query rather than the answer.
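The split can be sketched in a few lines. Everything below is an assumption made for illustration: the action names, the `max_dti` rule, its numeric limit, and the `stub_llm_propose_query` function (a stand-in for a real model call) are hypothetical and do not describe any bank's system or actual lending policy.

```python
from typing import Callable, Dict

# Hypothetical list of actions that must take the deterministic path.
DETERMINISTIC_ACTIONS = {"loan_approval", "kyc_check", "funds_transfer"}


def max_dti(params: dict) -> bool:
    """Illustrative rule: debt-to-income ratio must not exceed a given limit."""
    return params["monthly_debt"] / params["monthly_income"] <= params["limit"]


# The rules engine is a fixed registry of pure functions.
RULES: Dict[str, Callable[[dict], bool]] = {"max_dti": max_dti}


def stub_llm_propose_query(request: str) -> dict:
    """Stand-in for a model call: returns a structured query, never a decision."""
    return {
        "rule": "max_dti",
        "params": {"monthly_debt": 1500.0, "monthly_income": 5000.0, "limit": 0.4},
    }


def execute_query(query: dict) -> bool:
    """Run the query through the rules engine; unknown rules are rejected."""
    rule = RULES.get(query.get("rule"))
    if rule is None:
        raise ValueError(f"rule not registered: {query.get('rule')!r}")
    return rule(query["params"])


def handle(action: str, request: str) -> dict:
    if action in DETERMINISTIC_ACTIONS:
        query = stub_llm_propose_query(request)
        # The query is returned alongside the result so both can be audited.
        return {"zone": "deterministic", "query": query, "result": execute_query(query)}
    # Fuzzy zone: a probabilistic suggestion is acceptable here.
    return {"zone": "probabilistic", "result": "model-generated suggestion"}


print(handle("loan_approval", "Can this applicant be approved?"))
print(handle("product_recommendation", "What card suits this customer?"))
```

In this sketch the model's only job on the deterministic path is to pick a registered rule and fill in its parameters. The answer itself comes from code that gives the same result every time for the same inputs.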

“Models can enable lots of new capabilities, and they can democratize existing capabilities,” Rathle said. But the bar financial services companies are held to “can’t be met with just one kind of technology, particularly when it’s probabilistic.”

This is not the same as putting a human in the loop. This is designing the loop itself to have zones of determinism and zones of probability, with the boundary between them as an explicit architectural decision rather than an afterthought.

Regulation Is Not Coming. It Is Here.

Christina Tetreault said something that should stop every compliance officer mid-sentence:

“AI is not unregulated. Agents are not unregulated. There are already all these laws that apply.”

Unfair, deceptive, and abusive acts and practices standards apply. Product-specific regulations apply. Licensing requirements apply. Tetreault was direct:

“If you walk out of here with nothing else, please understand that California wants you building your companies here. We want you serving California consumers, and we want to be at the forefront.”

Then she made it concrete with a use case that draws the regulatory line precisely. One California financial institution is using an AI agent to help customers fill out loan applications. Tetreault’s team examined it. “We were like, wow, that’s really cool.” But the critical architectural detail: “The decisioning is not with the agent.” It assists the workflow but does not touch the credit decision itself. That line, between facilitating a process and making the decision within it, is where most regulatory risk concentrates.

The practical implication is stark. Companies building AI systems that initiate, approve, or execute financial actions may already be conducting licensable activity.

“You do not want to be on the wrong side of that line when the finance regulator decides that what you’re doing requires a license, if you’ve been doing it without a license,” Tetreault warned.

Her department runs open office hours weekly. The message was unambiguous: come explain what the system does before an examiner discovers it during a routine review.

Know Your Agent

The most forward-looking concept from the evening emerged when Janak Sevak connected identity infrastructure to agent governance. “Your agents should have a badge. They should have an agent ID or employee ID,” he said. “Know your agents like you know your customers.”

If the future workplace includes agents as co-workers, those agents need identity within access control systems. The identity layer is not an afterthought to agentic commerce; it is the precondition for every audit trail, every access control filter, and every regulatory examination that follows.

Philip Rathle mapped how this plays out architecturally. Agent permissioning creates “an explosive mess” when handled through traditional systems, with each agent needing access controls at varying levels across nested hierarchies of services, folders, and user groups. Graph-based identity and permissioning systems already handle this at scale for human actors in financial services. Extending them to agent actors is the natural move, but it requires treating agent identity as a first-class design concern rather than a metadata tag.
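A toy version of graph-based permissioning shows why nested hierarchies suit a graph. This sketch uses a plain in-memory adjacency map rather than any graph database, and every node name in it (agents, groups, folders, services) is hypothetical. Access is granted if the resource is reachable from the agent's identity node by following membership and containment edges.

```python
from collections import deque

# Directed edges: agent -> group -> parent group -> folder -> service.
EDGES = {
    "agent:loan-intake-01": ["group:intake-agents"],
    "group:intake-agents": ["group:customer-facing"],
    "group:customer-facing": ["folder:applications"],
    "folder:applications": ["service:application-forms"],
    "agent:collections-02": ["group:collections-agents"],
    "group:collections-agents": ["folder:delinquent-accounts"],
}


def reachable(start: str, edges: dict) -> set:
    """Breadth-first traversal collecting every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def can_access(agent_id: str, resource: str, edges: dict = EDGES) -> bool:
    """An agent may touch a resource only if a path exists from its identity."""
    return resource in reachable(agent_id, edges)


print(can_access("agent:loan-intake-01", "service:application-forms"))
print(can_access("agent:loan-intake-01", "folder:delinquent-accounts"))
```

Because the agent is a node like any human user, adding or revoking a group membership changes what it can reach without rewriting per-resource access lists. An agent with no identity node reaches nothing, which is one way to read "agents should have a badge."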

Christina Tetreault made this tangible with a specific frontier case: debt collection. “We are trying to figure out where the line is between what a human would do when they call you on the phone and ask you to pay a bill that’s behind, and what having an agent do that means.” She didn’t claim to have the answer. But she confirmed agents becoming licensed is not speculative: “Yes, I can imagine this.”

Data Advantage Banks Don’t Realize They Have

Beena Ammanath made an observation that inverts the common narrative of banks as innovation laggards. “The big advantage that exists in the financial sector is massive amounts of good data,” she said. Decades of compliance requirements have forced financial institutions to maintain high-quality, well-structured data reserves. Manufacturing, by comparison, spent years struggling with data quality before meaningful models were possible.

“Thanks to compliance that has existed for decades,” Ammanath noted, banks sit on an extraordinary asset for the agentic era. What she’s hearing from banking clients has shifted: “Help us set the guardrails so that we can innovate faster.” Compliance DNA, once perceived as the brake, is becoming the accelerator.

Institutions pulling ahead are bringing governance conversations into the ideation phase, baking risk considerations into vendor contracts from day one, and treating guardrail design as an innovation catalyst rather than friction. “I’m actually seeing more and more focus on building those guardrails early on, bringing in the governance conversation much earlier, even at the ideation phase,” Ammanath said.

A culture of compliance, reframed, becomes a competitive advantage. Institutions that understand this will set the standard. Those treating governance as friction will spend the next decade retrofitting systems that should have been designed correctly from the start.

Open Question

The panel surfaced a tension it did not fully resolve, and perhaps cannot yet. AI systems are probabilistic by nature. Regulated financial outcomes demand deterministic accountability. The entire emerging architecture of agentic commerce in financial services exists in the space between those two facts.

Every institution deploying autonomous AI in financial workflows faces this design question: where, precisely, does the system transition from probabilistic assistance to deterministic action? And who signed off on drawing that line there?

Institutions answering that question well are not the ones with the most advanced models. They are the ones with the most rigorous architecture for constraining what those models can do.

That distinction, between capability and constraint, may be the defining design challenge of financial AI for the next decade.

Panel hosted by Deloitte, Daxe, and MLOps Community in San Francisco. Panelists: Christina Tetreault (California DFPI), Beena Ammanath (Deloitte Global AI Institute), Philip Rathle (CTO, Neo4j), Janak Sevak (Anthropic). Moderated by Erika Bahr (CEO, Daxe). MLOps Community remarks by Rahul Parundekar.