Your model-risk committee doesn't want a model. They want a register entry.
Why SR 11-7 keeps blocking enterprise AI deployments, and what model-risk management teams actually need before the next committee meeting.
By The Nuviax team
The AI pilot is on the agenda for the 2:00 PM model risk committee. It has been on the agenda for three months. Each time, it gets deferred. The product team thinks MRM is blocking for sport. The MRM team thinks the product team is ignoring the control framework that every other model in the bank has to meet.
Neither is right. The real issue is that the AI pilot is showing up at the committee as an AI pilot, not as a register entry. And the committee does not know what to do with it.
What SR 11-7 actually says
SR 11-7 was published by the Federal Reserve in 2011 and has been the governing guidance for model risk management at US banks for well over a decade. It defines a model as a "quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates." The definition is deliberately broad. It was written to cover everything from a credit scoring model to a Monte Carlo simulation to a regulatory capital calculator.
The guidance then sets the framework. Every model in production gets:
- An inventory entry in a central register
- A documented validation covering conceptual soundness, ongoing monitoring, and outcomes analysis
- A challenger model or alternative benchmark
- An assigned owner, a validator independent from the developer, and a sign-off from the model risk committee
- A change control process that triggers re-validation on material changes
None of this is controversial at a bank that has been running model risk for twenty years. The MRM function has a register, a validation template, a monitoring cadence, and a quarterly committee meeting. It operates. The question is not whether the framework exists. The question is how a large language model fits inside it.
Why LLMs break the existing process
An LLM breaks every assumption the MRM framework was built on. Consider what the validation team expects:
The model has a fixed specification. The LLM does not. The weights shipped with gpt-4 in March 2023 are not the weights shipped with gpt-4 in January 2024. The provider updates the underlying checkpoint without announcing it as a material change. The validator signed off on one model and the production system is now calling a different one.
The model takes structured input. The LLM takes natural language. The validator has no way to define the input domain, which means the validation cannot characterize the input space, which means the ongoing monitoring plan cannot detect drift.
The model returns a defined output. The LLM returns generated text. Output monitoring for accuracy is something MRM has done for decades. Output monitoring for "is this response acceptable" is something that does not exist in the MRM playbook.
The model has a single deployed version. The LLM is called from an application that composes a prompt, a retrieval result, a tool schema, and a system policy. Change any of those and the effective model behavior changes. The validator signed off on a specification that no longer exists by the first week of production.
This is why the MRM committee defers. Not because they oppose AI. Because the artifact in front of them does not match any shape the framework knows how to evaluate.
What the committee actually needs
Talk to an MRM lead and they will tell you what they want:
They want a register entry that maps to a stable identifier. If the product team says "we use gpt-4," the register entry is useless a week later. The register needs to point at an internal model ID that the bank controls, not an external version string the provider owns.
They want a validation document that the validator can actually sign. If the validation cannot be written because the model keeps shifting, the validator cannot sign. The bank needs to stop the model from shifting underneath the document, not ask the validator to sign one that will be outdated by Tuesday.
They want a monitoring plan the operations team can execute. "Monitor the LLM" is not a plan. A plan names the metric, the threshold, the data source, the owner, and the escalation path.
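A plan with those five fields can be sketched as a plain data record. This is an illustrative sketch, not any particular MRM tool's schema; the field names, metric name, and threshold are all assumptions for the example.

```python
from dataclasses import dataclass

# Hypothetical sketch: one monitoring line item, with every field the
# committee asks for named explicitly. All names here are illustrative.
@dataclass(frozen=True)
class MonitoringItem:
    metric: str          # what is measured
    threshold: float     # the level that counts as a breach
    data_source: str     # where the measurement comes from
    owner: str           # who is accountable for it
    escalation: str      # where a breach goes

plan = [
    MonitoringItem(
        metric="groundedness_score_p50",
        threshold=0.85,
        data_source="gateway_event_log",
        owner="ops-model-monitoring",
        escalation="mrm-committee-agenda",
    ),
]

# "Monitor the LLM" becomes an executable checklist: an item either
# has all five fields populated or it is not a plan.
assert all(item.owner and item.escalation for item in plan)
```

The point of the record shape is that an empty field is visible at review time, before the model is in production.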
They want a challenger. If the primary model is gpt-4 routed through OpenAI, the challenger should not be "also gpt-4 but a different prompt." It should be a structurally different model that will produce different outputs on the same inputs, so the validator has something to benchmark against.
They want change control that triggers on material events. A provider checkpoint update is a material event. A prompt template change is a material event. A tool schema change is a material event. The committee wants to know when these happen, in advance, with a documented impact assessment.
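The material-event list above is small enough to make explicit. A minimal sketch, assuming the gateway can classify each change notification by type before it serves traffic (event-type strings are illustrative):

```python
# Illustrative set of change events the committee treats as material.
# Each is assumed to arrive as a gateway notification ahead of the
# change reaching production traffic.
MATERIAL_EVENTS = {
    "provider_checkpoint_update",
    "prompt_template_change",
    "tool_schema_change",
}

def requires_impact_assessment(event_type: str) -> bool:
    """Return True when a change must be documented and assessed
    before it is allowed to serve a production request."""
    return event_type in MATERIAL_EVENTS
```

Anything not on the list falls through to ordinary operations; anything on it blocks until the impact assessment exists.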
None of this is hostile to AI. All of it is the MRM framework being honest about what the existing process requires. The product team hears "the committee is blocking" when what the committee is saying is "the artifact you brought does not fit the shape."
Register every LLM endpoint as an inventoried model
The pattern that works is to stop treating LLMs as a single thing and start treating each endpoint as an inventoried model with a stable internal identifier. The bank controls the identifier. The product team calls the identifier. The gateway resolves the identifier to an actual provider-and-version under the hood.
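The resolution step can be sketched as a routing table the bank owns. The logical IDs and provider version strings below are illustrative, not a real deployment:

```python
# Hypothetical routing table: the bank controls the logical IDs on the
# left; the provider-and-version pairs on the right can change under
# change control without any application code changing.
ROUTES = {
    "bank/credit-memo-drafter/v3": ("openai", "gpt-4-0125-preview"),
    "bank/credit-memo-drafter/challenger": ("anthropic", "claude-3-opus-20240229"),
}

def resolve(logical_id: str) -> tuple[str, str]:
    """Resolve a bank-controlled model ID to the provider endpoint it
    currently maps to. Applications never see the right-hand side."""
    return ROUTES[logical_id]
```

The register entry points at the left-hand key, which stays stable; the right-hand value is what re-validation governs.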
Four tiers of registration:
- Primary production. The model the business calls in live workflows. Validated. Signed off by the committee. Monitored on the documented plan. When the provider checkpoint shifts, the gateway surfaces the event and the MRM team triggers re-validation before the new checkpoint serves a single production request.
- Approved challenger. Structurally different model from primary. Traffic is shadowed at a configurable percentage. The gateway captures paired outputs on the same inputs for the MRM team to analyze divergence. When the challenger is promoted, the promotion runs through the committee just like the primary did.
- Sandbox. Available to data science teams for prototyping. Not callable from any production surface. RBAC at the gateway layer enforces the separation. Models here do not appear in the register and do not require validation because they are not in production.
- Deprecated. Retired from production. Calls return a structured error that applications handle gracefully. The model stays in the register as an audit artifact, with the date of retirement and the successor model documented.
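The four tiers, and the promotions between them, can be sketched as an explicit state machine so that a tier change is a governed transition rather than a config edit. The transition rules here are an assumption for illustration, not prescribed by SR 11-7:

```python
from enum import Enum

class Tier(Enum):
    PRIMARY = "primary_production"
    CHALLENGER = "approved_challenger"
    SANDBOX = "sandbox"
    DEPRECATED = "deprecated"

# Hypothetical rule set: which tier transitions exist at all, and who
# signs off before the gateway will apply them.
ALLOWED = {
    (Tier.SANDBOX, Tier.CHALLENGER): "committee",
    (Tier.CHALLENGER, Tier.PRIMARY): "committee",   # promotion
    (Tier.PRIMARY, Tier.DEPRECATED): "committee",
    (Tier.CHALLENGER, Tier.DEPRECATED): "committee",
}

def can_transition(src: Tier, dst: Tier) -> bool:
    """A transition not in the table simply does not exist; a sandbox
    model cannot jump straight to primary production."""
    return (src, dst) in ALLOWED
```

Encoding the rules this way makes the forbidden path (sandbox straight to production) unrepresentable rather than merely discouraged.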
Every call to the gateway writes an event to the MRM register. The event includes the logical model ID the application requested, the actual provider and version served, the policy applied, the tokens in and out, the caller identity, and the latency. The MRM team queries the register. The register is not a document the product team emails over quarterly. It is a live data source that the MRM dashboard reads from continuously.
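The per-call event described above can be sketched as one flat record per gateway call. Field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class RegisterEvent:
    # One row per gateway call; names are illustrative assumptions.
    logical_model_id: str   # the ID the application requested
    provider: str           # the provider actually served
    provider_version: str   # the checkpoint actually served
    policy_id: str          # the policy applied at the gateway
    tokens_in: int
    tokens_out: int
    caller: str             # caller identity
    latency_ms: float

event = RegisterEvent(
    logical_model_id="bank/credit-memo-drafter/v3",
    provider="openai",
    provider_version="gpt-4-0125-preview",
    policy_id="pii-redact-v2",
    tokens_in=1430,
    tokens_out=212,
    caller="svc-credit-memo",
    latency_ms=2140.0,
)

# The MRM dashboard reads rows like this continuously, instead of
# waiting for a quarterly document to arrive by email.
row = asdict(event)
```

Because the record carries both the logical ID and the version actually served, a checkpoint drift shows up in the data the moment it happens.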
What changes operationally
Three things change on the first day this pattern goes live.
The committee stops deferring. The artifact in front of them is no longer "an AI pilot." It is a register entry. It has a validation document. It has a challenger. It has a monitoring plan. The committee votes on it the same way they vote on every other model in production.
The validator stops playing catch-up. When the provider issues a checkpoint update, the gateway sees it before production traffic does. The event fires. The validator has a documented impact assessment window before the new checkpoint serves a single user. The validation stays current.
The product team stops fighting the process. The team calls the internal model ID. The gateway handles routing. If the gateway says "budget exceeded" or "this caller is not cleared for primary production," the application handles the structured error. The team is not shipping against a moving provider API. They are shipping against a stable internal contract.
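The structured-error contract can be sketched on the application side. The error codes and fallback actions below are hypothetical, assuming the gateway returns a machine-readable code rather than provider-specific exception text:

```python
# Hypothetical handler for the gateway's structured errors. The
# application branches on a stable code it can test against, never
# on the wording of a provider exception.
def handle_gateway_error(error: dict) -> str:
    code = error.get("code")
    if code == "BUDGET_EXCEEDED":
        return "queue_for_next_window"
    if code == "CALLER_NOT_CLEARED":
        return "route_to_challenger_or_deny"
    if code == "MODEL_DEPRECATED":
        return "resolve_successor_from_register"
    return "fail_closed"  # unknown errors never fall through to the user
```

The stable contract is the whole point: the handler above does not change when the provider behind the gateway does.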
This is not an AI problem
It is worth saying plainly: the MRM committee is not blocking AI because AI is dangerous. They are blocking AI because the artifact they are asked to sign does not map to the framework they operate under. The framework is not wrong. It is the control plane that prevents a bank from blowing up on a bad model decision.
The work is to shape the AI deployment so it produces register-shaped artifacts by default. That is not a philosophical question. It is a platform architecture question. The platform either emits the artifacts the MRM team needs or it does not. If it does, the committee signs. If it does not, the committee defers.
When a bank's product team and the MRM team are fighting, the fix is almost never in the relationship. The fix is in the architecture.
Next step
If you are at a US bank and the 2:00 PM committee meeting is getting tense, an architecture review will map your current AI deployment against the four-tier model and produce a findings document your MRM team can carry into the next session. Ninety minutes with you, one week to the written output, vendor-agnostic whether or not you end up using Nuviax.