The problem
A US bank running AI in production does not have a model problem. It has a register problem. SR 11-7 does not care whether the model came from OpenAI, Anthropic, or a fine-tuned Llama running inside your VPC. It cares that every model in production has an inventory entry, a validation document, an ongoing monitoring plan, and a challenger. It cares that when a provider silently updates gpt-4 to a new checkpoint, the bank notices and logs it as a material model change.
Most AI gateways ignore this entirely. They load-balance across providers for latency and cost. That is not a gateway for a regulated bank. That is a stateless proxy with no audit story.
Why the usual approach breaks
Engineering teams wire up LangChain or LiteLLM, point it at five providers, and ship. The compliance team finds out six months later when a model risk audit asks "which model is answering the customer's question?" and the answer is "it depends on the hour."
The gap between what the application thinks it is calling and what actually responds is invisible to the MRM function. Challenger tracking is impossible because there is no notion of a primary versus a challenger at the routing layer. Change control cannot work because the model endpoint is a string in a config file, not a versioned artifact under governance.
How AI Gateway closes the gap
AI Gateway treats every provider endpoint as an inventoried model. Each registered endpoint gets a stable internal model ID, a tier classification (primary, challenger, deprecated), and a policy binding. The application calls the internal model ID, not the raw provider URL. The gateway resolves the routing decision, logs the actual provider and model version served, and emits an event to the MRM register automatically.
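The pattern above can be sketched in a few lines. This is a hypothetical illustration, not AI Gateway's real API: the class names, fields, and event shape are all assumptions. The point it demonstrates is that the application only ever sees a stable internal model ID, while the gateway records the actual provider and checkpoint that served each request.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Endpoint:
    internal_id: str     # stable ID the application calls
    provider: str        # e.g. "openai", "anthropic", "self-hosted"
    model_version: str   # exact checkpoint actually served
    tier: str            # "primary" | "challenger" | "sandbox" | "deprecated"

class Gateway:
    def __init__(self):
        self.registry: dict[str, Endpoint] = {}
        self.audit_log: list[dict] = []   # stand-in for the MRM event stream

    def register(self, ep: Endpoint) -> None:
        self.registry[ep.internal_id] = ep

    def resolve(self, internal_id: str) -> Endpoint:
        ep = self.registry[internal_id]
        # Every resolution is logged with the actual provider and version,
        # so the MRM register sees what responded, not what was configured.
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "internal_id": internal_id,
            "provider": ep.provider,
            "model_version": ep.model_version,
            "tier": ep.tier,
        })
        return ep

gw = Gateway()
gw.register(Endpoint("credit-memo-summarizer", "openai", "gpt-4-0613", "primary"))
ep = gw.resolve("credit-memo-summarizer")
print(ep.provider, ep.model_version)   # the checkpoint served, not just "gpt-4"
```

The application code never contains a provider URL or a raw model string; swapping the provider behind "credit-memo-summarizer" is a registry change that the audit log captures automatically.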
When a provider announces a checkpoint update, the gateway surfaces the change as a material model event before the first production request hits the new weights. When a team wants to try a challenger, they register it as a challenger endpoint and the gateway mirrors a configurable percentage of traffic to it without touching application code. When a model is retired, the gateway returns a structured error the caller can handle gracefully rather than a 404 from a dead endpoint.
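Challenger mirroring is the least intuitive of these behaviors, so here is a minimal sketch of the mechanic under assumed names (this is not AI Gateway's real API, and a production gateway would mirror asynchronously): a configured fraction of primary traffic is copied to the challenger for divergence capture, and the caller only ever receives the primary response.

```python
import random

def handle_request(prompt, primary, challenger=None, mirror_pct=0.0, rng=random.random):
    """Serve from the primary; copy a mirror_pct fraction of requests to the
    challenger. The challenger's output is captured for validation, never returned."""
    response = primary(prompt)
    mirrored = False
    if challenger is not None and rng() < mirror_pct:
        challenger(prompt)
        mirrored = True
    return response, mirrored

calls = {"primary": 0, "challenger": 0}

def primary(p):
    calls["primary"] += 1
    return f"primary:{p}"

def challenger(p):
    calls["challenger"] += 1

for _ in range(1000):
    handle_request("hello", primary, challenger, mirror_pct=0.1)

print(calls["primary"])   # 1000: every caller got a primary response
```

Because the split lives in the gateway, dialing the mirror percentage up or down is a policy change, not a code deploy in every consuming application.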
Implementation pattern
The bank registers four tiers: primary production, approved challenger, sandbox-only, and deprecated. Every endpoint lives in one tier. RBAC binds applications to tiers. A consumer-facing app can only call primary production. A data-science notebook can call sandbox. A model-validation run can call primary and challenger side by side and capture the divergence. The MRM register ingests the gateway's daily event stream and auto-populates the monitoring dashboard. Nothing is manual. Nothing is invisible.
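The tier and RBAC bindings above reduce to a small authorization check. The config shape below is an assumption for illustration, not AI Gateway's real schema: each endpoint lives in exactly one tier, and each application is bound to the set of tiers it may call.

```python
app_bindings = {
    "customer-chat":    {"primary"},                 # consumer-facing: primary only
    "ds-notebook":      {"sandbox"},                 # exploration stays in sandbox
    "model-validation": {"primary", "challenger"},   # side-by-side divergence runs
}

endpoint_tiers = {
    "credit-memo-summarizer":    "primary",
    "credit-memo-summarizer-v2": "challenger",
    "experimental-llama":        "sandbox",
}

def authorize(app: str, internal_id: str) -> bool:
    """True iff the app's tier bindings permit calling this endpoint."""
    tier = endpoint_tiers.get(internal_id)
    return tier in app_bindings.get(app, set())

print(authorize("customer-chat", "credit-memo-summarizer"))        # True
print(authorize("customer-chat", "credit-memo-summarizer-v2"))     # False: challenger tier
print(authorize("model-validation", "credit-memo-summarizer-v2"))  # True
```

Because every request passes this check, a consumer-facing app physically cannot reach a sandbox or deprecated model, and the validation run's access to both tiers is an explicit, auditable grant.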
Next step
An architecture review walks your current provider setup through the four-tier model and produces a findings document your MRM team can carry into the next committee meeting. Ninety minutes with your team, one week to the written output, and the findings are vendor-agnostic whether or not you end up using AI Gateway.
Map Gateway against your stack in 90 minutes.