What breaks when you put RAG in front of a SharePoint tenant
Three governance failures that show up the week after your retrieval-augmented chatbot ships, and the retrieval-boundary pattern that prevents them.
By The Nuviax team
The retrieval-augmented pilot went great. A chatbot grounded in the company's SharePoint tenant, answering employee questions with citations to the source documents. The demo got a standing ovation from the steering committee. The pilot shipped to a hundred users the following week.
By the end of that week, three different people had raised incidents. An engineer could retrieve fragments of the CFO's draft board deck. A contractor whose access had been revoked on Friday was still getting responses citing the documents they used to be allowed to see. HR discovered the index contained PII from an employee who left the company nine months earlier.
The team's response was predictable: "The RAG pilot has a governance problem." The accurate response is that the RAG pilot has three governance problems, and every one of them was inevitable given how the index was built.
What the default RAG stack assumes
The default retrieval-augmented generation stack treats the vector index like a cache. Documents are ingested, chunked, embedded, and stored. The retrieval layer matches query embeddings against stored chunks and returns the top K. The application hands the retrieved chunks to the LLM as context. The LLM produces a grounded response with citations.
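The contract described above can be sketched in a few lines. This is an illustrative toy, not any particular product's implementation: the "embedding" is a bag-of-words vector so the example runs without a model, but the shape of the contract is the same as the real stack, which is the point. Chunks go into the store as bare copies, and retrieval takes a query and nothing else.

```python
# Minimal sketch of the default RAG retrieval contract (illustrative only).
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" standing in for a learned model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

index = []  # a durable copy of every chunk, detached from its source ACLs

def ingest(doc_text: str, chunk_size: int = 8) -> None:
    words = doc_text.split()
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i + chunk_size])
        index.append((embed(chunk), chunk))

def retrieve(query: str, k: int = 3) -> list[str]:
    # Note what is missing: no caller identity, no tenant, no retention check.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]
```

The `retrieve` signature is the whole problem in miniature: a query goes in, the top K chunks come out, and nothing in between asks who is calling.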
This architecture is beautifully simple. It is also, in an enterprise context, architecturally hostile to the access-control and governance model the rest of the company runs on.
Three assumptions baked in by default cause the three failure modes.
First, the index is a durable copy. Every chunk stored in the vector store is a copy of the source document, usually without the access controls the source document carried. The chunk exists independently of the SharePoint file. When the file's permissions change, the chunk does not. When the file is deleted, the chunk persists.
Second, retrieval has no identity. The retrieval call takes a query and returns top-K chunks. It does not know who is asking. A naive implementation returns the same chunks to any caller, because the caller identity is not part of the retrieval contract.
Third, the index has no notion of retention. SharePoint files carry retention policies. Legal hold. Departmental retention. Personnel-file retention. These policies expire content on a schedule that reflects business and regulatory obligations. The vector index does not participate in this schedule. Content that expired in the source system lives on in the index until someone notices.
Each of these assumptions breaks a specific real-world expectation.
The three failures in detail
Failure one: ACL leakage.
An engineer at the company cannot read the CFO's draft board deck. The SharePoint file is restricted to the executive team. But the ingestion pipeline that populated the vector index ran under a service account with broad read access, because that was the simplest way to index everything. Every chunk from the draft board deck now lives in the index without the ACL restriction that protected the source file.
When the engineer asks the chatbot about board-level strategic priorities, the retrieval layer returns chunks from the draft deck alongside chunks from public documents. The LLM happily cites both. The engineer is reading content they are not cleared to see.
This is not a bug in the chatbot. It is the default behavior of the default architecture.
Failure two: stale access.
A contractor's access was revoked on Friday afternoon. The revocation propagated through SharePoint, Azure AD, and the company's SSO provider correctly. Documents the contractor could read yesterday are inaccessible today.
The vector index does not participate in the propagation. Its copies of those documents are still sitting in the store, still retrievable, still cited in responses to queries from the contractor's session, which is still valid because it was issued before the revocation and expires on its own schedule.
The contractor is reading content they are no longer authorized to read. The information-security team discovers this when the contractor mentions in a handover call that the chatbot is still useful. It stops being useful on Monday when the team realizes what is happening and takes the chatbot down.
Failure three: persistent PII.
An employee left the company nine months ago. The HR system offboarded them. Their personnel file moved to the long-term retention tier. Their profile was removed from the global address book. The subject access request they filed three months later should have found no remaining active data.
The vector index still contains chunks from documents that mention them, because the ingestion ran quarterly and the last run before their departure captured everything. Their name appears in chunks retrievable by queries about projects they worked on. The subject access response the privacy office has to produce now has to include a detailed explanation of why their data persists in an AI system after they left.
The privacy office writes the explanation once. Then they block the chatbot from indexing any document containing personnel data, which strips out half the value of the retrieval-augmented system. Then they unblock specific document types under specific conditions, which takes a quarter. Then the chatbot ships again, at a fraction of the originally-demoed capability.
Retrieval-time enforcement is the fix
The pattern that survives is to stop treating retrieval as a stateless lookup and start treating it as an authenticated authorization decision.
Every document in the index carries metadata attached at ingestion time: the source system, the source-system document ID, the source ACL (user and group identifiers that can read the source), the tenant if the index serves multiple customers, the data classification, the retention class, the ingestion timestamp, the source modification timestamp.
Every retrieval call carries the caller's authenticated identity, the caller's tenant if relevant, the access purpose the caller declared, and any other claims the authorization layer needs.
The retrieval engine computes the intersection. It returns only chunks where the caller's identity is allowed to read the source, the tenant matches, the purpose is within the allowed scope, and the retention class has not expired. Everything else is filtered out before the LLM sees it.
Source changes propagate into the index. When SharePoint revokes a user's access to a file, the index's ACL metadata for that file's chunks updates. When a retention policy expires, the expiration job removes the chunks. When a user is offboarded, their identity drops out of ACL memberships and they stop being able to retrieve content they could retrieve yesterday.
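The propagation side can be sketched as three event handlers. The event names and the in-memory `index` are stand-ins for whatever change feed and vector store a real deployment uses; the shape of each handler is the point: a source-system change overwrites or deletes the index copy rather than leaving it behind.

```python
# Hedged sketch of change propagation into the index. The dict stands in
# for a vector store; the handlers stand in for a change-feed consumer.

index = {
    # chunk_id -> metadata carried from ingestion (example data)
    "deck-001": {"source_doc": "board-deck.pptx",
                 "acl": {"exec-team"},
                 "retention_expires": "2026-01-01"},
}

def on_acl_changed(source_doc: str, new_acl: set[str]) -> None:
    # A permission change in the source system overwrites the index copy.
    for meta in index.values():
        if meta["source_doc"] == source_doc:
            meta["acl"] = set(new_acl)

def on_principal_offboarded(principal: str) -> None:
    # A revoked user or group drops out of every ACL immediately.
    for meta in index.values():
        meta["acl"].discard(principal)

def on_retention_expired(source_doc: str) -> None:
    # Expired content is deleted from the index, not just hidden.
    expired = [cid for cid, m in index.items() if m["source_doc"] == source_doc]
    for cid in expired:
        del index[cid]
```

The documented propagation SLA the CISO asks about later is the latency budget for these handlers: how long after the source-system change does the index copy reflect it.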
This is not theoretical. It is what the SharePoint team, the Azure AD team, and the enterprise content management team have been running for the underlying systems for over a decade. The retrieval index needs to participate in that control plane, not create a parallel one.
What the CISO's sign-off looks like
A CISO signing off on a retrieval-augmented chatbot is not signing off on the LLM. They are signing off on the retrieval boundary. Three questions decide the sign-off.
Does the index participate in the source systems' access control? If a user is revoked, the index revokes. If a file's ACL changes, the index reflects the change. If the answer is yes, with a documented propagation SLA, the first concern clears.
Does the retrieval call carry authenticated caller identity end to end? Not a service account, not a token that was valid at ingestion time, but a real identity tied to the actual caller of the chatbot for this query. If yes, the second concern clears.
Does the index respect the source systems' retention schedule? Expirations propagate. Legal holds are honored. Subject access requests can surface every chunk referencing a subject. If yes, the third concern clears.
Once those three clear, the CISO signs. The chatbot ships. The pilot that looked like it had a governance problem turned out to have an architecture problem instead, and the architecture got fixed.
The broader lesson
Retrieval-augmented generation is one of the most-demoed and least-productionized patterns in enterprise AI. The reason is almost never the quality of the LLM. The reason is the gap between the default stack and the enterprise governance expectations for any system that touches source documents.
Close that gap once, at the retrieval layer, and the pattern becomes defensible. Fail to close it, and every pilot runs into the same three incidents in the same order.
Next step
If you are running a RAG pilot that is showing any of the three failures above, or preparing one and want to avoid them, an architecture review takes your current indexing pipeline, maps it against the source systems' access-control and retention behavior, and produces a findings document your CISO and DPO can sign off on jointly.