Access Control in RAG
1. The Cardinal Rule: Pre-Filter, Not Post-Filter
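Never retrieve first and filter afterward. If you retrieve the top-K chunks and then discard the ones a user cannot see, unauthorized content has already left the index and entered your application, the number of surviving results can reveal that restricted documents exist (information leakage), and you waste vector search capacity on chunks that are thrown away. Always pass the access filter into the vector search so that unauthorized chunks are never retrieved.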
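A toy in-memory sketch can make the difference concrete. Everything here is hypothetical (the `Chunk` class, the `search` helper, and the precomputed scores stand in for a real vector store); it only illustrates how post-filtering shrinks the result set while pre-filtering does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    tenant_id: str
    score: float  # precomputed similarity to the query; higher is better


def search(chunks, top_k, tenant_id=None):
    """Toy stand-in for a vector search with an optional pre-filter."""
    candidates = [c for c in chunks if tenant_id is None or c.tenant_id == tenant_id]
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]


CORPUS = [
    Chunk("globex-1", "globex", 0.99),
    Chunk("globex-2", "globex", 0.97),
    Chunk("globex-3", "globex", 0.95),
    Chunk("acme-1", "acme", 0.90),
    Chunk("acme-2", "acme", 0.80),
    Chunk("acme-3", "acme", 0.70),
]

# Post-filter: another tenant's chunks crowd the top-K, so only one survives.
post = [c for c in search(CORPUS, top_k=4) if c.tenant_id == "acme"]

# Pre-filter: the search only ever ranks chunks the caller may see.
pre = search(CORPUS, top_k=4, tenant_id="acme")

print([c.doc_id for c in post])  # ['acme-1']
print([c.doc_id for c in pre])   # ['acme-1', 'acme-2', 'acme-3']
```

In the post-filter run, three of the four retrieved chunks belonged to another tenant and were held in application memory before being dropped, and the caller received one result instead of three.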
Wrong pattern:

    # BAD: retrieves everything, then filters
    chunks = index.search(query_vector, top_k=20)
    allowed_chunks = [c for c in chunks if user_can_access(c)]  # Leaky!

Correct pattern:

    # GOOD: filter is applied inside the vector search
    chunks = index.search(
        query_vector,
        top_k=5,
        filter={"tenant_id": user.tenant_id},  # Pre-filter
    )

2. Multi-Tenancy Patterns
In SaaS applications with multiple tenants, each tenant must be isolated.
Option A: Separate index per tenant
| Pro | Con |
|---|---|
| Perfect isolation | High operational overhead for many tenants |
| No risk of cross-tenant leakage | Resource waste for small tenants |
| Simple queries (no filter needed) | Index creation latency on tenant onboarding |
Best for: high-security environments with a small number of large tenants.
Option B: Shared index with tenant_id metadata filter
| Pro | Con |
|---|---|
| Single index to manage | Must ensure filter is always applied |
| Low overhead per tenant | Some ANN indexes degrade with high-cardinality filters |
| Easy tenant onboarding | Misapplied filter = data breach |
Best for: large numbers of small tenants (typical SaaS).
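One way to reduce the "must ensure filter is always applied" risk is to hand application code a tenant-scoped wrapper instead of the raw index. The sketch below is an assumption-laden illustration: `TenantScopedIndex` and `RecordingIndex` are hypothetical names, and the `search(query_vector, top_k, filter)` signature simply mirrors the generic calls used in this guide rather than any specific vector DB client.

```python
class TenantScopedIndex:
    """Wraps a shared index so that every search carries the tenant filter."""

    def __init__(self, index, tenant_id):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._index = index
        self._tenant_id = tenant_id

    def search(self, query_vector, top_k=5, filter=None):
        merged = dict(filter or {})
        # The tenant constraint is written last, so a caller-supplied
        # "tenant_id" key can never override it.
        merged["tenant_id"] = self._tenant_id
        return self._index.search(query_vector, top_k=top_k, filter=merged)


class RecordingIndex:
    """Hypothetical stand-in for a vector DB client; records the filter it receives."""

    def __init__(self):
        self.last_filter = None

    def search(self, query_vector, top_k, filter):
        self.last_filter = filter
        return []


raw = RecordingIndex()
scoped = TenantScopedIndex(raw, tenant_id="acme_corp")
scoped.search([0.1, 0.2], filter={"tenant_id": "globex", "classification": "public"})
print(raw.last_filter)  # {'tenant_id': 'acme_corp', 'classification': 'public'}
```

If only the wrapper is exposed to request handlers, forgetting the filter becomes a construction-time error rather than a silent cross-tenant query.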
Option C: Namespace / partition per tenant (supported by Pinecone, Weaviate, Qdrant)
- Each tenant gets a logical partition within the same physical index
- Queries are scoped to a namespace at the API level — no per-query filter needed
- Best balance of isolation and operational simplicity
3. Document-Level ACLs
Beyond tenant-level isolation, individual documents may have fine-grained permissions (e.g., only certain roles or users can see a document).
Approach: store ACL metadata on each chunk

    # Chunk metadata
    {
        "doc_id": "contract_2024_acme",
        "tenant_id": "acme_corp",
        "allowed_roles": ["legal", "finance"],  # Role-based
        "allowed_users": ["user_789"],          # User-specific
        "classification": "confidential"
    }

At query time:
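
    user_roles = get_user_roles(user_id)  # e.g. ["finance"]
    chunks = index.search(
        query_vector,
        top_k=5,
        filter={
            "tenant_id": user.tenant_id,
            # Grant access on a role match OR an explicit per-user grant.
            # Assumes the store supports a Mongo-style "$or" operator.
            "$or": [
                {"allowed_roles": {"$in": user_roles}},
                {"allowed_users": {"$in": [user_id]}},
            ],
        },
    )

Limitation: fine-grained ACL filters with many OR conditions can slow ANN search. Benchmark your vector DB’s filter performance under realistic ACL complexity.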
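Building the filter in one small function keeps the ACL logic testable without a live index. This is a minimal sketch, assuming Mongo-style `$in` and `$or` operators (operator names vary across vector stores); `build_acl_filter` and the local `matches` evaluator are hypothetical helpers, and the metadata fields are the ones from the chunk example above.

```python
def build_acl_filter(tenant_id, user_id, roles):
    """Build a filter: same tenant AND (explicit user grant OR role match)."""
    if not tenant_id or not user_id:
        raise ValueError("tenant_id and user_id are required")
    grants = [{"allowed_users": {"$in": [user_id]}}]
    if roles:
        grants.append({"allowed_roles": {"$in": sorted(roles)}})
    return {"tenant_id": tenant_id, "$or": grants}


def matches(metadata, flt):
    """Tiny evaluator for the filter above, used only to check the logic locally."""
    if metadata["tenant_id"] != flt["tenant_id"]:
        return False
    return any(
        set(metadata.get(field, [])) & set(cond["$in"])
        for clause in flt["$or"]
        for field, cond in clause.items()
    )


chunk = {
    "doc_id": "contract_2024_acme",
    "tenant_id": "acme_corp",
    "allowed_roles": ["legal", "finance"],
    "allowed_users": ["user_789"],
}

print(matches(chunk, build_acl_filter("acme_corp", "user_1", ["finance"])))      # True
print(matches(chunk, build_acl_filter("acme_corp", "user_789", [])))             # True
print(matches(chunk, build_acl_filter("acme_corp", "user_2", ["engineering"])))  # False
print(matches(chunk, build_acl_filter("globex", "user_789", ["legal"])))         # False
```

The last case shows that a matching user or role never overrides the tenant constraint, because the tenant check sits outside the OR.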
4. ACL Synchronisation
ACLs in the vector index must stay in sync with the source system’s permissions.
The stale ACL problem:
- A user loses access to a document in the source system (e.g., Confluence, SharePoint)
- The vector index still has the old chunk with the old ACL → unauthorized access
Sync strategies:
| Strategy | Frequency | Approach |
|---|---|---|
| Event-driven | Real-time | Source system publishes permission-change events; update chunk metadata |
| Periodic re-sync | Hourly/daily | Crawl source ACLs, compare to index metadata, patch differences |
| Re-ingest on change | On document update | Always re-ingest full document (including ACL) when it changes |
Recommendation: Treat ACL changes as document updates — trigger a full re-ingest of affected documents so chunk metadata is always fresh.
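The periodic re-sync row in the table boils down to a diff between the source system's ACLs and what the index currently stores. The sketch below assumes both sides can be exported as a mapping from `doc_id` to a set of allowed roles; `diff_acls` is a hypothetical helper, and fetching from the source system or patching the index is left out.

```python
def diff_acls(source_acls, index_acls):
    """Compare source-of-truth ACLs with the ACLs held in the index.

    Both arguments map doc_id -> set of allowed roles.
    Returns (to_update, to_delete):
      to_update: docs whose index ACL is stale or missing, with the correct ACL
      to_delete: docs present in the index but gone from the source
    """
    to_update = {
        doc_id: roles
        for doc_id, roles in source_acls.items()
        if index_acls.get(doc_id) != roles
    }
    to_delete = sorted(set(index_acls) - set(source_acls))
    return to_update, to_delete


source = {
    "doc42": {"finance"},           # "legal" was revoked in the source system
    "doc17": {"legal", "finance"},  # unchanged
    "doc99": {"hr"},                # not yet indexed
}
index = {
    "doc42": {"finance", "legal"},
    "doc17": {"legal", "finance"},
    "doc7": {"legal"},              # deleted in the source system
}

to_update, to_delete = diff_acls(source, index)
print(to_update)  # {'doc42': {'finance'}, 'doc99': {'hr'}}
print(to_delete)  # ['doc7']
```

Each entry in `to_update` would then trigger a re-ingest (or a metadata patch) of that document's chunks, and each entry in `to_delete` a removal.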
5. Query-Time Identity Injection
The user’s identity must be injected at query time from a trusted source — never from the user’s own request.
Secure pattern:

    # Authentication middleware resolves identity BEFORE hitting RAG
    def rag_endpoint(request):
        user = auth.verify_token(request.headers["Authorization"])
        # user.tenant_id and user.roles are resolved server-side
        result = rag_pipeline.query(
            query=request.body["question"],
            tenant_id=user.tenant_id,  # Server-side, trusted
            allowed_roles=user.roles,  # Server-side, trusted
        )
        return result

Never accept tenant_id or allowed_roles directly from the user’s request body. These values must come from your authentication system.
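A defensive variant goes one step further and rejects any request that tries to supply identity fields at all. This is a sketch under stated assumptions: `VerifiedUser`, `FORBIDDEN_CLIENT_FIELDS`, and `build_query_args` are hypothetical names, and token verification is assumed to have happened already in the authentication layer.

```python
from dataclasses import dataclass

# Fields that may only come from the verified token, never from the client.
FORBIDDEN_CLIENT_FIELDS = {"tenant_id", "allowed_roles", "user_id"}


@dataclass(frozen=True)
class VerifiedUser:
    """Identity as resolved server-side by the authentication layer."""
    user_id: str
    tenant_id: str
    roles: tuple


def build_query_args(user, body):
    """Combine the client's question with identity taken from the verified user only."""
    smuggled = FORBIDDEN_CLIENT_FIELDS & set(body)
    if smuggled:
        raise PermissionError(
            f"identity fields not allowed in request body: {sorted(smuggled)}"
        )
    return {
        "query": body["question"],
        "tenant_id": user.tenant_id,
        "allowed_roles": list(user.roles),
    }


user = VerifiedUser(user_id="user_789", tenant_id="acme_corp", roles=("finance",))
args = build_query_args(user, {"question": "What are the payment terms?"})
print(args["tenant_id"], args["allowed_roles"])  # acme_corp ['finance']
```

Silently ignoring the extra fields would also be safe; rejecting them outright is a design choice that surfaces probing attempts and client bugs early.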
6. Audit Logging
Access-controlled systems need audit trails for compliance (SOC 2, ISO 27001, GDPR).
What to log per RAG request:

    {
        "request_id": "uuid",
        "timestamp": "2024-11-01T12:00:00Z",
        "user_id": "user_789",
        "tenant_id": "acme_corp",
        "query_hash": "sha256:...",  // Hash; don't log raw PII queries
        "retrieved_doc_ids": ["doc42", "doc17"],
        "applied_filter": {"tenant_id": "acme_corp", "allowed_roles": ["finance"]}
    }

Key audit questions to be able to answer:
- What documents did user X access on date Y?
- Did any user access document Z without authorization?
- When was the last time ACLs were synced for tenant T?
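The record above, and the first audit question, can be sketched with the standard library alone. `build_audit_record` and `docs_accessed` are hypothetical helpers; where the records are stored (log pipeline, database) is outside the scope of this sketch.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def build_audit_record(user_id, tenant_id, query, retrieved_doc_ids, applied_filter, now=None):
    """Build one audit entry; the raw query is hashed and never stored."""
    now = now or datetime.now(timezone.utc)
    return {
        "request_id": str(uuid.uuid4()),
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user_id": user_id,
        "tenant_id": tenant_id,
        "query_hash": "sha256:" + hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "retrieved_doc_ids": list(retrieved_doc_ids),
        "applied_filter": applied_filter,
    }


def docs_accessed(records, user_id, date):
    """Answer 'what documents did user X access on date Y?' (date as YYYY-MM-DD)."""
    return sorted({
        doc_id
        for r in records
        if r["user_id"] == user_id and r["timestamp"].startswith(date)
        for doc_id in r["retrieved_doc_ids"]
    })


record = build_audit_record(
    user_id="user_789",
    tenant_id="acme_corp",
    query="What is Jane Doe's salary?",
    retrieved_doc_ids=["doc42", "doc17"],
    applied_filter={"tenant_id": "acme_corp", "allowed_roles": ["finance"]},
    now=datetime(2024, 11, 1, 12, 0, tzinfo=timezone.utc),
)
print(json.dumps(record, indent=2))
print(docs_accessed([record], "user_789", "2024-11-01"))  # ['doc17', 'doc42']
```

A plain SHA-256 of a short, guessable query can be brute-forced; if that matters for your threat model, a keyed hash (for example `hmac` with a server-side secret) is a possible hardening step.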
Summary
| Layer | Rule |
|---|---|
| Multi-tenancy | Namespace per tenant or mandatory tenant_id filter; never an unfiltered shared index |
| Document ACLs | Store on chunk metadata; filter inside ANN search |
| Identity | Inject from auth system server-side — never trust client |
| ACL sync | Treat permission changes as document updates → re-ingest |
| Audit | Log doc IDs accessed per user per request |
The single most important rule: the access filter must be part of the vector search call, not applied to results afterward.