Unifying 14 years of merchandising knowledge into a single retrieval engine
The Context
A national retailer with 1,200+ stores, four ERPs from acquisitions, and a 14-year archive of merchandising and supplier policy documents fragmented across SharePoint, Confluence, and shared drives.
The Technical Debt
Category buyers spent 6–9 hours per week locating policy precedents, supplier terms, and historical merchandising decisions. Off-the-shelf "AI search" tools failed evaluation because they could not cite their sources, hallucinated supplier pricing, and ignored regional variations.
The Architecture
- Document-aware ingestion with table-preserving chunking across 1.2M documents
- Hybrid retrieval: BM25 + dense embeddings + cross-encoder reranking, tuned per content class
- Access control inherited from existing Active Directory entitlements, so buyers see only their own categories
- Citation-grounded answers with paragraph-level provenance and confidence scoring
- Continuous evaluation harness with a 1,400-question golden set covering edge cases
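
To make the retrieval and entitlement bullets concrete, here is a minimal sketch of how two ranked lists (one from BM25, one from dense embeddings) might be merged after filtering by the buyer's categories. The case study does not say how the signals are combined. Reciprocal rank fusion is used here only as a common, simple choice, and the cross-encoder reranking step is left out. The Chunk fields, the function names, the constant k=60, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    category: str  # hypothetical: merchandising category the chunk belongs to
    source: str    # hypothetical: document and paragraph reference for citation


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk ids with reciprocal rank fusion (assumed technique)."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def retrieve(bm25_ranking, dense_ranking, chunks, allowed_categories, top_n=3):
    """Drop chunks outside the caller's categories, then fuse and truncate."""
    filtered = [
        [cid for cid in ranking if chunks[cid].category in allowed_categories]
        for ranking in (bm25_ranking, dense_ranking)
    ]
    return [chunks[cid] for cid in rrf_fuse(filtered)[:top_n]]


# Illustrative data only; none of it comes from the retailer's corpus.
chunks = {
    "c1": Chunk("c1", "produce", "policy-A, para 3"),
    "c2": Chunk("c2", "dairy", "policy-B, para 1"),
    "c3": Chunk("c3", "produce", "policy-C, para 7"),
    "c4": Chunk("c4", "produce", "policy-D, para 2"),
}
bm25_ranking = ["c2", "c1", "c3", "c4"]
dense_ranking = ["c3", "c2", "c4", "c1"]

results = retrieve(bm25_ranking, dense_ranking, chunks, allowed_categories={"produce"})
for chunk in results:
    print(chunk.chunk_id, chunk.source)
```

Filtering before fusion, rather than after, means a chunk the buyer is not entitled to never influences the ranking and cannot appear in a citation. Each returned Chunk carries its source reference, which is the hook a citation-grounded answer would need.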