System Design Problem

Design a Document Q&A Platform (RAG System)

Commonly Asked By: OpenAI, Cohere, Pinecone, Google

  • Document Upload: Users can upload PDFs, Word docs, and text files.
  • Q&A Interface: Users can ask natural language questions about their uploaded documents.
  • Citations: The AI's response must include exact citations (e.g., "Source: Employee_Handbook.pdf, Page 4").
  • Access Control: Users should only be able to query documents they have permission to view.
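The Citations and Access Control requirements both come down to metadata stored next to each piece of text. A minimal sketch, assuming a hypothetical `Chunk` record with an `allowed_users` set (the names `Chunk`, `visible_chunks`, and `format_citation` are illustrative, not from any library):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    """One retrievable piece of text plus the metadata needed to cite and guard it."""
    doc_name: str   # e.g. "Employee_Handbook.pdf"
    page: int       # 1-based page number the text came from
    text: str
    # Assumption: permissions are a flat set of user ids; a real system
    # would likely resolve groups or roles from the metadata database.
    allowed_users: frozenset = field(default_factory=frozenset)


def visible_chunks(user_id, chunks):
    """Keep only chunks the user may read; run this before ranking or prompting."""
    return [c for c in chunks if user_id in c.allowed_users]


def format_citation(chunk):
    """Render a citation in the 'Source: <file>, Page <n>' style."""
    return f"Source: {chunk.doc_name}, Page {chunk.page}"
```

Filtering before retrieval results reach the prompt matters: text a user cannot view should never be shown to the model on that user's behalf.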

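Retrieval-Augmented Generation (RAG) lets a general-purpose LLM answer questions about a user's private documents by retrieving the relevant passages at query time and placing them in the prompt. The architecture has two pipelines, Ingestion (Offline) and Retrieval & Generation (Online), which are served by three layers: API / Gateway, Service, and Data / Storage.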

1. The Ingestion Pipeline (Offline)
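The chunking step run by the Ingestion Workers could look roughly like the sketch below. It assumes the text has already been extracted page by page (OCR and the embedding call are omitted), and the function name `chunk_pages` and its default sizes are illustrative choices rather than recommended values. Keeping the page number on every chunk is what makes page-level citations possible later.

```python
def chunk_pages(doc_name, pages, chunk_words=200, overlap_words=40):
    """Split each page's text into overlapping word windows tagged with doc and page.

    `pages` is a list of page texts in reading order (index 0 is page 1).
    """
    if not 0 <= overlap_words < chunk_words:
        raise ValueError("overlap_words must be >= 0 and smaller than chunk_words")
    step = chunk_words - overlap_words
    chunks = []
    for page_number, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), step):
            window = words[start:start + chunk_words]
            chunks.append(
                {"doc_name": doc_name, "page": page_number, "text": " ".join(window)}
            )
            if start + chunk_words >= len(words):
                break  # this window already reached the end of the page
    return chunks
```

Chunks here never span a page boundary, which keeps each citation to a single page at the cost of occasionally splitting a sentence that continues onto the next page.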


2. The Retrieval & Generation Pipeline (Online)

  • API / Gateway Layer: Exposes the document upload endpoints (accepting multipart form data) and the chat query endpoints (using Server-Sent Events (SSE) to stream LLM responses in real time).
  • Service Layer: Contains the Ingestion Workers (which run OCR, chunk text, and call Embedding Models) and the RAG Orchestrator (which receives queries, fetches the nearest vectors, constructs prompts, and calls the Inference Engine).
  • Data / Storage Layer: Holds all persistent state. Uses Amazon S3 for raw document storage, PostgreSQL for document metadata and user permissions, and a horizontally scalable Vector Database (e.g., Pinecone or Milvus) for storing high-dimensional embeddings and running fast nearest-neighbor searches.
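To make the online path concrete, here is a minimal, self-contained sketch of what the RAG Orchestrator does between receiving a query and calling the Inference Engine. Everything in it is a stand-in: `embed` is a toy bag-of-words counter rather than a real embedding model, `InMemoryIndex` is a brute-force replacement for the vector database, and the permission check assumes a flat `allowed_users` set instead of a PostgreSQL lookup. The LLM call and SSE streaming are left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Stand-in for the vector database: brute-force search with a metadata filter."""

    def __init__(self):
        self.rows = []

    def upsert(self, doc_name, page, text, allowed_users):
        self.rows.append({
            "doc_name": doc_name, "page": page, "text": text,
            "allowed_users": set(allowed_users), "vector": embed(text),
        })

    def search(self, query, user_id, top_k=3):
        q = embed(query)
        # Apply access control first so unauthorized text is never ranked.
        permitted = [r for r in self.rows if user_id in r["allowed_users"]]
        scored = [(cosine(q, r["vector"]), r) for r in permitted]
        scored = [(s, r) for s, r in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored[:top_k]]


def build_prompt(question, hits):
    """Number each retrieved chunk so the model can cite it by source and page."""
    context = "\n".join(
        f"[{i}] (Source: {h['doc_name']}, Page {h['page']}) {h['text']}"
        for i, h in enumerate(hits, start=1)
    )
    return (
        "Answer using only the context below and cite the source and page "
        "for each claim.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

A short usage example under the same assumptions:

```python
index = InMemoryIndex()
index.upsert("Employee_Handbook.pdf", 4,
             "Employees accrue 20 vacation days per year.", {"alice"})
hits = index.search("How many vacation days do employees get?", "alice")
prompt = build_prompt("How many vacation days do employees get?", hits)
```

In a production system the filter would typically be pushed down into the vector database as a metadata filter on the query, so that the top-k results are computed only over documents the user is permitted to see.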