
RealProdRag

Project Motivation & Problem Statement

Enterprise chatbots and knowledge assistants need to answer questions grounded in specific organizational documents, not general internet knowledge. Off-the-shelf LLMs often hallucinate or return outdated information when asked domain-specific questions. RealProdRag is a production-ready Retrieval-Augmented Generation (RAG) system built for a real client delivery. It answers user queries by retrieving relevant documents from an ingested knowledge base and generating accurate, context-grounded responses, with intelligent country-based filtering and a GPT-4o fallback when retrieval comes up short.

Technical Approach

1. Document Ingestion Pipeline

  • Built an automated document ingestion system that processes PDFs, text files, and structured data into chunked, indexed representations.
  • Implemented intelligent chunking strategies (fixed-size with overlap, semantic boundary detection) to preserve context within each document segment.
  • Generated dense vector embeddings for each chunk using OpenAI's embedding models and stored them in a vector database for fast similarity search; chunking and embedding are sketched after this list.
  • Designed the ingestion pipeline to support incremental updates: new documents can be added without reprocessing the entire corpus.
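
To make the fixed-size strategy concrete, here is a minimal sketch of chunking with overlap followed by batch embedding. The chunk size, overlap, and embedding model name (text-embedding-3-small) are illustrative assumptions rather than the exact production values:

```python
# Minimal sketch: fixed-size chunking with overlap, then batch embedding.
# chunk_size, overlap, and the model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 150) -> list[str]:
    """Split text into fixed-size character windows; the overlap ensures
    context spanning a boundary survives in at least one chunk."""
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        window = text[start:start + chunk_size]
        if window.strip():
            chunks.append(window)
    return chunks

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed all chunks in one OpenAI embeddings call."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    return [item.embedding for item in resp.data]
```

Each vector is then written to the vector database alongside the chunk's text and metadata (country tag, source document) so the retrieval layer can filter and score it.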

2. Retrieval & Ranking System

  • Implemented semantic search using cosine similarity between query embeddings and document chunk embeddings for context-aware retrieval.
  • Built country-based metadata filtering that restricts search results to documents relevant to the user's geographic context, ensuring compliance with regional information policies.
  • Designed a re-ranking layer that combines semantic similarity scores with metadata relevance to surface the most pertinent chunks; a retrieval sketch follows this list.
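
Here is a condensed sketch of that retrieval path. The chunk record shape, the "global" tag for unrestricted documents, the precomputed meta_score field, and the blend weight are all illustrative assumptions; the production system queries the vector database directly rather than an in-memory list:

```python
# Sketch of country-filtered retrieval with a blended re-ranking score.
# Chunk record shape and meta_weight are illustrative assumptions.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_emb, chunks, country: str, top_k: int = 5, meta_weight: float = 0.2):
    # Country filter first: only documents tagged for the user's region
    # (or "global") are eligible, enforcing regional information policies.
    candidates = [c for c in chunks if c["country"] in (country, "global")]
    scored = []
    for c in candidates:
        sim = cosine_similarity(query_emb, c["embedding"])
        # Re-ranking: blend semantic similarity with metadata relevance.
        score = (1 - meta_weight) * sim + meta_weight * c.get("meta_score", 0.0)
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```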

3. Generation & Fallback Logic

  • Integrated OpenAI GPT-4o for response generation, feeding retrieved document chunks as context alongside the user's query.
  • Implemented confidence scoring on retrieved results: when retrieval confidence falls below a threshold, the system falls back to GPT-4o search preview for web-augmented answers.
  • Built prompt templates that instruct the model to cite sources, stay grounded in retrieved context, and explicitly state when information is unavailable (see the sketch after this list).
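
Reusing the client and retrieve helpers sketched above (and assuming each chunk dict also carries its source text), the generation path with threshold fallback might look like the following. The 0.35 threshold, prompt wording, and default country are illustrative assumptions:

```python
# Sketch of grounded generation with a confidence-threshold fallback.
# Reuses client, cosine_similarity, and retrieve from the sketches above;
# the threshold value and prompt text are illustrative assumptions.
GROUNDED_PROMPT = (
    "Answer using ONLY the context below and cite the source of each claim. "
    "If the context does not contain the answer, say so explicitly.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def answer(question: str, query_emb, chunks, country: str = "global") -> str:
    results = retrieve(query_emb, chunks, country)
    top = cosine_similarity(query_emb, results[0]["embedding"]) if results else 0.0
    if top < 0.35:  # low retrieval confidence: fall back to web-augmented search
        resp = client.chat.completions.create(
            model="gpt-4o-search-preview",
            messages=[{"role": "user", "content": question}],
        )
    else:
        context = "\n---\n".join(c["text"] for c in results)
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": GROUNDED_PROMPT.format(
                context=context, question=question)}],
        )
    return resp.choices[0].message.content
```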

4. API Architecture & Backend

  • Developed a FastAPI backend with endpoints for document ingestion (/ingest), query processing (/query), and system health monitoring.
  • Implemented session management to support multi-turn conversations with context carryover.
  • Used a SQL database for metadata storage and query logging, enabling analytics on usage patterns and retrieval quality; a minimal endpoint sketch follows this list.
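
A minimal FastAPI sketch of the query endpoint with session carryover. The request fields and the in-memory session store are illustrative assumptions; the production service persists sessions and logs in SQL:

```python
# Minimal FastAPI sketch of /query with session carryover, plus /health.
# Field names and the in-memory session store are illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
sessions: dict[str, list[dict]] = {}  # session_id -> prior turns (SQL in production)

class QueryRequest(BaseModel):
    session_id: str
    question: str
    country: str = "global"

@app.post("/query")
def query(req: QueryRequest) -> dict:
    history = sessions.setdefault(req.session_id, [])
    # Retrieval + generation would run here, passing `history` for carryover.
    answer_text = "stubbed answer"  # placeholder for the generation call
    history.append({"question": req.question, "answer": answer_text})
    return {"answer": answer_text, "turns": len(history)}

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}
```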

5. Deployment & CI/CD

  • Containerized the entire system with Docker for consistent deployment across development, staging, and production environments.
  • Set up GitHub Actions workflows for automated testing, image building, and deployment on every code push.
  • Configured environment variables and secrets management for secure API key handling in production; a settings sketch follows this list.
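
One possible pattern for that secrets handling (an assumption; the exact library is not named here) is pydantic-settings, which loads API keys from environment variables injected by the container runtime or the CI secrets store:

```python
# Sketch of environment-based settings, assuming pydantic-settings.
# Variable names are illustrative; production values come from the
# container runtime or GitHub Actions secrets, never from source control.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")  # local dev only
    openai_api_key: str                       # read from OPENAI_API_KEY
    database_url: str = "sqlite:///./rag.db"  # overridden in production

settings = Settings()
```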

Results

  • Delivered a production RAG system that accurately answers domain-specific queries grounded in client documents.
  • Country-based filtering ensured regional compliance and relevance in multi-geography deployments.
  • GPT-4o fallback provided graceful degradation when retrieved documents didn't contain sufficient information.
  • Docker-based deployment and CI/CD pipeline enabled rapid iteration and reliable production releases.

Limitations

  • Retrieval quality depends heavily on the chunking strategy; suboptimal chunk boundaries can split critical context.
  • The embedding model may not capture domain-specific semantics as well as a fine-tuned model would.
  • Fallback to GPT-4o search introduces latency and may return information not aligned with client policies.

Skills and Technologies Demonstrated

  • Retrieval-Augmented Generation (RAG) system design
  • FastAPI production backend development
  • OpenAI API integration (embeddings and GPT-4o)
  • Vector database and semantic search implementation
  • Document ingestion and chunking pipelines
  • Docker containerization and GitHub Actions CI/CD
  • SQL for metadata management and logging
  • Client delivery and production deployment
