AI Fundamentals · Module 5

5.2 Custom Knowledge Bases with RAG

30 min · 5 code blocks · Practice Lab · Quiz (4 questions)

Retrieval-Augmented Generation — RAG — is the technology that transforms a generic AI into a specialist for your specific business. Without RAG, asking Claude about your company's products, your client's history, or your proprietary processes produces hallucinations and generic answers. With RAG, the AI searches your actual documents before answering, grounding every response in your real data. This is the architecture that makes enterprise AI deployments trustworthy. It's the secret sauce for making AI truly useful for Pakistani businesses dealing with specific regulations, client histories, or product catalogs.

Section 1: How RAG Works — The Non-Technical Explanation

Imagine you hire a brilliant new consultant. They are incredibly intelligent but know nothing about your specific business. You could spend weeks training them verbally — or you could give them access to your entire document library and let them search it when answering questions. RAG is the second approach for AI. This approach is far more scalable and cost-effective than trying to "fine-tune" an entire large language model with your private data, which can be computationally intensive and expensive, especially for smaller firms in Pakistan.

The RAG Flow: Indexing and Retrieval

The RAG process can be broken down into two main phases: Indexing (preparing your data) and Retrieval & Generation (answering questions).

code
+-------------------+      +-------------------+
|   Your Documents  |      |   User Question   |
| (PDFs, DOCX, TXT) |      | (e.g. "What's our |
+-------------------+      |  refund policy?") |
         |                          |
         V                          V
+-------------------+        +-------------------+
| 1. Document Split |        | 4. Embed Question |
| (into smaller     |        |   (Vectorize it)  |
|   chunks)         |        +-------------------+
+-------------------+                  |
         |                             V
+-------------------+        +-------------------+
| 2. Chunk Embedding| <----- | 5. Vector Search  |
| (Convert chunks to|        | (Find similar     |
|   numerical       |        |   vectors in DB)  |
|   vectors)        |        +-------------------+
+-------------------+                  |
         |                             V
+-------------------+        +-------------------+
| 3. Store in Vector| -----> | 6. Retrieve Chunks|
|    Database       |        | (Top N most       |
| (Chroma/Pinecone) |        |   relevant)       |
+-------------------+                  |
                                       V
                       +---------------------------------+
                       | 7. Construct Prompt for LLM     |
                       | (Context: Retrieved Chunks      |
                       |  Question: User's Query)        |
                       +---------------------------------+
                                       |
                                       V
                               +-------------------+
                               | 8. LLM Generates  |
                               |    Answer         |
                               | (Grounded in YOUR |
                               |    data)          |
                               +-------------------+
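As a quick illustration of steps 4 to 6 above, here is a toy retrieval sketch in plain Python. Word-overlap scoring stands in for real embedding similarity; the function names and sample data are illustrative only, not a production approach:

```python
# Toy retrieval sketch: word overlap stands in for vector similarity,
# purely to illustrate steps 4-6 of the RAG flow above.

def score(chunk: str, question: str) -> int:
    # Count how many question words appear in the chunk (a crude
    # stand-in for cosine similarity between embedding vectors).
    chunk_words = set(chunk.lower().split())
    return sum(1 for w in question.lower().split() if w in chunk_words)

def retrieve(chunks: list[str], question: str, top_n: int = 2) -> list[str]:
    # Rank all chunks by score and return the top N (step 6).
    ranked = sorted(chunks, key=lambda c: score(c, question), reverse=True)
    return ranked[:top_n]

knowledge_base = [
    "Our refund policy: services over PKR 50,000 are non-refundable after 7 days.",
    "All full-time employees get 15 casual leaves per year.",
    "Office hours are 9am to 6pm, Monday to Friday.",
]

top = retrieve(knowledge_base, "What is our refund policy?", top_n=1)
print(top[0])  # the refund-policy chunk ranks highest
```

A real system replaces `score` with embedding vectors and a vector database, as Section 2 shows, but the retrieve-then-answer structure stays the same.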

Pakistan Business Example: A Karachi law firm has 500 client case files. A junior associate asks: "What was the settlement amount in the Malik Industries case from 2024?" Without RAG, the AI would either hallucinate a number or say it does not know. With RAG, it searches the case files, retrieves the relevant document, and answers accurately. This saves countless hours for lawyers, allowing them to focus on legal strategy rather than digging through archives.

RAG vs. Fine-tuning: A Quick Comparison

While both RAG and fine-tuning aim to make an AI model more specific to your data, they achieve it differently.

| Feature | Retrieval-Augmented Generation (RAG) | Fine-tuning |
| --- | --- | --- |
| Data Usage | Retrieves data from an external knowledge base at query time. | Modifies the model's weights by training on a custom dataset. |
| Knowledge | Provides up-to-date, external knowledge. | Embeds specific knowledge directly into the model's parameters. |
| Cost | Generally lower (API calls for embeddings, vector DB, LLM). | Higher (significant computational resources for training). |
| Update Frequency | Easy to update the knowledge base (add/remove documents). | Requires retraining the model for updates. |
| Hallucinations | Significantly reduced due to grounding in external data. | Can still hallucinate, though less often on fine-tuned topics. |
| Use Case | Answering specific questions from large, frequently changing docs. | Adapting the model's style, tone, or specific output formats. |
| Transparency | Easy to trace source documents for answers. | Opaque; difficult to trace the source of specific outputs. |
| Typical Cost (PKR) | Starting from PKR 5,000-10,000 for a basic setup, then per query. | PKR 50,000 to millions for complex models and datasets. |

Section 2: Building a Simple RAG System

Here is a production-ready RAG implementation using open-source tools. This setup is perfect for Pakistani startups or SMEs looking to leverage AI without hefty initial investments.

Step 1: Install Dependencies

bash
pip install anthropic chromadb sentence-transformers pypdf2
  • anthropic: Python client for interacting with Anthropic's Claude models.
  • chromadb: An open-source, lightweight vector database that runs locally. Perfect for getting started without cloud infrastructure.
  • sentence-transformers: A library for state-of-the-art sentence embeddings, allowing us to convert text into numerical vectors.
  • pypdf2: A library for extracting text from PDF documents.

Step 2: Create the Knowledge Base

python
import chromadb
from sentence_transformers import SentenceTransformer
import PyPDF2
import os
from typing import List

# Initialize the embedding model (free, runs locally)
# 'all-MiniLM-L6-v2' is a good balance of speed and performance.
# For Urdu or mixed-language content, consider multi-lingual models like 'paraphrase-multilingual-MiniLM-L12-v2'.
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Initialize ChromaDB (free, local vector database)
# You can specify a path for persistence, e.g., chromadb.PersistentClient(path="/path/to/db")
chroma_client = chromadb.Client()
# get_or_create_collection avoids an error if the collection already exists on re-run
collection = chroma_client.get_or_create_collection("company_knowledge")

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> List[str]:
    """
    Splits text into chunks with a specified size and overlap.
    This helps preserve context across chunk boundaries.
    """
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

def add_document(file_path: str, doc_id: str):
    """Extract text from PDF and add to knowledge base"""
    try:
        # Extract text
        with open(file_path, 'rb') as f:
            reader = PyPDF2.PdfReader(f)
            # Call extract_text() once per page; it can return None or ""
            pages_text = [page.extract_text() or "" for page in reader.pages]
            text = " ".join(t for t in pages_text if t.strip())

        if not text.strip():
            print(f"Warning: No text extracted from {file_path}. Skipping.")
            return

        # Split into chunks (using our improved chunking function)
        chunks = chunk_text(text, chunk_size=500, overlap=100)

        # Embed and store each chunk
        for i, chunk in enumerate(chunks):
            if not chunk.strip(): # Skip empty chunks
                continue
            embedding = embedder.encode(chunk).tolist()
            collection.add(
                documents=[chunk],
                embeddings=[embedding],
                ids=[f"{doc_id}_chunk_{i}"]
            )
        print(f"Added {len(chunks)} chunks from {file_path}")
    except Exception as e:
        print(f"Error processing {file_path}: {e}")

# Example of adding multiple documents
# Imagine these are your HR policies, product manuals, or client contracts.
# file_paths = ["hr_policy_2024.pdf", "product_catalog_Q3.pdf", "client_agreement_A.pdf"]
# for i, path in enumerate(file_paths):
#     add_document(path, f"doc_{i}")
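To see how the overlap in chunk_text preserves context across boundaries, here is a quick self-contained check (the function is repeated from Step 2 so the snippet runs on its own; the numbered-word document is a synthetic example):

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> List[str]:
    # Same chunking logic as in Step 2: slide a window of chunk_size
    # words forward by (chunk_size - overlap) words each step.
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

# 1,000 numbered words make the window boundaries easy to inspect.
text = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_text(text, chunk_size=500, overlap=100)

print(len(chunks))            # 3 chunks: words 0-499, 400-899, 800-999
print(chunks[0].split()[-1])  # "w499", last word of chunk 1
print(chunks[1].split()[0])   # "w400", chunk 2 starts 100 words back
```

The 100-word overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which is why retrieval quality usually improves with some overlap.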

Indexing Process Diagram:

code
+--------------------+       +--------------------+       +----------------------+
|   Raw Document     |       |    Text Chunks     |       |  Vector Embeddings   |
| (e.g. company.pdf) | ----> | "Chunk 1 text..."  | ----> | [0.1, 0.5, -0.2...]  |
+--------------------+       | "Chunk 2 text..."  |       | [0.8, -0.1, 0.3...]  |
                             | "Chunk 3 text..."  |       | [-0.4, 0.7, 0.9...]  |
                             +--------------------+       +----------------------+
                                      |                              |
                                      V                              V
                                +-------------------------------------+
                                |         ChromaDB Collection         |
                                | (Stores chunks & their embeddings)  |
                                +-------------------------------------+

Step 3: Query the Knowledge Base and Get AI Answers

python
import anthropic

# Ensure your ANTHROPIC_API_KEY is set as an environment variable
# For local development, you can also use `os.environ["ANTHROPIC_API_KEY"] = "sk-..."`
claude = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

def rag_answer(question: str) -> str:
    """Retrieve relevant context and get AI answer"""
    # Embed the question
    question_embedding = embedder.encode(question).tolist()

    # Search for relevant chunks in the ChromaDB collection
    # n_results determines how many top relevant chunks are retrieved.
    # Experiment with this value (3-5 is a good starting point).
    results = collection.query(
        query_embeddings=[question_embedding],
        n_results=3,
        include=['documents', 'distances', 'metadatas'] # Include metadata for potential source tracking
    )

    # Build context from retrieved chunks
    # We join them with a separator to clearly delineate chunks for the LLM.
    context = "\n\n---\n\n".join(results['documents'][0])

    # The prompt engineering is crucial here. We explicitly instruct the AI
    # to use ONLY the provided context and to state if the answer isn't found.
    system_prompt = "You are a specialist assistant. Answer the question using ONLY the provided context. If the answer is not in the context, say so explicitly. Be concise and helpful."
    
    # Structure the message for Anthropic's API
    response = claude.messages.create(
        model="claude-sonnet-4-6", # claude-sonnet-4-6 offers a good balance of cost and performance.
        max_tokens=800,
        system=system_prompt,
        messages=[{
            "role": "user",
            "content": f"""CONTEXT:
{context}

QUESTION: {question}

ANSWER:"""
        }]
    )
    return response.content[0].text

# Example usage: first index a document, then ask a question.
# For a quick test, create a one-page PDF named company_policy.pdf containing
# text such as:
#   "Our refund policy states that services over PKR 50,000 are non-refundable
#    after 7 days from the purchase date. For services under PKR 50,000, a 50%
#    refund is available within 14 days. Employee leave policy: All full-time
#    employees are eligible for 15 casual leaves and 10 sick leaves per year.
#    Annual leave is 20 days."
# (Alternatively, save the same text as a .txt file and adapt add_document to
# read plain text, or use a library like python-docx for .docx files.)

# add_document("company_policy.pdf", "policy_2026")
# answer = rag_answer("What is our refund policy for services over PKR 50,000?")
# print(answer)

# JSON representation of the context sent to the LLM:
# {
#   "role": "user",
#   "content": [
#     {
#       "type": "text",
#       "text": "CONTEXT:\nOur refund policy states that services over PKR 50,000 are non-refundable after 7 days from the purchase date.\n\n---\n\nFor services under PKR 50,000, a 50% refund is available within 14 days.\n\n---\n\nEmployee leave policy: All full-time employees are eligible for 15 casual leaves and 10 sick leaves per year.\n\nQUESTION: What is our refund policy for services over PKR 50,000?\n\nANSWER:"
#     }
#   ]
# }
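The prompt-assembly logic inside rag_answer (step 7 of the flow) can also be pulled out into a small helper, which makes it easy to log or unit-test exactly what the LLM receives. build_rag_prompt is a hypothetical name for this sketch:

```python
def build_rag_prompt(retrieved_chunks: list[str], question: str) -> str:
    # Join chunks with a visible separator so the LLM can tell
    # where one retrieved passage ends and the next begins.
    context = "\n\n---\n\n".join(retrieved_chunks)
    return f"CONTEXT:\n{context}\n\nQUESTION: {question}\n\nANSWER:"

chunks = [
    "Services over PKR 50,000 are non-refundable after 7 days.",
    "Services under PKR 50,000: 50% refund within 14 days.",
]
prompt = build_rag_prompt(chunks, "What is the refund policy?")
print(prompt)
```

Keeping this as a separate function also lets you experiment with different context layouts (numbered chunks, source labels, etc.) without touching the retrieval or API code.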

Section 3: RAG for Common Pakistani Business Use Cases

RAG is incredibly versatile and can be applied across various sectors in Pakistan, from small businesses to large enterprises.

Use Case 1: HR Policy Bot
Upload your employee handbook, HR policies, and SOPs (Standard Operating Procedures). Team members can ask questions in plain Urdu or English and get accurate, policy-grounded answers — reducing HR query load by 60%. This is particularly useful for companies with a large workforce across different cities like Lahore, Karachi, and Islamabad, ensuring consistent information dissemination. Imagine reducing the time HR spends answering "How many casual leaves do I have?" from 10 minutes to 10 seconds, saving potentially thousands of PKR in HR operational costs monthly.

Use Case 2: Client FAQ Bot
Upload client contracts, project briefs, and communication history. When a client asks "What did we agree about revision rounds?", your system retrieves the actual contract clause and answers accurately. This helps maintain strong client relationships and avoids misunderstandings, which can be crucial for freelance agencies on platforms like Fiverr or Upwork dealing with international clients.

Use Case 3: Product Knowledge Base
For an e-commerce business selling on Daraz or their own website, upload product specifications, pricing sheets, and inventory data. Customer service queries get accurate, current answers, whether it's about a mobile phone's battery life or the warranty period for a refrigerator. This can significantly improve customer satisfaction and reduce return rates, boosting sales.

Use Case 4: Regulatory Compliance Assistant
For accountants or lawyers handling SECP (Securities and Exchange Commission of Pakistan), FBR (Federal Board of Revenue), or SBP (State Bank of Pakistan) regulations, upload the relevant regulatory documents. Queries about specific rules (e.g., "What are the latest FBR requirements for sales tax registration?") return citations from the actual regulations, ensuring compliance and mitigating legal risks. This is invaluable for chartered accountants and legal firms across Pakistan.

Use Case 5: Real Estate Information System
Imagine a real estate agency using RAG. Upload property listings from Zameen.pk, client preferences, historical sales data, and legal documents. A real estate agent can ask, "Show me 3-bedroom houses for sale under PKR 2 Crore in DHA Lahore with a plot size over 10 Marla and a possession date within 6 months," and the RAG system can sift through thousands of listings and provide precise matches, including links to the original property documents.

Pakistan Case Study: Daraz Seller Support AI

Scenario: A rapidly growing e-commerce business in Pakistan, "TechGadgets.pk," sells electronics on Daraz and their own Shopify store. They receive hundreds of customer inquiries daily regarding product specifications, warranty claims, delivery times, and return policies. Their small customer service team in Faisalabad is overwhelmed, leading to delays and inconsistent answers.

Challenge: Training new customer service agents takes weeks, and even experienced agents struggle to keep up with constantly updated product catalogs, changing Daraz policies, and various brand-specific warranty terms. Hallucinations from a generic AI would be disastrous for customer trust.

RAG Solution: TechGadgets.pk decided to implement a RAG system.

  1. Data Ingestion: They uploaded all their product specification sheets (PDFs), warranty documents from various brands (e.g., Samsung, Xiaomi, Apple), Daraz seller policies, internal SOPs for returns/exchanges, and a comprehensive FAQ document. This amounted to over 1,000 documents.
  2. System Setup: Using a similar architecture as described above (ChromaDB, SentenceTransformers, and Anthropic's Claude), they built a custom knowledge base.
  3. Customer Service Bot: They integrated the RAG system into a simple chatbot interface accessible to their customer service team.
  4. Query Example:
    • Customer Query: "What is the warranty for the Samsung Galaxy A54, and how do I claim it?"
    • RAG Process:
      1. The query is embedded.
      2. The system searches the knowledge base, retrieving relevant sections from the "Samsung Warranty Policy.pdf" and "TechGadgets Return & Warranty SOP.pdf".
      3. These retrieved chunks are sent to Claude along with the original question.
      4. Claude generates an accurate answer, specifying the warranty period (e.g., "1 year official warranty") and the claim process (e.g., "Contact Samsung service center or raise a ticket on Daraz/TechGadgets website within 7 days of delivery").

Impact:

  • Reduced Response Time: Average customer query resolution time dropped by 70%.
  • Improved Accuracy: Hallucinations were virtually eliminated, leading to consistent and trustworthy answers.
  • Cost Savings: The need to hire additional customer service agents was deferred, saving TechGadgets.pk an estimated PKR 150,000 per month in salaries.
  • Scalability: The system can easily scale by adding more documents as new products are launched or policies change.
  • Agent Empowerment: Agents could quickly find answers, making them more efficient and confident. They could even use it to answer queries in both English and basic Urdu.

This allowed TechGadgets.pk to maintain high customer satisfaction ratings on Daraz and strengthen their brand reputation in the competitive Pakistani e-commerce market.

Practice Lab

Exercise 1: Your First Knowledge Base
Take 3-5 documents from your business (SOPs, client contracts, product specs — anything). For testing, you can even use .txt files instead of .pdf by modifying the add_document function to read .txt directly. Follow the code above to create a local ChromaDB collection and add these documents. Ask 5 questions and evaluate the accuracy of the answers provided by rag_answer. Pay attention to whether the AI uses only the provided context.

Exercise 2: Chunk Size Experiment
Run the same knowledge base with different chunking strategies.

  1. Small Chunks: Modify chunk_text to use chunk_size=250, overlap=50.
  2. Medium Chunks: Use the current chunk_size=500, overlap=100.
  3. Large Chunks: Modify chunk_text to use chunk_size=1000, overlap=200.

Ask the same 5 questions for each configuration and compare answer quality, relevance, and presence of hallucinations. Note which chunk size works best for your document type: smaller chunks might miss context, while larger chunks might dilute relevance.
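A small harness sketch for this experiment, reusing the chunking logic from Step 2 (repeated here so the snippet is self-contained) on a synthetic 2,000-word document:

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    # Chunking logic from Step 2, repeated so this snippet runs standalone.
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size - overlap)]

# The three configurations from the exercise.
configs = [
    ("small",  250, 50),
    ("medium", 500, 100),
    ("large",  1000, 200),
]

document = " ".join(f"word{i}" for i in range(2000))  # stand-in for a real doc

counts = {}
for name, size, overlap in configs:
    chunks = chunk_text(document, size, overlap)
    counts[name] = len(chunks)
    print(f"{name:>6}: {len(chunks)} chunks of up to {size} words")
```

In a full experiment you would index each configuration into its own ChromaDB collection and run your 5 test questions against each, comparing the answers side by side.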

Exercise 3: Source Citation
Modify the rag_answer function to also return the document IDs and even the specific text chunks that were used to answer the question. This creates an audit trail — you can verify every AI answer traces back to a specific source document, enhancing trust and verifiability. This is critical for compliance-heavy industries in Pakistan. You'll need to utilize the results['documents'] and results['ids'] returned by the collection.query call.
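One possible shape for that audit trail is sketched below. The results dict is hand-built to mirror the nested-list structure that collection.query returns, so the citation formatting can be shown without a live database; cited_context is a hypothetical helper name:

```python
# Hand-built stand-in for the dict returned by collection.query();
# real results use the same nested-list shape (one inner list per query).
results = {
    "ids": [["policy_2026_chunk_0", "policy_2026_chunk_3"]],
    "documents": [[
        "Services over PKR 50,000 are non-refundable after 7 days.",
        "A 50% refund is available within 14 days for smaller orders.",
    ]],
}

def cited_context(results: dict) -> str:
    # Pair each retrieved chunk with its source ID so every answer
    # can be traced back to a specific document chunk.
    lines = []
    for chunk_id, chunk in zip(results["ids"][0], results["documents"][0]):
        lines.append(f"[source: {chunk_id}]\n{chunk}")
    return "\n\n---\n\n".join(lines)

print(cited_context(results))
```

Passing this labeled context to the LLM (and asking it to cite the source tags in its answer) gives you a verifiable trail from answer back to document.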

Exercise 4: Multi-document Query
Add at least 5-7 diverse documents to your knowledge base (e.g., an HR policy, a product manual, a sales report, a client agreement, a marketing brief). Then, formulate a question that requires information from at least two different documents to answer comprehensively. Evaluate how well the RAG system retrieves and synthesizes information from multiple sources.

Key Takeaways

  • RAG is Essential for Enterprise AI: It grounds AI answers in your actual documents, eliminating hallucinations on company-specific knowledge, making AI trustworthy for Pakistani businesses.
  • The RAG Lifecycle: The system works in two main phases: Indexing (embedding and storing your documents) and Retrieval & Generation (embedding the query, retrieving relevant chunks, then asking the LLM).
  • Open-Source Power: ChromaDB and SentenceTransformers are free, open-source tools that run locally — enabling powerful RAG capabilities without additional API costs for vector storage or embeddings, making it accessible for startups.
  • Chunking is Key: Chunk size (how you split documents) significantly affects answer quality; careful experimentation to find your optimum is crucial for different types of documents (e.g., legal contracts vs. product descriptions).
  • Prompt Engineering with Context: Explicitly instructing the LLM to use only the provided context is vital for preventing hallucinations and ensuring grounded responses.
  • Scalability and Maintainability: RAG systems are highly scalable, allowing you to add or remove documents easily, and maintain up-to-date knowledge without expensive model retraining, perfect for dynamic business environments.

Lesson Summary

Includes a hands-on practice lab, 5 runnable code examples, and a 4-question knowledge check below.

Custom Knowledge Bases with RAG Quiz

4 questions to test your understanding. Score 60% or higher to pass.