🕉
Architecture Overview
High-level design
┌─────────────────────────────────────────────────┐
│ User │
│ (Vercel — chat UI) │
└──────────────────┬──────────────────────────────┘
│ POST /api/chat
▼
┌─────────────────────────────────────────────────┐
│ API Layer (Render — free tier) │
│ ┌─────────────────────────────────────────┐ │
│ │ FastAPI (Python) │ │
│ │ - Rate limiter (30/min/IP) │ │
│ │ - CORS locked to frontend domain │ │
│ │ - Request validation │ │
│ └────────────────┬────────────────────────┘ │
│ ┌────────────────┴────────────────────────┐ │
│ │ Web Search (DDG → Google) │ │
│ │ Retrieves public info from the web │ │
│ └────────────────┬────────────────────────┘ │
│ ┌────────────────┴────────────────────────┐ │
│ │ Context + Prompt + Question │ │
│ │ sent to Sarvam AI for synthesis │ │
│ └────────────────┬────────────────────────┘ │
└──────────────────┬──────────────────────────────┘
│ Sarvam AI API
▼
┌─────────────────────────────────────────────────┐
│ Sarvam AI (sarvam-105b) │
│ 64K context, hosted by Sarvam AI │
└─────────────────────────────────────────────────┘
Key components
1. Frontend (veda-guru-ai-ui)
- Static site deployed on Vercel (free tier)
- Vanilla HTML/CSS/JS with
marked.jsfor markdown rendering - Responsive design with Vedic-themed styling
- Suggestion chips for quick queries
2. API Service (veda-guru-ai-api)
- FastAPI (Python) deployed on Render (free tier)
- Single endpoint:
POST /api/chat - Rate limited: 30 requests/minute per IP via
slowapi - CORS: Restricted to the frontend domain only
- Request timeout: 120s (handles Render cold starts)
3. Web Search
- Primary: DuckDuckGo via
duckduckgo-searchlibrary - Fallback: Google via
googlesearch-python - No API keys needed for either
- Search results are formatted as context for the LLM
4. LLM — Sarvam AI
- Model:
sarvam-105b(128K context window) - Context window: 128K tokens
- Auth:
api-subscription-keyheader - Endpoint:
POST https://api.sarvam.ai/v1/chat/completions
Data flow (query)
User: "What does Rig Veda say about truth?"
1. POST /api/chat { message: "..." }
2. Check rate limit → reject if over quota
3. Search web (DuckDuckGo → Google fallback)
4. Format results + question into prompt
5. Send prompt to Sarvam AI /v1/chat/completions
6. Parse response, attach source URLs
7. Return { reply, sources } to frontend
8. Frontend renders markdown + source toggle
Security
- Rate limiting: 30 requests/minute per IP
- CORS: Only the frontend domain is allowed
- LLM key:
SARVAM_API_KEYstored as Render env var, never in code - No PII: No user accounts, no data stored
- Cold start: Render free tier sleeps after 15min idle — first request may take 30-60s
Repositories
| Repo | Purpose | URL |
|---|---|---|
veda-guru-ai-docs |
Documentation site | GitHub |
veda-guru-ai-api |
FastAPI chatbot backend | GitHub |
veda-guru-ai-ui |
Chat frontend | GitHub |