How to build a RAG (retrieval-augmented generation) app with Vikasit AI
Retrieval-augmented generation grounds an LLM in your own documents: you embed your content, retrieve the most relevant chunks for a question, and pass them to the model as context. This reduces hallucination and lets the model answer from your data instead of from memory alone.
Recommended model
Vikasit 3
Strong instruction-following and a low price make it a great default for synthesizing retrieved context into grounded answers. Use Vikasit 3 Max for harder, multi-document reasoning.
Steps
1. Chunk your documents (e.g. 500–1000 tokens with overlap) and store them.
2. Embed each chunk and store the vectors in a vector database (FAISS, pgvector, Qdrant, etc.).
3. At query time, embed the user question and retrieve the top-k most similar chunks.
4. Build a prompt that includes the retrieved chunks as context plus the question.
5. Call the Vikasit chat API and instruct the model to answer only from the provided context.
6. Return the answer along with citations to the source chunks for trust.
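The chunking, retrieval, and citation steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: `chunk_words`, `toy_embed`, `cosine`, `top_k`, and `format_context` are hypothetical helper names, chunk sizes are counted in words as a rough proxy for tokens, and `toy_embed` is a word-count stand-in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of `size` words (a rough proxy for tokens)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str) -> Counter:
    """Stand-in embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 3) -> list[tuple[int, str]]:
    """Return (chunk_index, chunk_text) for the k chunks most similar to the question."""
    q = toy_embed(question)
    scored = sorted(
        enumerate(chunks),
        key=lambda item: cosine(q, toy_embed(item[1])),
        reverse=True,
    )
    return scored[:k]


def format_context(hits: list[tuple[int, str]]) -> list[str]:
    """Prefix each chunk with its index so an answer can cite it, e.g. [2]."""
    return [f"[{index}] {text}" for index, text in hits]
```

The output of `format_context(top_k(question, chunks))` is a list of strings, so it can be passed as `retrieved_chunks` to the `answer` function in the Code section. Keeping the chunk index in each label is one simple way to map a cited `[n]` back to its source chunk.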
Code
The Vikasit Inference API is OpenAI-compatible, so this uses the standard OpenAI Python SDK pointed at https://api.vikasit.ai/v1.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.vikasit.ai/v1",
    api_key="sk-vikasit-...",  # get one at vikasit.ai/auth
)


def answer(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)
    resp = client.chat.completions.create(
        model="vikasit-3",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer using ONLY the context below. "
                    "If the answer isn't there, say you don't know.\n\n"
                    f"Context:\n{context}"
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

Build your RAG (retrieval-augmented generation) app today
Get an API key and 2M free tokens a day on Vikasit Nova. Pay-as-you-go, no minimums, OpenAI-compatible.