How to build a voice assistant with Vikasit AI
A voice assistant chains speech-to-text, an LLM, and text-to-speech: transcribe the user's audio, reason over it with the model, then speak the reply. The LLM is the brain in the middle of the loop.
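The chain described above can be sketched as three pluggable stages. This is a minimal illustration, not a prescribed design: `run_turn`, the `Transcribe`/`Respond`/`Synthesize` type aliases, and the `fake_*` stand-ins are hypothetical names introduced here, and the real speech-to-text and text-to-speech engines are whatever you choose to plug in.

```python
from typing import Callable

# Hypothetical stage signatures; plug in whichever engines you use.
Transcribe = Callable[[bytes], str]   # speech-to-text: audio -> transcript
Respond = Callable[[str], str]        # LLM: transcript -> reply text
Synthesize = Callable[[str], bytes]   # text-to-speech: reply text -> audio


def run_turn(
    audio: bytes,
    transcribe: Transcribe,
    respond: Respond,
    synthesize: Synthesize,
) -> bytes:
    """Run one voice turn: transcribe, reason, then speak."""
    transcript = transcribe(audio)
    if not transcript.strip():
        # Nothing intelligible was heard, so skip the model call entirely.
        return b""
    reply = respond(transcript)
    return synthesize(reply)


# Stand-in stages so the sketch runs without audio hardware or an API key.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(transcript: str) -> str:
    return f"You said: {transcript}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


audio_out = run_turn(b"hello", fake_transcribe, fake_respond, fake_synthesize)
print(audio_out)  # b'You said: hello'
```

Passing the stages in as callables keeps the loop independent of any particular engine, so each stage can be swapped or tested on its own.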
Recommended model
Vikasit 3 Flash
Low latency keeps voice interactions natural. Switch to Vikasit 3 when answers need more depth.
Steps
1. Capture audio and transcribe it to text with a speech-to-text engine.
2. Send the transcript to the Vikasit chat API with a concise system prompt.
3. Keep replies short and speakable: instruct the model to avoid markdown.
4. Convert the model's reply to audio with a text-to-speech engine.
5. Play the audio back and loop for the next turn.
6. Stream the LLM response to start speaking sooner and cut perceived latency.
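One way to act on the streaming step is to buffer streamed text fragments and hand each sentence to the text-to-speech engine as soon as it is complete. The sketch below assumes the stream yields chunks shaped like the OpenAI Python SDK's streaming chunks (text in `chunk.choices[0].delta.content`). `deltas_from_stream`, `sentences_from_deltas`, and `fake_chunk` are hypothetical helpers introduced for illustration, and the punctuation-based splitting is a deliberately simple heuristic.

```python
import re
from types import SimpleNamespace
from typing import Any, Iterable, Iterator, Optional

# Sentence-ending punctuation followed by whitespace marks a speakable boundary.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def deltas_from_stream(stream: Iterable[Any]) -> Iterator[str]:
    """Pull text fragments out of OpenAI-SDK-style streaming chunks."""
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def sentences_from_deltas(deltas: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the streamed fragments finish them."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _BOUNDARY.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Fake chunks shaped like the SDK's, so the sketch runs without an API key.
def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk(t) for t in ["It is ", "sunny. Bring", None, " a hat", "! Bye"]]
for sentence in sentences_from_deltas(deltas_from_stream(fake_stream)):
    print(sentence)  # hand each sentence to your text-to-speech engine here
```

With the OpenAI Python SDK, passing `stream=True` to `client.chat.completions.create` returns an iterable of such chunks, which could be fed to `deltas_from_stream` in place of `fake_stream`. The heuristic will also split after abbreviations such as "Dr.", so a production assistant may want a more careful sentence segmenter.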
Code
The Vikasit Inference API is OpenAI-compatible, so the example below uses the standard OpenAI Python SDK with its base URL set to https://api.vikasit.ai/v1.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.vikasit.ai/v1",
    api_key="sk-vikasit-...",  # get one at vikasit.ai/auth
)

def respond(transcript: str) -> str:
    resp = client.chat.completions.create(
        model="vikasit-3-flash",
        messages=[
            {
                "role": "system",
                "content": "You are a voice assistant. Reply in one or two short, spoken sentences. No markdown.",
            },
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content

# Pass the returned text to your text-to-speech engine.

Build your voice assistant today
Get an API key and 2M free tokens a day on Vikasit Nova. Pay-as-you-go, no minimums, OpenAI-compatible.