How to build a voice assistant with Vikasit AI

A voice assistant chains speech-to-text, an LLM, and text-to-speech: transcribe the user's audio, reason over it with the model, then speak the reply. The LLM is the brain in the middle of the loop.
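As a rough sketch of that chain, one turn can be written as three pluggable stages. The names `run_turn`, `Transcribe`, `Think`, and `Speak` are hypothetical placeholders for illustration, not part of any Vikasit SDK; substitute whichever speech-to-text and text-to-speech engines you use.

```python
from typing import Callable

# Hypothetical stage signatures; none of these names come from a Vikasit SDK.
Transcribe = Callable[[bytes], str]  # speech-to-text: audio in, text out
Think = Callable[[str], str]         # LLM: transcript in, reply text out
Speak = Callable[[str], bytes]       # text-to-speech: text in, audio out


def run_turn(audio: bytes, transcribe: Transcribe, think: Think, speak: Speak) -> bytes:
    """Run one voice turn: transcribe the audio, get a reply, synthesize it."""
    transcript = transcribe(audio)
    reply = think(transcript)
    return speak(reply)


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without audio hardware or network access.
    fake_stt = lambda audio: audio.decode("utf-8")
    fake_llm = lambda text: f"You said: {text}"
    fake_tts = lambda text: text.encode("utf-8")
    print(run_turn(b"hello", fake_stt, fake_llm, fake_tts))
```

Passing the stages in as callables keeps the LLM call independent of the audio engines, so each one can be swapped or tested on its own.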

Recommended model

Vikasit 3 Flash

Low latency keeps voice interactions natural. Switch to Vikasit 3 when answers need more depth.

Steps

  1. Capture audio and transcribe it to text with a speech-to-text engine.
  2. Send the transcript to the Vikasit chat API with a concise system prompt.
  3. Keep replies short and speakable; instruct the model to avoid markdown.
  4. Convert the model's reply to audio with a text-to-speech engine.
  5. Play the audio back and loop for the next turn.
  6. Stream the LLM response to start speaking sooner and cut perceived latency.
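Step 6 might look like the sketch below. It assumes the endpoint honors the OpenAI SDK's `stream=True` option, which OpenAI compatibility suggests but this guide does not confirm. `split_sentences` and `stream_reply` are hypothetical helpers: the first buffers streamed text fragments and emits each sentence as soon as it is complete, so text-to-speech can begin before the model has finished.

```python
import re
from typing import Any, Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace. This is a simple
# heuristic: abbreviations such as "Dr. Smith" will be split too early.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(deltas: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text fragments and yield complete sentences."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush the remainder when the stream ends


def stream_reply(client: Any, transcript: str) -> Iterator[str]:
    """Yield sentences as the model streams its reply.

    `client` is an OpenAI SDK client pointed at the Vikasit base URL.
    """
    stream = client.chat.completions.create(
        model="vikasit-3-flash",
        messages=[
            {
                "role": "system",
                "content": "You are a voice assistant. Reply in one or two short, spoken sentences. No markdown.",
            },
            {"role": "user", "content": transcript},
        ],
        stream=True,  # assumption: the endpoint supports streaming
    )
    # Some chunks carry no choices or no text; treat those as empty fragments.
    deltas = (chunk.choices[0].delta.content or "" for chunk in stream if chunk.choices)
    yield from split_sentences(deltas)
```

Each sentence yielded by `stream_reply` can be handed to the text-to-speech engine immediately, so playback of the first sentence overlaps with generation of the rest.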

Code

The Vikasit Inference API is OpenAI-compatible, so this uses the standard OpenAI Python SDK pointed at https://api.vikasit.ai/v1.

voice-assistant.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.vikasit.ai/v1",
    api_key="sk-vikasit-...",  # get one at vikasit.ai/auth
)

def respond(transcript: str) -> str:
    """Return a short, speakable reply to the transcribed user utterance."""
    resp = client.chat.completions.create(
        model="vikasit-3-flash",
        messages=[
            {
                "role": "system",
                "content": "You are a voice assistant. Reply in one or two short, spoken sentences. No markdown.",
            },
            {"role": "user", "content": transcript},
        ],
    )
    # Pass the returned text to your text-to-speech engine.
    # The SDK types `content` as optional, so fall back to an empty string.
    return resp.choices[0].message.content or ""
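The `respond` function sends only the current transcript, so the model has no memory of earlier turns when the loop repeats. One possible way to carry context is a bounded history, sketched below. The `Conversation` class, its method names, and the default of six turns are all hypothetical choices for illustration, not something the Vikasit API prescribes.

```python
from typing import Dict, List

SYSTEM_PROMPT = (
    "You are a voice assistant. Reply in one or two short, spoken sentences. "
    "No markdown."
)


class Conversation:
    """Keep a bounded history of completed turns to send with each request."""

    def __init__(self, max_turns: int = 6) -> None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.max_turns = max_turns  # hypothetical default; tune for your use case
        self._history: List[Dict[str, str]] = []

    def messages_for(self, transcript: str) -> List[Dict[str, str]]:
        """Build the message list: system prompt, past turns, new user message."""
        return [
            {"role": "system", "content": SYSTEM_PROMPT},
            *self._history,
            {"role": "user", "content": transcript},
        ]

    def record(self, transcript: str, reply: str) -> None:
        """Store a completed turn and drop the oldest turns beyond the limit."""
        self._history.append({"role": "user", "content": transcript})
        self._history.append({"role": "assistant", "content": reply})
        # Each turn is two messages (user plus assistant), hence the factor of 2.
        self._history = self._history[-2 * self.max_turns:]
```

To use it, pass `conversation.messages_for(transcript)` as the `messages` argument of the chat call, then call `conversation.record(transcript, reply)` once the reply arrives. Capping the history keeps each request small, which matters for latency in a voice loop.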

Build your voice assistant today

Get an API key and 2M free tokens a day on Vikasit Nova. Pay-as-you-go, no minimums, OpenAI-compatible.