Vikasit Omni
Full multimodal — text + image + audio + video in, text + speech out. Real-time.
Overview
Vikasit Omni is a fully multimodal model: it accepts text, image, audio, and video input and produces both text and natural speech in real time, with turn-taking. It is built on a mixture-of-experts (MoE) Thinker–Talker architecture.
Specifications
- Total parameters: 35B
- Active parameters: 3B (Thinker) + 0.3B (Talker)
- Architecture: MoE Thinker–Talker, multi-codebook
- Context window: 32K
- Modalities: text + image + audio + video in → text + speech out
- License: Apache 2.0
Capabilities
- Any-modality in (text/image/audio/video)
- Real-time speech out with turn-taking
- Low-latency (~234 ms audio cold-start)
- Open-source SOTA on most audio/AV benchmarks
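To sanity-check latency on your own deployment, one option is to time how long the first streamed chunk takes to arrive. The sketch below is a minimal, hypothetical helper (`time_to_first_item` is not part of any SDK) that works with any callable returning an iterable, including a call to the OpenAI SDK with `stream=True`. It measures client-side time to the first chunk, which is not the same quantity as the ~234 ms audio figure above.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def time_to_first_item(make_stream: Callable[[], Iterable[T]]) -> Tuple[float, T, Iterator[T]]:
    """Return (seconds until the first item, the first item, iterator over the rest).

    Hypothetical helper: the timer starts before the stream is created so that
    request setup is included in the measurement.
    """
    start = time.perf_counter()
    iterator = iter(make_stream())
    first = next(iterator)  # blocks until the first chunk is available
    elapsed = time.perf_counter() - start
    return elapsed, first, iterator


def fake_stream(delay_s: float):
    """Stand-in for a server stream: waits, then yields two chunks."""
    time.sleep(delay_s)
    yield "Hello"
    yield " world"


elapsed, first, rest = time_to_first_item(lambda: fake_stream(0.05))
print(f"first chunk {first!r} after {elapsed * 1000:.0f} ms")

# Against a real server (assumed setup, not executed here):
# time_to_first_item(lambda: client.chat.completions.create(
#     model="vikasit-omni", messages=[...], stream=True))
```

Averaging over several requests, after a warm-up call, gives a steadier number than a single measurement.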
Benchmarks
| Benchmark | Score |
|---|---|
| ASR WER, LibriSpeech clean / other | 1.22 / 2.48 |
| ASR WER, Fleurs en / zh | 2.72 / 2.20 |
| MMAU (audio understanding) | 77.5 |
| VoiceBench (overall) | 88.8 |
| Vision: MMMU / MathVista | 69.1 / 75.9 |
| Text: MMLU-Redux / GPQA / AIME25 | 86.6 / 69.6 / 65.0 |
| Speech-out TTS WER (SEED zh/en) | 1.07 / 1.39 |
Numbers are taken from the Qwen3-Omni Technical Report (arXiv:2509.17765), which reports open-source SOTA on 32 of 36 audio and audio-visual benchmarks.
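The ASR and TTS rows above are word error rates (WER) in percent, so lower is better. As a reminder of what the metric measures, here is a minimal sketch that computes WER as word-level edit distance divided by the reference length. It assumes plain whitespace tokenization and no text normalization; benchmark suites apply their own normalization, so this will not reproduce the reported numbers exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{100 * wer:.2f}")  # → 33.33
```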
Hardware & deployment
| Precision | Memory |
|---|---|
| bf16 | ~70 GB |
| INT4 | ~20 GB |
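The bf16 figure follows from simple arithmetic: 35B parameters at 2 bytes each is about 70 GB of weights. The sketch below makes that explicit. It counts weights only; KV cache, activations, and runtime overhead come on top. The raw INT4 result is 17.5 GB, and reading the table's ~20 GB as that figure plus quantization metadata or unquantized layers is an assumption, not something the table states.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 35e9  # total parameter count from the specifications

for name, bits in [("bf16", 16), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
# → bf16: ~70.0 GB
# → INT4: ~17.5 GB
```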
Quick start
Vikasit Omni is an open-weight model. Self-host it with any OpenAI-compatible inference server and call it with the OpenAI SDK as shown below.
# pip install openai
from openai import OpenAI

# Point the SDK at your own OpenAI-compatible server.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-local",  # self-hosted servers typically accept any token unless an API key is configured
)

# Send a single-turn, text-only chat request.
resp = client.chat.completions.create(
    model="vikasit-omni",
    messages=[
        {"role": "user", "content": "Explain Vikasit Omni in one sentence."}
    ],
)
print(resp.choices[0].message.content)

Limitations
- 32K context (shorter than text-only flagships)
- Speech-out limited to 10 languages
Vikasit Omni FAQ
How much does Vikasit Omni cost?
Vikasit Omni is an open-weight model built on Qwen3-Omni-30B-A3B. Self-hosting the weights is free under the Apache 2.0 license; you pay only for the hardware or cloud GPUs you run it on. A typical deployment needs the memory listed in the hardware section above.
Is Vikasit Omni open weight?
Yes. Vikasit Omni is built on Qwen3-Omni-30B-A3B and distributed under the Apache 2.0 license, so the weights are openly available for self-hosting, fine-tuning, and commercial use, subject to the upstream license terms.
How do I run Vikasit Omni?
Because Vikasit Omni is open weight, you can self-host it with any OpenAI-compatible inference server (such as vLLM or SGLang) loaded with the Qwen3-Omni-30B-A3B weights, then call it with the OpenAI SDK by setting the base URL to your own endpoint.
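Since the model accepts more than text, a request will often carry an image or other media. The sketch below builds a chat message with an inline image, using the OpenAI-style `image_url` content part and a base64 data URL. `build_image_message` is a hypothetical helper, and whether your server accepts this content type for this model depends on the inference server and its version, so treat that as an assumption to verify.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message holding a text part and an inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Placeholder bytes for illustration; in practice read a real file,
# e.g. open("photo.png", "rb").read().
message = build_image_message("Describe this image.", b"placeholder image bytes")
print(message["content"][1]["image_url"]["url"][:22])  # → data:image/png;base64,
```

The resulting dictionary can be passed as `messages=[message]` to `client.chat.completions.create(model="vikasit-omni", ...)` on an endpoint that supports image input.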
What context window does Vikasit Omni support?
Vikasit Omni supports a 32K context window. It is a multi-codebook MoE Thinker–Talker model with 35B total parameters, of which 3B (Thinker) + 0.3B (Talker) are active; full specifications are listed in the Specifications section above.
License & attribution
Apache 2.0
Built on Qwen3-Omni-30B-A3B (Apache 2.0). Upstream copyright, license, and attribution notices are retained.