Omni / Multimodal · Open weights · 35B total · 3B active (Thinker) + 0.3B (Talker)

Vikasit Omni

Fully multimodal: text, image, audio, and video in; text and speech out, in real time.

Overview

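Vikasit Omni is a fully multimodal model. It accepts text, image, audio, and video input and produces both text and natural-sounding speech in real time, with conversational turn-taking. It is built on a mixture-of-experts (MoE) Thinker–Talker architecture: the Thinker handles multimodal understanding and generates text, and the Talker generates streaming speech.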

Specifications

Total parameters: 35B
Active parameters: 3B (Thinker) + 0.3B (Talker)
Architecture: MoE Thinker–Talker with multi-codebook speech generation
Context window: 32K tokens
Modalities: text + image + audio + video in → text + speech out
License: Apache 2.0
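Because this is a mixture-of-experts model, only the active experts run for each token, so per-token compute roughly tracks the active count while memory still has to hold all 35B parameters. A back-of-the-envelope check using only the Thinker and Talker figures listed above (it does not count any encoder parameters, so treat the result as an approximation):

```python
# Figures from the Specifications list, in billions of parameters.
TOTAL_B = 35.0
ACTIVE_B = {"thinker": 3.0, "talker": 0.3}


def active_fraction(total_b: float, active_b: dict) -> float:
    """Share of all parameters used per token by the listed active components."""
    return sum(active_b.values()) / total_b


frac = active_fraction(TOTAL_B, ACTIVE_B)
print(f"~{sum(ACTIVE_B.values()):.1f}B of {TOTAL_B:.0f}B active per token ({frac:.1%})")
```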

Capabilities

  • Any-modality input: text, image, audio, and video
  • Real-time speech output with natural turn-taking
  • Low latency: ~234 ms cold-start latency for audio
  • Open-source state of the art on most audio and audio-visual benchmarks
  • Languages: 119 for text, 19 for speech input, 10 for speech output

Benchmarks

| Benchmark | Score |
|---|---|
| ASR WER: LibriSpeech clean / other (lower is better) | 1.22 / 2.48 |
| ASR WER: Fleurs en / zh (lower is better) | 2.72 / 2.20 |
| MMAU (audio understanding) | 77.5 |
| VoiceBench (overall) | 88.8 |
| Vision: MMMU / MathVista | 69.1 / 75.9 |
| Text: MMLU-Redux / GPQA / AIME25 | 86.6 / 69.6 / 65.0 |
| Speech-out TTS WER: SEED zh / en (lower is better) | 1.07 / 1.39 |

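Scores are taken from the Qwen3-Omni Technical Report (arXiv:2509.17765) and describe the upstream Qwen3-Omni-30B-A3B model. Across the report's 36 audio and audio-visual benchmarks, it reaches open-source state of the art on 32.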
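The ASR rows and the TTS row are word error rates (WER), reported as percentages where lower is better. WER counts the word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. A minimal sketch of that definition follows; it is not the report's scoring script, which also normalizes text before scoring (and Chinese sets are often scored per character), so it will not reproduce the table's numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds edit distances from the previous reference prefix to each hyp prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One dropped word out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.2%}")
```

Multiplying the returned fraction by 100 gives the percentage form used in the table.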

Hardware & deployment

| Precision | Approx. memory (weights) |
|---|---|
| bf16 | ~70 GB |
| INT4 | ~20 GB |
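The bf16 figure follows from the parameter count: 35B parameters at 2 bytes each is about 70 GB, and 4-bit weights at half a byte each come to about 17.5 GB. The listed ~20 GB for INT4 is plausibly higher because quantized checkpoints store scale factors and often keep some layers at higher precision. Both figures cover weights only; the inference server also needs memory for the KV cache and activations. A sketch of the arithmetic (byte sizes per precision are standard; the rest is an estimate):

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Weight-only estimate in GB (1 GB = 1e9 bytes); excludes KV cache and activations."""
    return params_billion * BYTES_PER_PARAM[precision]


for precision in ("bf16", "int4"):
    print(f"{precision}: ~{weight_memory_gb(35, precision):.1f} GB of weights")
```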

Quick start

Vikasit Omni is an open-weight model. Self-host it with an OpenAI-compatible inference server that supports the Qwen3-Omni architecture, and call it with the OpenAI SDK as shown below.

OpenAI-compatible Python (self-hosted, e.g. vLLM)

```python
# pip install openai
from openai import OpenAI

# Point the SDK at your own server (vLLM listens on port 8000 by default).
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-local",  # any token works unless the server was started with --api-key
)

resp = client.chat.completions.create(
    # Must match the model name the server exposes (in vLLM, set via --served-model-name).
    model="vikasit-omni",
    messages=[
        {"role": "user", "content": "Explain Vikasit Omni in one sentence."}
    ],
)

print(resp.choices[0].message.content)
```
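The quick start sends text only. To include an image or an audio clip, OpenAI-compatible servers accept a list of content parts in the user message. The helper below is a sketch that assumes your server accepts OpenAI-style `image_url` and `input_audio` parts (vLLM documents both for multimodal models); `build_user_message` and its parameters are hypothetical names, and video input depends on server-specific extensions, so it is left out.

```python
import base64
from typing import Optional


def build_user_message(
    text: str,
    image_url: Optional[str] = None,
    wav_bytes: Optional[bytes] = None,
) -> dict:
    """Build one OpenAI-style user message mixing text, image, and audio parts."""
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        # An http(s) URL, or a data: URL carrying base64-encoded image bytes.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    if wav_bytes is not None:
        # Inline audio travels as base64 text plus its container format.
        encoded = base64.b64encode(wav_bytes).decode("ascii")
        content.append(
            {"type": "input_audio", "input_audio": {"data": encoded, "format": "wav"}}
        )
    return {"role": "user", "content": content}


msg = build_user_message(
    "What is in this picture?",
    image_url="https://example.com/photo.jpg",  # hypothetical URL
)
print([part["type"] for part in msg["content"]])
```

With the quick-start `client`, pass the result as `messages=[build_user_message(...)]` to `client.chat.completions.create`.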

Limitations

  • 32K context (shorter than text-only flagships)
  • Speech-out limited to 10 languages
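Images, audio, and video are converted into tokens that count against the same 32K window as text, so long clips can use it up quickly. Servers such as vLLM typically reject a request whose prompt plus `max_tokens` exceeds the configured maximum length, so it helps to compute the completion budget before sending. A minimal sketch, assuming "32K" means 32,768 tokens (check the limit your server was launched with) and that you already know the prompt's token count; `completion_budget` is a hypothetical helper name:

```python
# Assumption: "32K" taken as 32,768 tokens; substitute your server's configured limit.
CONTEXT_WINDOW = 32_768


def completion_budget(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Largest max_tokens value that keeps prompt + completion inside the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    budget = window - prompt_tokens
    if budget <= 0:
        raise ValueError(
            f"a {prompt_tokens}-token prompt leaves no room in a {window}-token window"
        )
    return budget


print(completion_budget(30_000))
```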

Vikasit Omni FAQ

How much does Vikasit Omni cost?

Vikasit Omni is an open-weight model built on Qwen3-Omni-30B-A3B (Apache 2.0). The weights are free to self-host under the Apache 2.0 license; you pay only for the hardware or cloud GPUs you run them on. Memory requirements are listed under Hardware & deployment.

Is Vikasit Omni open weight?

Yes. Vikasit Omni is built on Qwen3-Omni-30B-A3B and distributed under the Apache 2.0 license, so the weights are openly available for self-hosting, fine-tuning, and commercial use, subject to the upstream license terms.

How do I run Vikasit Omni?

Because Vikasit Omni is open weight, you self-host it: serve the Qwen3-Omni-30B-A3B weights with an OpenAI-compatible inference server (such as vLLM or SGLang), then call it with the OpenAI SDK, setting the base URL to your own endpoint.
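The model name in each request has to match one the server actually exposes. A server launched directly from the upstream weights usually lists the checkpoint's repository ID rather than `vikasit-omni` unless you rename it (vLLM's `--served-model-name` flag does this). The sketch below, with hypothetical helper names, reads the served IDs through the OpenAI SDK's `client.models.list()` (the `/v1/models` endpoint) and falls back to the single served model when the preferred name is absent.

```python
from typing import Iterable


def pick_model_id(served_ids: Iterable[str], preferred: str = "vikasit-omni") -> str:
    """Return `preferred` if it is served, else the only served model, else fail."""
    ids = list(served_ids)
    if preferred in ids:
        return preferred
    if len(ids) == 1:
        return ids[0]
    raise LookupError(f"{preferred!r} is not served; available: {ids}")


def resolve_model_id(client, preferred: str = "vikasit-omni") -> str:
    """Ask an OpenAI SDK client (e.g. the quick-start `client`) which models it serves."""
    return pick_model_id((m.id for m in client.models.list()), preferred)
```

Call `model = resolve_model_id(client)` once at startup and pass `model=model` to every request.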

What context window does Vikasit Omni support?

Vikasit Omni supports a 32K-token context window. It is a MoE Thinker–Talker model with multi-codebook speech generation, 35B total parameters, and 3B (Thinker) + 0.3B (Talker) active parameters; full specifications are listed under Specifications.

License & attribution

Apache 2.0

Built on Qwen3-Omni-30B-A3B (Apache 2.0). Upstream copyright, license, and attribution notices are retained.