Vikasit Voice HD
High-quality TTS. Voice cloning, expressive speech synthesis.
Overview
Vikasit Voice HD is the top-quality text-to-speech model: the 1.7B variant, offering the best naturalness and speaker similarity, voice cloning, and streaming with ~101 ms first-packet latency.
Specifications
| Specification | Value |
|---|---|
| Total parameters | 1.7B |
| Architecture | Multi-codebook TTS, 12Hz tokenizer |
| Context window | Not specified |
| Modalities | Text in → speech out |
| License | Apache 2.0 |
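To give a feel for what a 12Hz tokenizer implies, the sketch below counts codec frames for a clip of a given length. It takes the "12Hz" label at face value as 12 frames per second, which is an assumption: the exact frame rate and the number of codebooks per frame are not stated here, and `codec_frames` is an illustrative helper name.

```python
# Assumption: the "12Hz tokenizer" label is read literally as 12 frames per second.
NOMINAL_FRAME_RATE_HZ = 12


def codec_frames(duration_s: float, frame_rate_hz: float = NOMINAL_FRAME_RATE_HZ) -> int:
    """Approximate number of codec frames needed for `duration_s` seconds of audio."""
    return round(duration_s * frame_rate_hz)


ten_minutes = codec_frames(10 * 60)
print(f"10 minutes of speech is roughly {ten_minutes} frames")
```

Each frame carries one token per codebook, so the total token count would be this figure multiplied by the codebook count.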
Capabilities
- Highest-quality expressive synthesis
- Voice cloning and custom voices
- Long-form speech (10+ min) with low WER
- Streaming with ~101 ms first-packet latency
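First-packet latency can be checked on your own deployment by timing how long a stream takes to yield its first audio chunk. The sketch below is a generic client-side helper, not the measurement method behind the ~101 ms figure: a client-side reading also includes network and server overhead. `first_packet_latency_ms` and `fake_stream` are hypothetical names, and `fake_stream` stands in for a real audio stream.

```python
import time
from typing import Iterable, Iterator


def first_packet_latency_ms(chunks: Iterable[bytes]) -> float:
    """Milliseconds from the call until the first non-empty chunk arrives."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:
            return (time.perf_counter() - start) * 1000.0
    raise ValueError("stream ended without producing audio")


def fake_stream(delay_s: float) -> Iterator[bytes]:
    """Stand-in for a real stream: waits, then yields two silent chunks."""
    time.sleep(delay_s)
    yield b"\x00" * 320
    yield b"\x00" * 320


latency = first_packet_latency_ms(fake_stream(0.05))
print(f"first packet after {latency:.0f} ms")
```

To measure a real server, pass the chunk iterator returned by your streaming client in place of `fake_stream`.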
Benchmarks
| Benchmark | Score |
|---|---|
| Avg WER (10 langs) | 1.84% |
| Long-speech WER zh / en | 1.52% / 1.23% |
| Speaker similarity (SIM) | 0.79 |
| Cross-lingual clone (zh→ko MixER) | 4.82% |
| First-packet latency | ~101 ms |
| MOS / CMOS | N/A |
Numbers from the Qwen3-TTS Technical Report (arXiv:2601.15621), 12Hz-1.7B variant.
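For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the recognised transcript of the synthesised audio, divided by the number of reference words. The sketch below is a minimal illustration with a hypothetical `word_error_rate` helper. It assumes whitespace tokenisation and no text normalisation, which may differ from the report's evaluation setup (Chinese, for instance, is often scored at the character level).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the quick brown fox", "the quick brown box")
print(f"WER: {wer:.2%}")
```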
Hardware & deployment
| Precision | Memory |
|---|---|
| bf16 | ~3.4 GB |
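The ~3.4 GB figure matches a back-of-the-envelope estimate for the weights alone: 1.7B parameters at 2 bytes each in bf16. The sketch below reproduces that arithmetic. It assumes the table refers to weight storage only, so activations, caches, and framework overhead would come on top. `weight_memory_gb` is an illustrative name.

```python
BF16_BYTES = 2  # bfloat16 stores each parameter in 2 bytes


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


estimate = weight_memory_gb(1.7e9, BF16_BYTES)
print(f"bf16 weights: ~{estimate:.1f} GB")
```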
Quick start
Vikasit Voice HD is an open-weight model. Self-host it with any OpenAI-compatible inference server and call it with the OpenAI SDK as shown below.
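# pip install openai
from openai import OpenAI

# Assumes your self-hosted server exposes the OpenAI-style /v1/audio/speech route.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-local",  # placeholder; many self-hosted servers do not validate the token
)

# "default" is a placeholder voice name; use one your server actually provides.
with client.audio.speech.with_streaming_response.create(
    model="vikasit-voice-hd",
    voice="default",
    input="Vikasit Voice HD turns text into natural-sounding speech.",
    response_format="wav",
) as resp:
    resp.stream_to_file("speech.wav")  # write the returned audio to disk
Limitations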
- The base model covers 10 languages, none of them Indic
Vikasit Voice HD FAQ
How much does Vikasit Voice HD cost?
Vikasit Voice HD is an open-weight model built on Qwen3-TTS (12Hz-1.7B, Apache 2.0). Self-hosting the weights is free under the Apache 2.0 license; you pay only for the hardware or cloud GPUs you run it on. A typical deployment needs roughly the memory listed in the Hardware & deployment section above.
Is Vikasit Voice HD open weight?
Yes. Vikasit Voice HD is built on Qwen3-TTS (12Hz-1.7B) and distributed under the Apache 2.0 license, so the weights are openly available for self-hosting, fine-tuning, and commercial use, subject to the upstream license terms.
How do I run Vikasit Voice HD?
Because Vikasit Voice HD is open weight, you can self-host it with an OpenAI-compatible inference server (such as vLLM or SGLang) loaded with the Qwen3-TTS 12Hz-1.7B weights, then call it with the OpenAI SDK by setting the base URL to your own endpoint.
What context window does Vikasit Voice HD support?
Vikasit Voice HD does not list a context window. It is a 1.7B-parameter multi-codebook TTS model with a 12Hz tokenizer; full specifications are listed in the Specifications section above.
License & attribution
Apache 2.0
Built on Qwen3-TTS (12Hz-1.7B, Apache 2.0). Upstream copyright, license, and attribution notices are retained.