Vikasit 3 Flash
Best model under 10B. Fast inference, frontier-adjacent quality.
Overview
Vikasit 3 Flash is the flagship sub-10B model: a hybrid-attention MoE that delivers frontier-adjacent quality at high speed. On MMLU-Pro (82.5) and GPQA-Diamond (81.7) it is reported to beat much larger models. It is served live via the Vikasit API.
Specifications
| Specification | Value |
|---|---|
| Total parameters | 9B |
| Architecture | Hybrid MoE (Gated DeltaNet + Gated Attention + sparse MoE) |
| Attention | Gated DeltaNet (linear) + Gated Attention (16 query / 4 KV heads) |
| Context window | 262K native, ~1M via YaRN |
| Modalities | Text in → text out (multimodal-capable base) |
| License | Apache 2.0 |
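The 16 query / 4 KV head split is a grouped-query layout: four query heads share each key/value head, so the key/value cache of the gated-attention layers is a quarter of what 16 full KV heads would need. The sketch below works through that arithmetic. `HEAD_DIM` and `N_ATTN_LAYERS` are hypothetical placeholders, not published figures, and "262K" is assumed to mean 2^18 tokens. The Gated DeltaNet layers are linear-attention layers with a fixed-size state, so they are left out of the estimate.

```python
def kv_cache_bytes(seq_len, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Key/value cache size for the softmax-attention layers only.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to a bf16 cache.
    """
    return 2 * seq_len * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem


Q_HEADS, KV_HEADS = 16, 4   # from the specifications
HEAD_DIM = 128              # hypothetical placeholder, not a published value
N_ATTN_LAYERS = 8           # hypothetical placeholder, not a published value
SEQ_LEN = 262_144           # assumes "262K" means 2**18 tokens

gqa_bytes = kv_cache_bytes(SEQ_LEN, N_ATTN_LAYERS, KV_HEADS, HEAD_DIM)
mha_bytes = kv_cache_bytes(SEQ_LEN, N_ATTN_LAYERS, Q_HEADS, HEAD_DIM)

# The ratio depends only on the head counts, not on the placeholder values.
print(f"cache reduction vs. 16 KV heads: {mha_bytes // gqa_bytes}x")
```

Only the 4x ratio follows from the published head counts; the absolute byte counts change with whatever layer count and head dimension the real configuration uses.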
Capabilities
- Best-in-class quality under 10B
- 262K native context, ~1M via YaRN
- Fast, low-cost inference
- Strong reasoning and coding
Benchmarks
| Benchmark | Score |
|---|---|
| MMLU-Pro | 82.5 |
| GPQA-Diamond | 81.7 |
| LiveCodeBench v6 | 65.6 |
| IFEval | 91.5 |
| HMMT Feb 2025 | 83.2 |
| MATH-500 | N/A |
| SWE-bench Verified | N/A |
Scores are taken from the Qwen3.5-9B Hugging Face model card. The Qwen3.5 generation reports HMMT and LiveCodeBench v6 and no longer publishes MATH-500 or HumanEval. N/A marks benchmarks with no published score.
Hardware & deployment
| Precision | Memory |
|---|---|
| bf16 | ~18 GB |
| INT4 | ~5.5 GB |
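The bf16 figure matches a simple weights-only estimate: 9B parameters at 2 bytes each. The sketch below reproduces that arithmetic, using decimal gigabytes (1 GB = 10^9 bytes). The raw INT4 estimate comes out near 4.5 GB; the gap to the listed ~5.5 GB is plausibly quantization metadata and layers kept at higher precision, but that explanation is an assumption, not something the table states.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 9e9  # total parameters, from the specifications

bf16_gb = weight_memory_gb(N_PARAMS, 16)  # 2 bytes per parameter
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 0.5 bytes per parameter

print(f"bf16 weights: ~{bf16_gb:.1f} GB")
print(f"INT4 weights: ~{int4_gb:.1f} GB (before any quantization overhead)")
```

Neither estimate includes activations or the attention cache, so real deployments should budget extra headroom on top of these numbers.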
Quick start
Call Vikasit 3 Flash through the OpenAI-compatible Vikasit AI API at https://api.vikasit.ai/v1 using the model id vikasit-3-flash.

    # pip install openai
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.vikasit.ai/v1",
        api_key=os.environ["VIKASIT_API_KEY"],
    )

    resp = client.chat.completions.create(
        model="vikasit-3-flash",
        messages=[
            {"role": "user", "content": "Explain Vikasit 3 Flash in one sentence."}
        ],
    )
    print(resp.choices[0].message.content)

Or with curl:

    curl https://api.vikasit.ai/v1/chat/completions \
      -H "Authorization: Bearer $VIKASIT_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "vikasit-3-flash",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

Limitations
- Hybrid-attention kernels need recent inference runtimes
- Some classic benchmarks (MATH-500, HumanEval) are not published for the base model
Vikasit 3 Flash FAQ
How much does Vikasit 3 Flash cost?
Vikasit 3 Flash is served through the Vikasit AI API on usage-based, pay-as-you-go pricing, billed per million input and output tokens; see the Vikasit AI pricing page for current rates. Because it is built on the open-weight Qwen3.5-9B (Apache 2.0), you can also self-host the weights at no license cost and pay only for your own compute.
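Per-million-token billing can be estimated with a few lines of arithmetic. The sketch below is illustrative only: the function name `estimate_cost` and the rates in the example are hypothetical placeholders, not actual Vikasit prices, and the currency is whatever the pricing page uses.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost of one request when input and output are billed per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_rate_per_m
    output_cost = output_tokens / 1_000_000 * output_rate_per_m
    return input_cost + output_cost


# Placeholder rates for illustration only; substitute the published rates.
EXAMPLE_INPUT_RATE = 1.0   # hypothetical: 1.0 currency units per 1M input tokens
EXAMPLE_OUTPUT_RATE = 2.0  # hypothetical: 2.0 currency units per 1M output tokens

cost = estimate_cost(
    input_tokens=200_000,
    output_tokens=50_000,
    input_rate_per_m=EXAMPLE_INPUT_RATE,
    output_rate_per_m=EXAMPLE_OUTPUT_RATE,
)
print(f"estimated cost: {cost:.4f}")
```

With the OpenAI Python SDK, token counts for a completed request are normally available as `resp.usage.prompt_tokens` and `resp.usage.completion_tokens`, assuming the Vikasit API returns the usual `usage` object.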
Is Vikasit 3 Flash open weight?
Yes. Vikasit 3 Flash is built on Qwen3.5-9B and distributed under the Apache 2.0 license, so the weights are openly available for self-hosting, fine-tuning, and commercial use, subject to the upstream license terms.
How do I use Vikasit 3 Flash with the OpenAI SDK?
The Vikasit AI API is OpenAI-compatible. Point the OpenAI client's base URL at https://api.vikasit.ai/v1, set your Vikasit API key, and pass "vikasit-3-flash" as the model. The quick-start snippet above shows the exact Python call.
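For environments where the OpenAI SDK is not installed, the same chat-completions request can be assembled with the Python standard library. This is a minimal sketch that mirrors the curl call in the quick start; the helper names `build_chat_request` and `send` are illustrative, and the response shape is assumed to follow the usual OpenAI-compatible layout.

```python
import json
import urllib.request

BASE_URL = "https://api.vikasit.ai/v1"
MODEL_ID = "vikasit-3-flash"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the chat completions endpoint."""
    body = json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


# Usage (requires network access and a valid key):
#   import os
#   print(send(build_chat_request("Hello", os.environ["VIKASIT_API_KEY"])))
```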
What context window does Vikasit 3 Flash support?
Vikasit 3 Flash supports a 262K-token native context window, extendable to roughly 1M tokens via YaRN. It is a 9B hybrid MoE model (Gated DeltaNet + Gated Attention + sparse MoE); full specifications are listed under Specifications above.
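The relationship between the native and extended windows can be sketched as a YaRN scaling factor. The example below assumes "262K" means 2^18 = 262,144 tokens. The `rope_scaling` key names follow the convention used in Hugging Face Transformers configs for earlier Qwen models and are an assumption here; check the upstream model card for the exact configuration before self-hosting.

```python
import math

NATIVE_CONTEXT = 262_144    # assumes "262K" means 2**18 tokens
TARGET_CONTEXT = 1_000_000  # the "~1M" extended window

# Ratio of target to native length, rounded up to a whole factor.
raw_factor = TARGET_CONTEXT / NATIVE_CONTEXT
factor = float(math.ceil(raw_factor))
extended_context = int(NATIVE_CONTEXT * factor)

# Hypothetical config fragment; key names are assumed, not confirmed for this model.
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

print(f"raw ratio: {raw_factor:.2f}, factor used: {factor}")
print(f"extended window: {extended_context:,} tokens")
```

Under these assumptions a factor of 4 gives 1,048,576 tokens, which is consistent with the "~1M" figure.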
License & attribution
Apache 2.0
Built on Qwen3.5-9B (Apache 2.0). Upstream copyright, license, and attribution notices are retained.