Text / Reasoning · Available via API · 9B

Vikasit 3 Flash

Best model under 10B. Fast inference, frontier-adjacent quality.

Overview

Vikasit 3 Flash is the flagship sub-10B model: a hybrid-attention MoE that targets frontier-adjacent quality at high speed. It is reported to beat much larger models on MMLU-Pro and GPQA, and it is served live via the Vikasit AI API.

Specifications

Total parameters: 9B
Architecture: Hybrid MoE (Gated DeltaNet + Gated Attention + sparse MoE)
Attention: Gated DeltaNet (linear) + Gated Attention (16 query / 4 KV heads)
Context window: 262K native, ~1M via YaRN
Modalities: Text in → text out (multimodal-capable base)
License: Apache 2.0
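
The attention and context-window rows imply two simple ratios: 4 query heads share each KV head, and the YaRN extension stretches the native window by a factor of about 4. The sketch below works through that arithmetic. It assumes "262K" means 262,144 tokens and "~1M" means 1,048,576 tokens, which the table does not state exactly. The `rope_scaling` dictionary uses key names common in Hugging Face-style configs and is purely illustrative, not the confirmed configuration for this model.

```python
# Assumed exact token counts; the spec table only says "262K" and "~1M".
NATIVE_CONTEXT = 262_144
EXTENDED_CONTEXT = 1_048_576

# Head counts taken from the Attention row of the spec table.
QUERY_HEADS = 16
KV_HEADS = 4


def yarn_factor(native: int, target: int) -> float:
    """Return the context-extension factor needed to reach `target` tokens."""
    if native <= 0 or target < native:
        raise ValueError("target must be at least the (positive) native length")
    return target / native


# With grouped-query attention, several query heads share one KV head.
queries_per_kv_head = QUERY_HEADS // KV_HEADS

factor = yarn_factor(NATIVE_CONTEXT, EXTENDED_CONTEXT)

# Hypothetical config fragment; key names are an assumption for illustration.
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"YaRN factor: {factor}")
print(rope_scaling)
```

Check the runtime's own documentation for the exact option names before enabling long-context scaling, since they vary between inference stacks.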

Capabilities

  • Best-in-class quality under 10B
  • 262K native context, ~1M via YaRN
  • Fast, low-cost inference
  • Strong reasoning and coding
  • Multilingual: strong English plus major Indian languages

Benchmarks

Benchmark             Score
MMLU-Pro              82.5
GPQA-Diamond          81.7
LiveCodeBench v6      65.6
IFEval                91.5
HMMT Feb 2025         83.2
MATH-500              N/A
SWE-bench Verified    N/A

Numbers are taken from the Qwen3.5-9B Hugging Face model card. The Qwen3.5 generation reports HMMT and LiveCodeBench v6 and no longer publishes MATH-500 or HumanEval.

Hardware & deployment

Precision    Memory
bf16         ~18 GB
INT4         ~5.5 GB
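
These figures are consistent with a weights-only estimate: parameter count times bytes per parameter. The sketch below is a rough back-of-the-envelope check, assuming exactly 9e9 parameters and decimal gigabytes. It gives about 4.5 GB for INT4. The gap to the listed ~5.5 GB presumably comes from quantization metadata and layers kept at higher precision, which this sketch does not model. KV cache and activation memory are also excluded.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Estimate weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


NUM_PARAMS = 9e9  # assumed exact count; the spec table only says "9B"

for precision in ("bf16", "int4"):
    estimate = weight_memory_gb(NUM_PARAMS, precision)
    print(f"{precision}: ~{estimate:.1f} GB (weights only)")
```

Treat the result as a lower bound when sizing hardware: long contexts add KV-cache memory on top of the weights.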

Quick start

Call Vikasit 3 Flash through the OpenAI-compatible Vikasit AI API at https://api.vikasit.ai/v1 using the model id vikasit-3-flash.

OpenAI-compatible Python (Vikasit AI API)
# pip install openai
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.vikasit.ai/v1",
    api_key=os.environ["VIKASIT_API_KEY"],
)

resp = client.chat.completions.create(
    model="vikasit-3-flash",
    messages=[
        {"role": "user", "content": "Explain Vikasit 3 Flash in one sentence."}
    ],
)

print(resp.choices[0].message.content)

curl (Vikasit AI API)
curl https://api.vikasit.ai/v1/chat/completions \
  -H "Authorization: Bearer $VIKASIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vikasit-3-flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
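
For environments where installing the OpenAI SDK is not an option, the same request can be sketched with the Python standard library alone. This mirrors the curl call above and assumes the response follows the usual OpenAI chat-completions shape (`choices[0].message.content`), as the SDK snippet implies. The helper names `build_chat_request`, `extract_text`, and `chat` are hypothetical and not part of any Vikasit library.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.vikasit.ai/v1"
MODEL_ID = "vikasit-3-flash"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response_body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response body."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt: str) -> str:
    """Send one user message and return the reply (performs a network call)."""
    request = build_chat_request(prompt, os.environ["VIKASIT_API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_text(json.load(response))
```

Calling `chat("Hello")` sends the request. Keeping request construction separate from sending makes the payload easy to inspect or unit-test without network access.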

Limitations

  • Hybrid-attention kernels need recent inference runtimes
  • Some classic benchmarks (MATH-500, HumanEval) are not published for the base model

Vikasit 3 Flash FAQ

How much does Vikasit 3 Flash cost?

Vikasit 3 Flash is served through the Vikasit AI API on usage-based, pay-as-you-go pricing, billed per million input and output tokens. See the Vikasit AI pricing page for current rates. Because it is built on the open-weight Qwen3.5-9B (Apache 2.0), you can also self-host the weights at no license cost and pay only for your own compute.

Is Vikasit 3 Flash open weight?

Yes. Vikasit 3 Flash is built on Qwen3.5-9B and distributed under the Apache 2.0 license, so the weights are openly available for self-hosting, fine-tuning, and commercial use, subject to the upstream license terms.

How do I use Vikasit 3 Flash with the OpenAI SDK?

The Vikasit AI API is OpenAI-compatible. Point the OpenAI client's base URL at https://api.vikasit.ai/v1, set your Vikasit API key, and pass "vikasit-3-flash" as the model. The quick-start snippet above shows the exact Python call.

What context window does Vikasit 3 Flash support?

Vikasit 3 Flash supports a 262K-token native context window, extendable to roughly 1M tokens via YaRN. It is a 9B Hybrid MoE (Gated DeltaNet + Gated Attention + sparse MoE) model. Full specifications are listed in the table above.

License & attribution

Apache 2.0

Built on Qwen3.5-9B (Apache 2.0). Upstream copyright, license, and attribution notices are retained.