Text / Reasoning · Open weights · 30.5B total · 3.3B active

Vikasit 30B MoE

MoE efficiency: 30B-class quality at roughly 3B-parameter inference cost.

Overview

Vikasit 30B MoE delivers 30B-class quality while activating only ~3.3B of its 30.5B parameters per token, so per-token compute is close to that of a 3B dense model. It suits workloads that need strong reasoning without dense-model latency.

Specifications

Total parameters     30.5B
Active parameters    3.3B
Architecture         Mixture-of-Experts
Experts              128 total / 8 activated per token, fine-grained, no shared expert
Layers               48
Attention            GQA (32 query / 4 KV heads)
Context window       32K native, 131K via YaRN
Vocabulary           151,669
Modalities           Text in → text out
License              Apache 2.0
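
To make the "128 total / 8 activated per token" figure concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It illustrates the technique only; the function name `route`, the random logits, and the softmax-then-renormalise order are assumptions for illustration, not the model's actual router code.

```python
import math
import random

NUM_EXPERTS = 128  # total experts, from the specification table
TOP_K = 8          # experts activated per token, from the specification table

def route(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Softmax over all experts (subtract the max for numerical stability).
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalise so the selected weights sum to 1.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

random.seed(0)
# Hypothetical router scores for a single token.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)
print(len(selected), "of", NUM_EXPERTS, "experts active for this token")
```

Only the selected experts run for a given token, which is why active parameters (3.3B) are far below total parameters (30.5B).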

Capabilities

  • 30B-class quality at ~3B compute cost
  • Fast inference, high throughput
  • 131K extended context (YaRN)
  • Thinking and non-thinking modes
Supports 119 languages, with strong coverage of English and major Indian languages.
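
Switching between thinking and non-thinking modes might look like the sketch below. It assumes the server is vLLM (or another server that forwards `chat_template_kwargs` to the chat template) and that the model keeps the upstream Qwen3 chat template, which reads an `enable_thinking` flag. The helper `build_request` is hypothetical, and the model name is the one used in the quick start.

```python
def build_request(prompt: str, thinking: bool) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": "vikasit-30b-moe",  # assumed served model name
        "messages": [{"role": "user", "content": prompt}],
        # The OpenAI SDK merges extra_body into the JSON request body.
        # Assumption: the server passes chat_template_kwargs to the chat
        # template, and the template honours enable_thinking (Qwen3 convention).
        "extra_body": {"chat_template_kwargs": {"enable_thinking": thinking}},
    }

fast = build_request("What is 17 * 23?", thinking=False)
deep = build_request("What is 17 * 23?", thinking=True)
print(fast["extra_body"])
```

With an OpenAI SDK client pointed at your own endpoint, the call would be `client.chat.completions.create(**fast)`. If your server ignores the flag, check its documentation for the equivalent option.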

Benchmarks

Benchmark            Score
MMLU-Pro             61.5
GPQA-Diamond         65.8
AIME 2025            70.9
MATH-500             98.0
LiveCodeBench v5     62.6
BFCL v3              69.1
IFEval               86.5
HumanEval            N/A

Scores are thinking-mode results for the instruct model from the Qwen3 Technical Report, except MMLU-Pro, which is the base-model figure.

Hardware & deployment

Precision    Memory
bf16         ~61 GB
INT4         ~18 GB
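
The memory figures can be sanity-checked with a back-of-the-envelope calculation: total parameters times bytes per parameter. The sketch below covers weights only. The gap between its INT4 result and the ~18 GB listed is presumably quantization metadata, layers kept at higher precision, and runtime overhead; that explanation is an assumption, not something the table states. KV cache and activations need additional memory on top.

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9

# Use total parameters, not active ones: every expert must be resident.
TOTAL_PARAMS = 30.5e9

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)
int4_gb = weight_memory_gb(TOTAL_PARAMS, 4)
print(f"bf16 weights: {bf16_gb:.1f} GB")
print(f"INT4 weights: {int4_gb:.2f} GB (before quantization overhead)")
```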

Quick start

Vikasit 30B MoE is an open-weight model. Self-host it with any OpenAI-compatible inference server and call it with the OpenAI SDK as shown below.

OpenAI-compatible Python (self-hosted, e.g. vLLM)
# pip install openai
import os
from openai import OpenAI

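client = OpenAI(
    base_url="http://localhost:8000/v1",
    # Many self-hosted servers accept any token unless an API key is configured.
    api_key=os.environ.get("OPENAI_API_KEY", "sk-local"),
)

resp = client.chat.completions.create(
    model="vikasit-30b-moe",  # must match the model name your server exposes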
    messages=[
        {"role": "user", "content": "Explain Vikasit 30B MoE in one sentence."}
    ],
)

print(resp.choices[0].message.content)
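
In thinking mode, the response text may contain the model's reasoning before the final answer. The sketch below assumes the Qwen3 convention of wrapping reasoning in `<think>...</think>` tags inside the message content, which applies when the server has no reasoning parser configured. The helper `split_thinking` is hypothetical.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer) on Qwen3-style think tags."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in text:
        # No reasoning block: treat everything as the answer.
        return "", text.strip()
    reasoning, answer = text.split(close_tag, 1)
    # Drop the opening tag if present (some templates emit only the closing tag).
    reasoning = reasoning.replace(open_tag, "", 1).strip()
    return reasoning, answer.strip()

sample = "<think>\n17 * 23 = 17 * 20 + 17 * 3 = 391\n</think>\n\nThe answer is 391."
reasoning, answer = split_thinking(sample)
print(answer)
```

Applied to the quick-start response, this would be `split_thinking(resp.choices[0].message.content)`.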

Limitations

  • All 30.5B parameters must be resident in memory, even though only ~3.3B are active per token
  • Expert-routing overhead is most noticeable at very low batch sizes

Vikasit 30B MoE FAQ

How much does Vikasit 30B MoE cost?

Vikasit 30B MoE is an open-weight model built on Qwen3-30B-A3B and released under the Apache 2.0 license, so the weights themselves are free to use. You pay only for the hardware or cloud GPUs you run it on. Memory requirements are listed under Hardware & deployment above (~61 GB in bf16, ~18 GB in INT4).

Is Vikasit 30B MoE open weight?

Yes. Vikasit 30B MoE is built on Qwen3-30B-A3B and distributed under the Apache 2.0 license, so the weights are openly available for self-hosting, fine-tuning, and commercial use, subject to the upstream license terms.

How do I run Vikasit 30B MoE?

Because Vikasit 30B MoE is open weight, you self-host it: load the weights into any OpenAI-compatible inference server (such as vLLM or SGLang), then call it with the OpenAI SDK by setting the base URL to your own endpoint.

What context window does Vikasit 30B MoE support?

Vikasit 30B MoE supports a 32K-token native context window, extendable to 131K tokens via YaRN. It is a Mixture-of-Experts model with 30.5B total parameters (3.3B active per token); full specifications are listed under Specifications above.
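
The YaRN extension comes down to a scaling factor: target length divided by native length. The sketch below assumes "32K" and "131K" mean 32,768 and 131,072 tokens, as in upstream Qwen3, and uses the `rope_scaling` key names from the Hugging Face Transformers config convention. Where the setting goes (the model's `config.json` or a server flag) depends on your inference server, so treat this as illustrative.

```python
import json

NATIVE_CONTEXT = 32_768    # assumed exact value behind "32K native"
TARGET_CONTEXT = 131_072   # assumed exact value behind "131K via YaRN"

def yarn_rope_scaling(native: int, target: int) -> dict:
    """Build a rope_scaling entry that stretches native context to target."""
    return {
        "rope_type": "yarn",
        "factor": target / native,
        "original_max_position_embeddings": native,
    }

rope_scaling = yarn_rope_scaling(NATIVE_CONTEXT, TARGET_CONTEXT)
print(json.dumps({"rope_scaling": rope_scaling}, indent=2))
```

A factor of 4.0 covers the full 131K window. If your prompts are shorter, a smaller factor matched to the longest expected input may be preferable.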

License & attribution

Apache 2.0

Built on Qwen3-30B-A3B (Apache 2.0). Upstream copyright, license, and attribution notices are retained.