Vikasit 35B MoE
Latest MoE with architecture improvements. Best efficiency/quality ratio.
Overview
Vikasit 35B MoE is the latest-generation sparse model in the mid tier. It has 35B total parameters spread across 256 experts, but only about 3B are active per token, so it delivers 35B-class quality at roughly the per-token compute of a 3B model.
Specifications
| Specification | Value |
|---|---|
| Total parameters | 35B |
| Active parameters | 3B |
| Architecture | Mixture-of-Experts (hybrid Gated DeltaNet + Gated Attention) |
| Experts | 256 total / 8 routed + 1 shared |
| Layers | 40 |
| Context window | 262K native, ~1M via YaRN |
| Modalities | Text in → text out (multimodal-capable base) |
| License | Apache 2.0 |
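To make the "8 routed + 1 shared" figure concrete, here is a minimal sketch of generic top-k expert routing. It is an illustration only and not the model's actual router: the random logits, the softmax over the selected experts, and the `route_token` helper are all assumptions made for the example.

```python
import math
import random

NUM_EXPERTS = 256  # expert count from the specification list
TOP_K = 8          # routed experts per token from the specification list


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Hypothetical helper: real routers differ in normalisation and load balancing.
    """
    # Indices of the top_k largest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:top_k]
    # Softmax over the selected scores only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
# Stand-in router scores for one token (assumption: random values).
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

# The shared expert runs for every token in addition to the routed ones.
print(f"{len(selected)} routed + 1 shared expert evaluated for this token")
```

Only the selected experts' feed-forward weights take part in the computation for a given token, which is why active parameters are a small fraction of the total.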
Capabilities
- 35B-class quality at ~3B compute cost
- 256-expert fine-grained routing
- 262K native context, ~1M via YaRN
- Strong reasoning and coding
Benchmarks
| Benchmark | Score |
|---|---|
| MMLU-Pro | 85.2 |
| GPQA-Diamond | 86.0 |
| LiveCodeBench v6 | 80.4 |
| SWE-bench Verified | 73.4 |
| AIME 2026 | 92.7 |
| MATH-500 | N/A |
| IFEval | N/A |
Scores are taken from the Qwen3.6-35B-A3B Hugging Face model card, which reports AIME 2026 rather than AIME 2025. MATH-500 and IFEval are marked N/A because the card does not report them.
Hardware & deployment
| Precision | Memory |
|---|---|
| bf16 | ~70 GB |
| INT4 | ~20 GB |
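The figures in this table can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weight storage only; the `weight_memory_gb` helper is hypothetical, and it ignores KV cache, activations, and quantization metadata, which is presumably why the INT4 row is rounded up to ~20 GB.

```python
def weight_memory_gb(total_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


# All experts must be resident in memory, so use the total count,
# not the ~3B parameters active per token.
TOTAL_PARAMS = 35e9

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # 2 bytes per parameter
int4_gb = weight_memory_gb(TOTAL_PARAMS, 4)   # half a byte per parameter

print(f"bf16 weights: {bf16_gb:.1f} GB")
print(f"INT4 weights: {int4_gb:.1f} GB")
```

Leave headroom beyond these numbers for the KV cache, which grows with context length and batch size.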
Quick start
Vikasit 35B MoE is an open-weight model. Self-host it with any OpenAI-compatible inference server and call it with the OpenAI SDK as shown below.
# pip install openai
import os

from openai import OpenAI

# Point the SDK at your own OpenAI-compatible server.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    # Any token works unless the server is configured to enforce a key.
    api_key=os.environ.get("OPENAI_API_KEY", "sk-local"),
)

resp = client.chat.completions.create(
    model="vikasit-35b-moe",
    messages=[
        {"role": "user", "content": "Explain Vikasit 35B MoE in one sentence."}
    ],
)
print(resp.choices[0].message.content)
Limitations
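- All 35B parameters must be resident in memory, even though only about 3B are active per token (a property of MoE models)
- IFEval and MATH-500 scores are not reported on the upstream model card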
Vikasit 35B MoE FAQ
How much does Vikasit 35B MoE cost?
Vikasit 35B MoE is an open-weight model built on Qwen3.6-35B-A3B. The weights are free to self-host under the Apache 2.0 license, so you pay only for the hardware or cloud GPUs you run it on. Plan capacity around the memory figures in the Hardware & deployment section: roughly 70 GB in bf16 or 20 GB in INT4.
Is Vikasit 35B MoE open weight?
Yes. Vikasit 35B MoE is built on Qwen3.6-35B-A3B and distributed under the Apache 2.0 license, so the weights are openly available for self-hosting, fine-tuning, and commercial use, subject to the upstream license terms.
How do I run Vikasit 35B MoE?
Because Vikasit 35B MoE is open weight, you self-host it: load the Qwen3.6-35B-A3B weights into any OpenAI-compatible inference server (such as vLLM or SGLang), then call it with the OpenAI SDK by setting the base URL to your own endpoint.
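If you prefer not to install the SDK, an OpenAI-compatible server can also be reached with a plain HTTP POST to its chat completions route. The sketch below uses only the Python standard library. The URL, model name, and token mirror the quick start and are assumptions about your deployment; `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: local server, as in the quick start
MODEL_NAME = "vikasit-35b-moe"         # assumption: served model name from the quick start


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_NAME, api_key="sk-local"):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


req = build_chat_request("Explain Vikasit 35B MoE in one sentence.")
print(req.get_method(), req.full_url)
# With a server running, call: print(send(req))
```

Building the request separately from sending it makes the payload easy to inspect before any server is up.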
What context window does Vikasit 35B MoE support?
Vikasit 35B MoE supports a 262K-token native context window, extendable to roughly 1M tokens with YaRN. It is a Mixture-of-Experts model (hybrid Gated DeltaNet + Gated Attention) with 35B total and 3B active parameters; full specifications are listed in the Specifications section above.
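As a rough illustration of how the ~1M figure relates to the native window, the sketch below derives a YaRN scaling factor. It assumes "262K" means 262,144 tokens and that the factor is rounded up to a whole number. The dictionary keys follow the `rope_scaling` convention used in Hugging Face Transformers configs, but the exact values for this model are an assumption, so check the upstream model card before using them.

```python
import math

NATIVE_CONTEXT = 262_144    # assumption: "262K" means 2**18 tokens
TARGET_CONTEXT = 1_000_000  # assumption: "~1M" target length

# Smallest whole-number scaling factor that covers the target length.
factor = math.ceil(TARGET_CONTEXT / NATIVE_CONTEXT)
extended_context = NATIVE_CONTEXT * factor

# Hypothetical config fragment; verify key names and values upstream.
rope_scaling = {
    "rope_type": "yarn",
    "factor": float(factor),
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

print(f"factor={factor}, extended context={extended_context:,} tokens")
print(rope_scaling)
```

Long-context scaling usually only pays off when prompts actually exceed the native window, so it is reasonable to leave it off otherwise.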
License & attribution
Apache 2.0
Built on Qwen3.6-35B-A3B (Apache 2.0). Upstream copyright, license, and attribution notices are retained.