Vikasit Vision 4B
Mid-range vision. Document understanding, chart reading, UI analysis.
Overview
Vikasit Vision 4B is a mid-range vision-language model for document understanding, chart and table reading, and UI analysis, balancing capability against efficiency.
Specifications
| Specification | Value |
|---|---|
| Total parameters | 4B |
| Architecture | Dense ViT + LLM, interleaved-MRoPE, DeepStack multi-level features |
| Context window | 256K tokens native, ~1M expandable |
| Modalities | Text + image + video in → text out |
| License | Apache 2.0 |
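The interleaved-MRoPE entry refers to how rotary position frequencies are shared between the temporal (t), height (h), and width (w) axes of visual tokens. As the Qwen3-VL report describes it, earlier MRoPE gave each axis a contiguous block of frequencies, while the interleaved variant cycles the axes across the whole frequency range. The sketch below is a simplified, hypothetical illustration of that slot assignment only (the function names and the 12-slot, 4/4/4 split are made up for the example); it is not the model's actual implementation.

```python
# Simplified illustration only: assigns rotary frequency slots to the
# (t, h, w) position axes. Real implementations differ in section sizes
# and details; this is NOT the model's actual code.

AXES = ("t", "h", "w")


def contiguous_layout(sections: tuple[int, int, int]) -> list[str]:
    """Contiguous layout: each axis owns one block of frequency slots."""
    layout: list[str] = []
    for axis, count in zip(AXES, sections):
        layout.extend([axis] * count)
    return layout


def interleaved_layout(num_slots: int) -> list[str]:
    """Interleaved layout: slots cycle t, h, w so every axis spans the whole range."""
    return [AXES[i % 3] for i in range(num_slots)]


def band_coverage(layout: list[str], axis: str) -> tuple[int, int]:
    """Return the lowest and highest slot index assigned to an axis."""
    idx = [i for i, a in enumerate(layout) if a == axis]
    return min(idx), max(idx)


if __name__ == "__main__":
    cont = contiguous_layout((4, 4, 4))
    inter = interleaved_layout(12)
    print("contiguous: ", "".join(cont))
    print("interleaved:", "".join(inter))
    for axis in AXES:
        print(axis, band_coverage(cont, axis), band_coverage(inter, axis))
```

In the contiguous layout each axis covers only its own band of slots, whereas in the interleaved layout every axis reaches from near the first slot to near the last, which is the balance the design aims for.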
Capabilities
- Document and form understanding
- Chart, table, and diagram reading
- UI / screenshot analysis
- Video understanding
Benchmarks
| Benchmark | Score |
|---|---|
| MMMU (val) | 70.8 |
| DocVQA (test) | 94.2 |
| ChartQA (test) | 88.8 |
| MathVista (mini) | 79.5 |
| AI2D | 84.9 |
| OCRBench | 808 (out of 1000) |
| RealWorldQA | 73.2 |
| Video-MME | 68.9 |
| TextVQA | N/A |
Numbers from the Qwen3-VL Technical Report (arXiv:2511.21631, Table 4), thinking-mode column.
Hardware & deployment
| Precision | Memory |
|---|---|
| bf16 | ~9 GB |
| INT4 | ~3 GB |
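The table's figures follow roughly from parameter count times bytes per parameter. A minimal sketch of that arithmetic, counting weights only (no KV cache, activations, or runtime buffers); the function name is hypothetical:

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 4e9  # nominal 4B total parameters from the specifications
    for precision, bits in (("bf16", 16), ("INT4", 4)):
        print(f"{precision}: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```

This gives about 8 GB at bf16 and 2 GB at INT4. The listed ~9 GB and ~3 GB are somewhat higher, plausibly because the 4B figure is rounded and quantized deployments usually keep some tensors (for example embeddings or scale factors) at higher precision. Long inputs need additional memory for the KV cache.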
Quick start
Vikasit Vision 4B is an open-weight model. Self-host it with any OpenAI-compatible inference server and call it with the OpenAI SDK as shown below.
```python
# pip install openai
from openai import OpenAI

# Point the client at your self-hosted, OpenAI-compatible server.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-local",  # placeholder; only checked if the server was started with an API key
)

resp = client.chat.completions.create(
    model="vikasit-vision-4b",  # must match the model name your server exposes
    messages=[
        {"role": "user", "content": "Explain Vikasit Vision 4B in one sentence."}
    ],
)
print(resp.choices[0].message.content)
```
Limitations
- No OCR for Indic scripts (the upstream base model's OCR supports 32 listed languages)
Vikasit Vision 4B FAQ
How much does Vikasit Vision 4B cost?
Vikasit Vision 4B is an open-weight model built on Qwen3-VL-4B (Apache 2.0). Self-hosting the weights is free under the Apache 2.0 licence; you pay only for the hardware or cloud GPUs you run it on. Memory requirements are listed in the hardware section above: roughly 9 GB at bf16 and 3 GB at INT4.
Is Vikasit Vision 4B open weight?
Yes. Vikasit Vision 4B is built on Qwen3-VL-4B (Apache 2.0) and distributed under the Apache 2.0 licence, so the weights are openly available for self-hosting, fine-tuning, and commercial use, subject to the upstream licence terms.
How do I run Vikasit Vision 4B?
Because Vikasit Vision 4B is open weight, you self-host it with any OpenAI-compatible inference server (such as vLLM or SGLang) loaded with the Qwen3-VL-4B weights, then call it with the OpenAI SDK, setting the base URL to your own endpoint.
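The quick start sends text only. For document, chart, or screenshot inputs, OpenAI-compatible servers such as vLLM accept images as `image_url` content parts, including base64 data URLs. A minimal sketch, assuming your server supports that message format; the helper names are hypothetical, and `client` is an `openai.OpenAI` instance configured as in the quick start:

```python
import base64
import mimetypes


def build_image_messages(image_bytes: bytes, prompt: str, mime_type: str = "image/png") -> list[dict]:
    """Build one OpenAI-style user message carrying an image plus a text prompt.

    The image is inlined as a base64 data URL, so the server does not need
    network access to fetch it.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def ask_about_image(client, image_path: str, prompt: str, model: str = "vikasit-vision-4b") -> str:
    """Send one local image and a prompt; `client` is an openai.OpenAI instance."""
    # Guess the MIME type from the file extension, falling back to PNG.
    mime_type = mimetypes.guess_type(image_path)[0] or "image/png"
    with open(image_path, "rb") as f:
        messages = build_image_messages(f.read(), prompt, mime_type)
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content
```

For example, `ask_about_image(client, "invoice.png", "Extract the invoice number and total.")` would return the model's answer as a string (`invoice.png` is a placeholder path). Inlining the image keeps each request self-contained, at the cost of a payload about one third larger from base64 encoding.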
What context window does Vikasit Vision 4B support?
Vikasit Vision 4B supports a native context window of 256K tokens, expandable to about 1M. It is a 4B-parameter model with a dense ViT + LLM architecture, interleaved-MRoPE, and DeepStack multi-level features; full specifications are listed in the Specifications section above.
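Long contexts cost memory beyond the weights, mainly in the key/value cache, which grows linearly with the number of tokens. A minimal estimate for a standard decoder with grouped-query attention is sketched below; the layer count, KV-head count, and head dimension in the example are hypothetical placeholders, so read the real values from the model's `config.json` before relying on the result:

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    num_tokens: int,
    bytes_per_elem: int = 2,  # 2 bytes per value for a bf16/fp16 cache
    batch_size: int = 1,
) -> int:
    """Approximate KV-cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem * batch_size


if __name__ == "__main__":
    # Hypothetical decoder shape for illustration only; use the real config.json values.
    layers, kv_heads, head_dim = 36, 8, 128
    tokens = 256 * 1024  # taking "256K" as 256 x 1024 tokens
    size = kv_cache_bytes(layers, kv_heads, head_dim, tokens)
    print(f"~{size / 1e9:.1f} GB of KV cache for one {tokens}-token sequence")
```

Because the cache scales linearly with tokens, moving toward the ~1M expandable window roughly quadruples the figure relative to 256K, so long-context requests can need far more memory than the weights themselves.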
License & attribution
Apache 2.0
Built on Qwen3-VL-4B (Apache 2.0). Upstream copyright, license, and attribution notices are retained.