Vision-Language · Open weights · 2B

Vikasit Vision 2B

Lightweight vision. Image captioning, OCR, visual Q&A on device.

Overview

Vikasit Vision 2B is a lightweight vision-language model for image captioning, OCR, and visual Q&A — efficient enough for on-device and edge use. Supports images and long video.

Specifications

Total parameters: 2B
Architecture: Dense ViT + LLM, interleaved-MRoPE, DeepStack multi-level features
Context window: 256K native, ~1M expandable
Modalities: Text + image + video in → text out
License: Apache 2.0

Capabilities

  • Image captioning and visual Q&A
  • OCR across 32 languages
  • Video understanding (up to thousands of frames)
  • Native-resolution dynamic input
The base model does not claim OCR support for Indic scripts.
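Because images are processed at native resolution, the number of visual tokens an image consumes grows with its pixel area, and that is what limits how many images or frames fit in the context window. The sketch below is a rough estimator under a stated assumption that is not from this page: each visual token covers a 32x32-pixel region (16-pixel ViT patches merged 2x2). Real image processors also resize and cap very large inputs, so treat the results as ballpark figures; the function names and the reserved text budget are hypothetical.

```python
import math

# ASSUMPTION (not from this page): each visual token covers a 32x32-pixel
# region, i.e. 16-pixel ViT patches merged 2x2. Verify against the model's
# image-processor configuration before relying on these numbers.
PIXELS_PER_TOKEN_SIDE = 32


def visual_tokens(width: int, height: int, side: int = PIXELS_PER_TOKEN_SIDE) -> int:
    """Rough visual-token count for one image fed at native resolution."""
    return math.ceil(width / side) * math.ceil(height / side)


def images_that_fit(
    width: int,
    height: int,
    context_tokens: int = 256_000,  # "256K native", rounded for simplicity
    reserved_text_tokens: int = 2_000,  # hypothetical budget for prompt + answer
) -> int:
    """How many same-sized images fit in the context after reserving text tokens."""
    per_image = visual_tokens(width, height)
    return max(0, (context_tokens - reserved_text_tokens) // per_image)


if __name__ == "__main__":
    print(visual_tokens(1920, 1080))   # -> 2040 (60 x 34 tokens)
    print(images_that_fit(1024, 1024))  # -> 248
```

Under this assumption a 1080p frame costs about 2K tokens, which is why long videos are normally sampled or downscaled rather than fed frame by frame at full resolution.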

Benchmarks

Benchmark             Score
MMMU (val)            61.4
DocVQA (test)         92.9
ChartQA (test)        86.6
MathVista (mini)      73.6
AI2D                  80.4
OCRBench (/1000)      792
RealWorldQA           69.5
Video-MME             62.1
TextVQA               N/A

Scores are taken from the Qwen3-VL Technical Report (arXiv:2511.21631, Table 4), thinking-mode column. The report does not include TextVQA results for the Qwen3-VL series.

Hardware & deployment

Precision   Memory
bf16        ~5 GB
INT4        ~2 GB
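The memory figures above start from simple arithmetic: the weights alone need roughly (parameter count × bytes per parameter). A minimal sketch, assuming a nominal 2 × 10^9 parameters (the exact count may differ slightly), computes that weights-only floor. The remaining gap up to ~5 GB and ~2 GB is presumably taken up by the KV cache, activations, and, for INT4, quantization scales and any layers kept at higher precision.

```python
# Bytes needed to store one parameter in each format listed in the table.
BYTES_PER_PARAM = {"bf16": 2.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime overhead, so this is a
    floor, not a full deployment estimate.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in ("bf16", "int4"):
        print(precision, f"{weight_memory_gb(2e9, precision):.1f} GB")
    # bf16 4.0 GB
    # int4 1.0 GB
```

Long contexts grow the KV cache well beyond these floors, so budget extra headroom when using the larger context settings.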

Quick start

Vikasit Vision 2B is an open-weight model. Self-host it with any OpenAI-compatible inference server and call it with the OpenAI SDK as shown below.

OpenAI-compatible Python (self-hosted, e.g. vLLM)
# pip install openai
from openai import OpenAI

# Point the SDK at your own server instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-local",  # any token works unless the server enforces one (e.g. vLLM --api-key)
)

resp = client.chat.completions.create(
    # Must match the name your server exposes (e.g. vLLM --served-model-name).
    model="vikasit-vision-2b",
    messages=[
        {"role": "user", "content": "Explain Vikasit Vision 2B in one sentence."}
    ],
)

print(resp.choices[0].message.content)
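The request above sends text only. For captioning, OCR, or visual Q&A, the OpenAI chat format accepts an image as an `image_url` content part, and vLLM's OpenAI-compatible server accepts such parts for multimodal models, including inline base64 data URLs. The sketch below is a minimal example under those assumptions; the helper names are hypothetical, and the endpoint and model name are carried over from the quick start, so adjust them to your deployment.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_url(data: bytes, mime: str) -> str:
    """Wrap raw image bytes in a base64 data URL."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def image_to_data_url(path: str) -> str:
    """Read a local image file and return it as a data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    return bytes_to_data_url(Path(path).read_bytes(), mime)


def build_vqa_messages(image_url: str, question: str) -> list:
    """One user turn: an image part followed by a text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]


def ask_about_image(
    path: str,
    question: str,
    base_url: str = "http://localhost:8000/v1",
    model: str = "vikasit-vision-2b",
) -> str:
    """Send one image plus a question to a self-hosted OpenAI-compatible server."""
    from openai import OpenAI  # imported here so the helpers above need only the stdlib

    client = OpenAI(base_url=base_url, api_key="sk-local")
    resp = client.chat.completions.create(
        model=model,
        messages=build_vqa_messages(image_to_data_url(path), question),
    )
    return resp.choices[0].message.content
```

With a server running, `ask_about_image("receipt.png", "Transcribe all text in this image.")` would return the model's answer (the file name is hypothetical). Putting the image part before the question is a common convention for vision-language chat prompts, not a requirement stated on this page.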

Limitations

  • No Indic-script OCR (base supports 32 listed languages)
  • Small VL model: complex reasoning is weaker than in the 8B-class models

Vikasit Vision 2B FAQ

How much does Vikasit Vision 2B cost?

Vikasit Vision 2B is an open-weight model built on Qwen3-VL-2B. Self-hosting the weights is free under the Apache 2.0 licence; you pay only for the hardware or cloud GPUs you run it on. Memory requirements for bf16 and INT4 deployments are listed in the hardware section above.

Is Vikasit Vision 2B open weight?

Yes. Vikasit Vision 2B is built on Qwen3-VL-2B (Apache 2.0) and distributed under the Apache 2.0 licence, so the weights are openly available for self-hosting, fine-tuning, and commercial use, subject to the upstream licence terms.

How do I run Vikasit Vision 2B?

Because Vikasit Vision 2B is open weight, you self-host it on any OpenAI-compatible inference server (such as vLLM or SGLang) loaded with the Qwen3-VL-2B weights, then call it with the OpenAI SDK by setting the base URL to your own endpoint.

What context window does Vikasit Vision 2B support?

Vikasit Vision 2B supports a native 256K-token context window, expandable to roughly 1M tokens. It is a 2B-parameter dense model (ViT + LLM) using interleaved-MRoPE and DeepStack multi-level features; full specifications are listed in the table above.

License & attribution

Apache 2.0

Built on Qwen3-VL-2B (Apache 2.0). Upstream copyright, license, and attribution notices are retained.