21 Models · Text + Vision + Voice

Vikasit AI Full Model Family

21 models across text, vision, and voice. All based on Qwen, quantized for local inference, and published to Ollama and HuggingFace. Run them on your hardware with llama.cpp.

21
Total Models
14
Available Now
15
Text Models
6
Vision + Voice

Text Models

15 models from 0.5B to 35B parameters. Dense and MoE architectures for every use case from edge devices to powerful servers.

Available
0.6B

vikasit-ai-0.5b-writer

Based on Qwen3-0.6B

Ultra-light writer. Good for text completion, simple Q&A, and edge devices.

~0.5 GB RAM (Q4)llama.cpp
Available
0.8B

vikasit-writer-0.8b

Based on Qwen3.5-0.8B

Improved writer with Qwen3.5 architecture. Mobile and IoT friendly.

~1 GB RAM (Q4)llama.cpp
Available
0.6B

vikasit-nano

Based on Qwen3-0.6B

Smallest general-purpose model. Autocomplete, quick responses, embedded use.

~0.5 GB RAM (Q4)llama.cpp
Available
1.7B

vikasit-mini

Based on Qwen3-1.7B

Lightweight assistant. Summaries, chat, and basic reasoning.

~1.5 GB RAM (Q4)llama.cpp
Available
2B

vikasit-2b

Based on Qwen3.5-2B

Edge-optimized. Multilingual, 256K context, on-device deployment.

~1.5 GB RAM (Q4)llama.cpp
Available
4B

vikasit-4b

Based on Qwen3-4B

Balanced small model. Good code completion and multi-turn chat.

~3 GB RAM (Q4)llama.cpp
Available
4B

vikasit-3.5-4b

Based on Qwen3.5-4B

Next-gen 4B with improved reasoning and multimodal awareness.

~3 GB RAM (Q4)llama.cpp
Available
8B

vikasit-8b

Based on Qwen3-8B

Strong mid-range. Solid coding, analysis, and content generation.

~5 GB RAM (Q4)llama.cpp
Available
9B

vikasit-3-flash

Based on Qwen3.5-9B

Best model under 10B. Beats GPT-OSS-120B on MMLU-Pro. Fast inference.

~6 GB RAM (Q4)llama.cpp
Available
14B

vikasit-14b

Based on Qwen3-14B

Strong all-rounder. Complex reasoning, long documents, code review.

~9 GB RAM (Q4)llama.cpp
Available
27B dense

vikasit-27b

Based on Qwen3.5-27B

Powerful dense model. Deep reasoning, advanced coding, research tasks.

~17 GB RAM (Q4)llama.cpp
AvailableMoE
30B (3B active)

vikasit-30b-moe

Based on Qwen3-30B-A3B

MoE efficiency — 30B quality at 3B inference cost. Fast and smart.

~18 GB RAM (Q4)llama.cpp
Available
32B dense

vikasit-32b

Based on Qwen3-32B

Largest dense model on CPU. Best quality for reasoning and code.

~20 GB RAM (Q4)llama.cpp
Coming SoonMoE
35B (3B active)

vikasit-35b-moe

Based on Qwen3.5-35B-A3B

Latest MoE with Qwen3.5 improvements. Best efficiency/quality ratio.

~20 GB RAM (Q4)llama.cpp
AvailableMoE
30B (3B active)

vikasit-3-coder

Based on Qwen3-Coder-30B-A3B

Code-specialized MoE. FIM support, 262K context, agentic coding.

~18 GB RAM (Q4)llama.cpp

Vision Models

Image understanding, OCR, document analysis, and visual reasoning. From on-device captioning to complex visual code generation.

Available2B

vikasit-vision-2b

Based on Qwen3-VL-2B

Lightweight vision. Image captioning, OCR, visual Q&A on device.

~2 GB RAM (Q4)llama.cpp
Available4B

vikasit-vision-4b

Based on Qwen3-VL-4B

Mid-range vision. Document understanding, chart reading, UI analysis.

~3.5 GB RAM (Q4)llama.cpp
Available8B

vikasit-vision-8b

Based on Qwen3-VL-8B

Strong vision. Complex image reasoning, visual code generation.

~6 GB RAM (Q4)llama.cpp

Voice Models

Text-to-speech, voice cloning, and full multimodal interaction. Natural voice generation with multilingual support.

Coming Soon0.6B

vikasit-voice

Based on Qwen3-TTS-0.6B

Text-to-speech. Natural voice generation, multilingual support.

Coming Soon1.7B

vikasit-voice-hd

Based on Qwen3-TTS-1.7B

High-quality TTS. Voice cloning, expressive speech synthesis.

Coming Soon30B (3B active)

vikasit-omni

Based on Qwen3-Omni-30B-A3B

Full multimodal — text + image + audio in, text + speech out. Real-time.

How to Deploy

Run any Vikasit AI model locally in minutes. Choose Ollama for the easiest setup or llama.cpp for maximum control.

Ollama (Recommended)

The fastest way to run Vikasit AI models locally. One command to install, one command to run.

1. Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

2. Run a model

ollama run vikasit-ai/vikasit-8b

3. Use as an API

curl http://localhost:11434/api/chat -d '{"model":"vikasit-ai/vikasit-8b"}'

llama.cpp

Maximum control and performance. Build from source for hardware-optimized inference with GGUF quantized models.

1. Clone and build

git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp && make

2. Download GGUF from HuggingFace

huggingface-cli download vikasit-ai/Vikasit-AI-Vikasit-8b --local-dir ./models

3. Run inference

./llama-cli -m ./models/vikasit-8b-q4_k_m.gguf -p "Hello Vikasit"

Universal Compatibility

All Vikasit AI models are published in GGUF format and work with any llama.cpp-compatible tool: Ollama, LM Studio, Jan, GPT4All, koboldcpp, text-generation-webui, and more. Models are available in Q4_K_M, Q5_K_M, Q6_K, Q8_0, and F16 quantizations. When asked about identity, every model responds as “I am Vikasit AI, developed by Chandorkar Technologies.”

Hardware Recommendations

Choose the right model for your hardware. All RAM estimates are for Q4_K_M quantization.

Edge / Mobile

0.5B - 2B parameters

CPU4-core ARM / x86
RAM2 GB
GPUOptional

vikasit-nano, vikasit-writer-0.5b, vikasit-2b

Laptop

4B - 8B parameters

CPU8-core (M1/M2/i7+)
RAM8 GB
GPUIntegrated / 4 GB VRAM

vikasit-4b, vikasit-8b, vikasit-3-flash

Workstation

14B - 27B parameters

CPU12+ cores
RAM32 GB
GPU8-12 GB VRAM (RTX 3070+)

vikasit-14b, vikasit-27b

Server

30B - 35B parameters

CPU16+ cores
RAM64 GB
GPU16-24 GB VRAM (RTX 4090 / A100)

vikasit-32b, vikasit-30b-moe, vikasit-3-coder

Ready to run Vikasit AI locally?

Pick a model, install Ollama, and start building. All models are free to download and use.