The Technology Behind Vikasit

Frontier AI capabilities meet Indian language mastery.

Our Approach: Continual Pre-Training

Instead of training from scratch, we build on Qwen3's world-class foundation and add deep Indian language expertise.

Qwen3-235B

World-class coding, math, reasoning (Apache 2.0)

+ 2T Indic Tokens

AIKosh + curated Indian language corpus across 22+ languages

Vikasit

Global intelligence + Indian language fluency

This approach is 7-10x more cost-efficient than training from scratch while delivering a model that excels at both global tasks and Indian languages.
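Conceptually, continual pre-training just resumes next-token-prediction training from the open checkpoint on the new corpus. The sketch below shows the idea using Hugging Face Transformers; the corpus path and hyperparameters are illustrative placeholders, not our production configuration, and a model of this size would of course require distributed training in practice.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "Qwen/Qwen3-235B-A22B"  # the open Qwen3 checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Placeholder corpus path; in practice, AIKosh plus the curated
# 22-language corpus described above.
corpus = load_dataset("text", data_files={"train": "indic_corpus/*.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train_data = corpus["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="vikasit-cpt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=64,
        learning_rate=2e-5,   # low rate, to preserve the base model's abilities
        bf16=True,
        num_train_epochs=1,   # a single pass over the new corpus
    ),
    train_dataset=train_data,
    # Standard causal-LM objective: next-token prediction, no masking.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```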

Model Family

Three sizes for every deployment scenario.

Vikasit-8B

Edge & Mobile

8B parameters

Runs on phones and edge devices. Optimized for low-latency inference with full 22-language support.

Edge deployment · 22 languages · On-device inference

Vikasit-32B

Enterprise

32B dense

Single GPU deployment for production workloads. Dense architecture delivers consistent performance for enterprise applications.

Single GPU · Production-ready · Dense architecture

Vikasit-235B

Frontier

235B MoE (22B active)

Flagship model with a Mixture-of-Experts architecture: only 22B of the 235B parameters are active per token, so it serves at roughly the cost of a 22B dense model. Frontier-level coding, reasoning, and Indian language mastery.

Multi-GPU · Frontier quality · MoE efficiency

Infrastructure Stack

Production-grade infrastructure for sovereign AI.

vLLM

High-throughput inference engine with continuous batching and PagedAttention
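A minimal sketch of querying a model through vLLM's Python API (the model identifier is a placeholder):

```python
from vllm import LLM, SamplingParams

# vLLM handles continuous batching and PagedAttention internally;
# callers just submit prompts.
llm = LLM(model="vikasit/Vikasit-8B")  # placeholder Hub identifier
params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = [
    "भारत की राजधानी क्या है?",  # "What is the capital of India?" (Hindi)
    "Explain PagedAttention in one sentence.",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```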

Kong Gateway

API authentication, rate limiting, and intelligent model routing

OpenAI-Compatible API

Drop-in replacement — existing tools and SDKs just work
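Because the endpoint speaks the OpenAI API, existing clients only need a different base URL and model name. A sketch with the standard openai Python SDK (the URL and model name below are placeholders):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.in/v1",  # placeholder gateway endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="vikasit-32b",  # placeholder model name
    messages=[
        # "Hello! Introduce yourself." (Hindi)
        {"role": "user", "content": "नमस्ते! अपना परिचय दीजिए।"},
    ],
)
print(response.choices[0].message.content)
```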

Kubernetes (K3s)

Container orchestration with GPU-aware scheduling and auto-scaling

Prometheus + Grafana

Real-time monitoring of GPU utilization, latency, and throughput
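As an illustration, custom serving metrics can be exposed to Prometheus with the prometheus_client library (the metric names here are hypothetical; vLLM and Kubernetes also export their own metrics out of the box):

```python
import random
import time

from prometheus_client import Gauge, Histogram, start_http_server

# Hypothetical metrics; a real deployment would read GPU utilization
# from NVML/DCGM rather than a random stand-in.
gpu_util = Gauge("gpu_utilization_percent", "GPU utilization", ["gpu"])
latency = Histogram("request_latency_seconds", "End-to-end request latency")

start_http_server(9100)  # Prometheus scrapes this port; Grafana plots the data

while True:
    gpu_util.labels(gpu="0").set(random.uniform(60, 95))  # stand-in reading
    with latency.time():      # times the enclosed block
        time.sleep(0.05)      # stand-in for handling a request
    time.sleep(1)
```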

AIKosh Integration

5,500+ government datasets across 20 sectors for training data

Custom Indic Tokenizer

Existing multilingual tokenizers require 4-8 tokens per Indic word vs. 1.4 for English — making Indian language inference 3-5x more expensive. Our custom tokenizer achieves 1.5-2.2 tokens per word across all 22 Indian languages, cutting inference costs by 2-3x.

Hindi tokens per word:

GPT-4: 5.2
Sarvam-1: 1.8
Vikasit (target): 1.5
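Tokenizer fertility (tokens per word) is straightforward to measure, which is how numbers like the ones above are produced. A sketch using Hugging Face tokenizers; the identifiers are placeholders, and any tokenizer on the Hub can be compared the same way:

```python
from transformers import AutoTokenizer

def tokens_per_word(tokenizer_name: str, text: str) -> float:
    """Average number of tokens per whitespace-separated word."""
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    n_tokens = len(tokenizer(text, add_special_tokens=False)["input_ids"])
    return n_tokens / len(text.split())

hindi = "भारत एक विशाल और विविधतापूर्ण देश है"
for name in ["gpt2", "sarvamai/sarvam-1"]:  # placeholder tokenizer identifiers
    print(name, round(tokens_per_word(name, hindi), 2))
```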

Open Source Commitment

Everything we build is open-source under Apache 2.0 — models, tokenizer, benchmarks, training code, and data processing tools. We're building India's AI commons.