Build guide

How to build a structured data extraction pipeline with Vikasit AI

Data extraction turns messy text (emails, invoices, resumes) into clean, structured JSON. With JSON-mode prompting and schema validation, you can pull typed fields out of unstructured content at scale.

Recommended model

Vikasit 3

Vikasit 3 formats its output consistently, which makes it reliable for structured JSON at a low per-document cost. For very large batches of simple documents, Vikasit 3 Flash trims cost further.

Steps

1. Define the target schema (field names, types, and which are required).
2. Write a system prompt that asks for valid JSON matching the schema and nothing else.
3. Pass the raw text and request the structured output.
4. Parse the JSON and validate it against your schema (e.g. with pydantic).
5. Retry or flag records that fail validation.
6. Batch documents and run concurrently to maximize throughput.
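
Steps 1 and 4 can be sketched without any dependencies. This is a minimal illustration, not the only way to do it: the Field class, the SCHEMA dict, and the validate function are hypothetical names, and the choice to treat company as optional is an assumption made for the example. A pydantic model, as suggested in step 4, would fill the same role with stricter type handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One schema entry: the expected Python type and whether it is required."""
    type_: type
    required: bool = True


# Hypothetical schema for a contact-extraction task (assumption: company is optional).
SCHEMA = {
    "name": Field(str),
    "email": Field(str),
    "company": Field(str, required=False),
}


def validate(record: dict, schema: dict = SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for name, field in schema.items():
        value = record.get(name)
        if value is None:
            # A missing or null value is only a problem for required fields.
            if field.required:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(value, field.type_):
            errors.append(
                f"{name}: expected {field.type_.__name__}, got {type(value).__name__}"
            )
    # Report keys the model invented that the schema does not define.
    for name in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {name}")
    return errors
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue with a record before deciding whether to retry or flag it.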

Code

The Vikasit Inference API is OpenAI-compatible, so this uses the standard OpenAI Python SDK pointed at https://api.vikasit.ai/v1.

data-extraction.py

import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.vikasit.ai/v1",
    api_key="sk-vikasit-...",  # get one at vikasit.ai/auth
)

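def extract(text: str) -> dict:
    """Extract name, email, and company from raw text as a dict."""
    resp = client.chat.completions.create(
        model="vikasit-3",
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract fields as JSON: {name, email, company}. "
                    "Return ONLY valid JSON."
                ),
            },
            {"role": "user", "content": text},
        ],
        # JSON mode: ask the API for a single JSON object in the reply.
        response_format={"type": "json_object"},
    )
    content = resp.choices[0].message.content
    if not content:
        # An empty reply cannot be parsed; raise so the caller can retry or flag it.
        raise ValueError("model returned no content")
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed output.
    return json.loads(content)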
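
Steps 5 and 6 (retry or flag, then batch concurrently) might look like the sketch below. The names extract_with_retry and run_batch, the result dict shape, and the defaults of 3 attempts and 8 workers are all illustrative assumptions. The extraction and validation functions are passed in as parameters: extract_fn is expected to take text and return a dict, and validate_fn is expected to return a list of problems that is empty for a valid record.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def extract_with_retry(
    text: str,
    extract_fn: Callable[[str], dict],
    validate_fn: Callable[[dict], list],
    max_attempts: int = 3,
) -> dict:
    """Call extract_fn until validate_fn reports no errors or attempts run out."""
    errors = ["not attempted"]
    for _ in range(max_attempts):
        try:
            record = extract_fn(text)
        except ValueError as exc:
            # ValueError covers json.JSONDecodeError; other exceptions propagate.
            errors = [f"unparseable output: {exc}"]
            continue
        errors = validate_fn(record)
        if not errors:
            return {"ok": True, "record": record, "errors": []}
    # Flag the record instead of raising, so one bad document does not stop the batch.
    return {"ok": False, "record": None, "errors": errors}


def run_batch(
    texts: Iterable[str],
    extract_fn: Callable[[str], dict],
    validate_fn: Callable[[dict], list],
    max_workers: int = 8,
) -> list:
    """Process documents concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda t: extract_with_retry(t, extract_fn, validate_fn), texts)
        )
```

Threads suit this workload because each call spends most of its time waiting on the network. Records that come back with ok set to False can be written to a review queue rather than silently dropped. Tune max_workers to stay within whatever rate limits apply to your account.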

Build your structured data extraction pipeline today

Get an API key and 2M free tokens a day on Vikasit Nova. Pay-as-you-go, no minimums, OpenAI-compatible.
