Sarvam Vision and Bulbul V3:

In a boost to India’s Artificial Intelligence (AI) ambitions, Bengaluru-based startup Sarvam AI’s latest models, Sarvam Vision and Bulbul V3, have reportedly outperformed Google Gemini and OpenAI’s ChatGPT on India-specific AI benchmarks, marking a significant step toward building sovereign AI ecosystems tailored to Indian needs.
Sarvam Vision:
- It is a 3-billion-parameter vision-language model capable of a range of visual understanding tasks, including image captioning, scene text recognition, chart interpretation, and complex table parsing.
- It focuses on digitizing physical Indian records—including manuscripts, financial tables, and historical texts.
- While traditional Optical Character Recognition (OCR) only extracts text, Sarvam Vision performs “Knowledge Extraction.”
- It understands the structure of a document, interpreting complex tables, charts, and reading orders (e.g., distinguishing between a caption and a headline).
- It is trained on datasets covering all 22 official Indian languages, making it capable of handling documents with mixed scripts (e.g., a government form in Hindi and English).
- On olmOCR-Bench, which evaluates how accurately AI converts PDFs and complex document images into structured text, Sarvam Vision scored 84.3%, outperforming Google Gemini 3 Pro and DeepSeek OCR v2.
- On OmniDocBench v1.5, which tests document parsing across diverse real-world formats, it achieved 93.28% accuracy, demonstrating strong capability in handling complex layouts.
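The difference between plain OCR and structure-aware extraction described above can be illustrated with a minimal sketch. This is not Sarvam Vision's actual output format or API; the `Block` class, its fields, and the sample page are hypothetical and exist only to show why keeping each block's role and reading order matters.

```python
from dataclasses import dataclass


@dataclass
class Block:
    role: str   # hypothetical label, e.g. "headline", "caption", "paragraph"
    order: int  # hypothetical position in the page's reading order
    text: str


def flat_text(blocks):
    """Plain-OCR style output: text only, in whatever order it was detected."""
    return " ".join(b.text for b in blocks)


def in_reading_order(blocks):
    """Structure-aware output: blocks sorted by reading order, roles preserved."""
    return [(b.role, b.text) for b in sorted(blocks, key=lambda b: b.order)]


# Made-up page in which the caption happened to be detected before the headline.
page = [
    Block("caption", 2, "Fig. 1: Rainfall by district"),
    Block("headline", 1, "Monsoon Report"),
    Block("paragraph", 3, "Rainfall was above normal."),
]

print(flat_text(page))         # caption text comes first, roles are lost
print(in_reading_order(page))  # headline first, each block tagged with its role
```

In the flat version a reader cannot tell the caption from the headline; in the structured version the headline comes first and every block keeps its role.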
Bulbul V3:
- It is Sarvam’s upgraded text-to-speech (TTS) AI model designed to generate natural, region-sensitive speech across India’s diverse linguistic landscape.
- It supports over 35 professional-quality voices across 11 Indian languages, with plans to expand to all 22 Scheduled Languages.
- Bulbul V3 captures prosody (pauses, tone, and emphasis) for natural speech and is optimized for Indian accents and linguistic nuances.
- It handles code-switching, regional variations, abbreviations, and emotional tone, making it well-suited for India’s multilingual environment.
- It is part of India’s broader push for sovereign AI models under the Rs 10,300-crore India AI Mission.


