Updesh Indic Synthetic Text Dataset
Date
9 days ago
Size
16.09 GB
Updesh is a synthetic text dataset for Indian languages, released by Microsoft in 2025 to facilitate post-training of large language models (LLMs).
The dataset contains 6,800,000 inference samples and 2,100,000 generated samples across 13 languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Odia, Punjabi, Tamil, Telugu, and Urdu.
Updesh_beta.torrent
Seeding: 1, Downloading: 0, Completed: 4, Total Downloads: 5