HyperAI

Updesh Indic Synthetic Text Dataset

Date: 9 days ago
Size: 16.09 GB
Organization: Microsoft
Publish URL: huggingface.co

Updesh is a synthetic text dataset for Indic languages, released by Microsoft in 2025 to support post-training of large language models (LLMs).

The dataset contains 6,800,000 reasoning examples and 2,100,000 generative examples across 13 languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Odia, Punjabi, Tamil, Telugu, and Urdu.

Updesh_beta.torrent
Seeding: 1, Downloading: 0, Completed: 4, Total Downloads: 5
  • Updesh_beta/
    • README.md (1.2 KB)
    • README.txt (2.4 KB)
    • data/
      • Updesh_beta.zip (16.09 GB)