MCD Multimodal Code Generation Dataset

Multimodal Coding Dataset (MCD) is a large-scale dataset released in 2025 by Microsoft Research, Peking University, and Southern University of Science and Technology. It was introduced in the paper "VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models".

The dataset contains approximately 598,000 high-quality samples organized in an instruction-following format. It covers multiple input modalities (text, images, code) and output modalities (code, answers, explanations), making it suitable for multimodal code understanding and generation tasks.

The dataset comprises four subsets:

  • Enhanced HTML code (HTML): About 200,000 code-screenshot pairs, focusing on visual effects and structural optimization.
  • Chart: About 210,000 image-code pairs for reproducing a chart image as code.
  • Question and Answer (QA): About 59,000 code-question-answer triples, with questions and answers centered on code.
  • Algorithm: About 129,000 algorithm coding problems and instruction-following samples.
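
As a quick sanity check on the figures above, the following minimal sketch sums the four approximate subset sizes and computes each subset's share of the total. The dictionary keys and the helper name `subset_shares` are illustrative choices made here, not identifiers taken from the dataset itself.

```python
# Approximate subset sizes as stated in the dataset description.
# The key names are illustrative labels, not official field names.
SUBSET_SIZES = {
    "HTML": 200_000,
    "Chart": 210_000,
    "QA": 59_000,
    "Algorithm": 129_000,
}


def subset_shares(sizes):
    """Return each subset's share of the total as a fraction in [0, 1]."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


TOTAL = sum(SUBSET_SIZES.values())
SHARES = subset_shares(SUBSET_SIZES)

if __name__ == "__main__":
    print(f"total: {TOTAL:,}")
    for name, share in SHARES.items():
        print(f"{name:<10}{share:6.1%}")
```

The rounded subset counts add up to the stated total of roughly 598,000, with the Chart and HTML subsets together accounting for more than two thirds of the samples.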

MCD.torrent
  • MCD/
    • README.md (1.75 KB)
    • README.txt (3.5 KB)
    • data/
      • MCD.zip (18 GB)
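
Because the archive is about 18 GB, it can be useful to inspect it before extracting. The sketch below uses Python's standard `zipfile` module to count the files in an archive and sum their uncompressed sizes without unpacking anything. The path `MCD/data/MCD.zip` is an assumption based on the file listing above, and the helper name `summarize_archive` is hypothetical; the internal layout of the archive is not described on this page.

```python
import zipfile
from pathlib import Path


def summarize_archive(zip_path):
    """Return (file_count, total_uncompressed_bytes) without extracting."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries so that only real files are counted.
        entries = [info for info in archive.infolist() if not info.is_dir()]
        return len(entries), sum(info.file_size for info in entries)


if __name__ == "__main__":
    # Assumed location after the torrent download completes.
    path = Path("MCD/data/MCD.zip")
    if path.exists():
        count, size = summarize_archive(path)
        print(f"{count} files, {size / 1e9:.1f} GB uncompressed")
    else:
        print(f"{path} not found; adjust the path to your download location")
```

Reading only the archive's central directory keeps this check fast even for large files, and the uncompressed total indicates how much disk space extraction will need.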
