Date

2 years ago

Size

34.77 GB

Organization

Publish URL

source.plus

Paper URL

arxiv.org

Tags

Image Classification

Public Domain 12M (PD12M) is a large-scale image-text dataset released by Spawning in 2024. It contains 12.4 million high-quality public domain and CC0-licensed images paired with synthetic captions, and is intended mainly for training text-to-image models. The accompanying paper, "Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms", presents PD12M as the largest public domain image-text dataset to date. Its scale and clear licensing give model training a solid foundation while minimizing copyright concerns.

PD12M's images come from galleries, libraries, archives, and museums (GLAM), as well as Wikimedia Commons and other sources. The construction process covers image collection, copyright verification, image download, content filtering, and caption generation, with screening and curation applied to ensure data quality and safety. PD12M also introduces a community-driven data governance mechanism through the Source.Plus platform, which supports ongoing improvement and maintenance of the dataset.

The dataset is used primarily to train and evaluate text-to-image generation models, with the aim of advancing work in computer vision and natural language processing. Beyond serving as a training resource, it is offered as a model for responsible AI practice and for the protection and use of public AI resources.
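The copyright verification and content filtering steps described above can be illustrated with a minimal sketch. This is not the pipeline used by the PD12M authors: the `ImageRecord` fields, the license labels in `ALLOWED_LICENSES`, and the `min_side` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical license labels; the labels used by the real pipeline
# are not given in this description.
ALLOWED_LICENSES = {"public domain", "cc0"}


@dataclass
class ImageRecord:
    """Illustrative metadata for one candidate image (assumed schema)."""
    url: str
    license: str
    width: int
    height: int


def is_permitted(record: ImageRecord) -> bool:
    """Return True if the record's license label is public domain or CC0."""
    return record.license.strip().lower() in ALLOWED_LICENSES


def filter_records(records, min_side=256):
    """Keep records that pass the license check and a minimum-size check.

    min_side is an assumed threshold, not a value taken from the paper.
    """
    kept = []
    for record in records:
        if not is_permitted(record):
            continue  # drop anything not clearly public domain or CC0
        if min(record.width, record.height) < min_side:
            continue  # drop images whose shorter side is too small
        kept.append(record)
    return kept


if __name__ == "__main__":
    candidates = [
        ImageRecord("https://example.org/a.jpg", "CC0", 512, 512),
        ImageRecord("https://example.org/b.jpg", "CC-BY", 512, 512),
        ImageRecord("https://example.org/c.jpg", "Public Domain", 100, 800),
    ]
    for record in filter_records(candidates):
        print(record.url)
```

Running the license check before any size or content filter means that records with unclear licensing are dropped early, before further processing is spent on them.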

PD12M.torrent

Seeding: 1, Downloading: 0, Completed: 174, Total Downloads: 267

PD12M/
- README.md
  2.02 KB
- README.txt
  4.05 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

2 months ago

Open-RL Inference Problem Dataset

4 months ago

Hand Gestures Labbled Gesture Car Game Dataset

5 months ago

Human Face Emotions Dataset

3 months ago
