COYO-700M Image-Text Pair Dataset

Date

2 years ago

Size

104.46 GB

Organization

Publish URL

github.com

Data Collection Process

From October 2020 to August 2021, the research team collected approximately 10 billion pairs of alt text and image source (`src`) from HTML documents in CommonCrawl. They then removed uninformative pairs at minimal cost by filtering at the image level and the text level. The figure outlines the research team's data collection process.

coyo-700m.torrent

Seeding: 1 · Downloading: 0 · Completed: 171 · Total downloads: 378

coyo-700m/
- README.md
  1.32 KB
- README.txt
  2.63 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

Lung Cancer Clinical Dataset

2 months ago

LightOnOCR-mix-0126 Text Transcription Dataset

5 months ago

Delhi Pollution AQI Dataset

5 months ago
