DPO-zh-en-emoji Emoji Question Answering Dataset

Date

2 years ago

Size

5.59 MB

Organization

Dataset Introduction

DPO-zh-en-emoji is a dataset released by shareAI in 2024 for fine-tuning large language models; "DPO" stands for Direct Preference Optimization. It consists of question-answer pairs in which every question has two answers, one in Chinese and one in English, both written in a playful, humorous style that includes emojis. The research team selected queries from Zhihu, logical-reasoning problems, and Ruozhiba (a Baidu Tieba forum), then sampled the llama3 70b instruct model to generate one Chinese answer and one English answer for each query. This design is intended to activate the language-style preferences of multilingual chat models and to improve both the quality of model-generated content and its alignment with human preferences.
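As a rough illustration of how records like these might be prepared for preference tuning, the sketch below turns one query and its Chinese and English answers into a prompt/chosen/rejected triple. The `Record` class, its field names (`prompt`, `answer_zh`, `answer_en`), and the `to_preference_pair` helper are hypothetical and are not taken from the dataset; check the README for the actual schema. Which language counts as the preferred answer is likewise an assumption left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Literal


@dataclass
class Record:
    # Hypothetical field names; the real dataset schema may differ.
    prompt: str
    answer_zh: str
    answer_en: str


def to_preference_pair(record: Record, preferred: Literal["zh", "en"]) -> Dict[str, str]:
    """Build a prompt/chosen/rejected triple from one record.

    The answer in the preferred language becomes "chosen" and the
    answer in the other language becomes "rejected".
    """
    if preferred == "zh":
        chosen, rejected = record.answer_zh, record.answer_en
    elif preferred == "en":
        chosen, rejected = record.answer_en, record.answer_zh
    else:
        raise ValueError(f"preferred must be 'zh' or 'en', got {preferred!r}")
    return {"prompt": record.prompt, "chosen": chosen, "rejected": rejected}
```

Calling `to_preference_pair(record, "zh")` on every record would give a set of pairs that steers a model toward Chinese answers; passing `"en"` flips the preference.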

DPO-zh-en-emoji.torrent

Seeding 1, Downloading 0, Completed 155, Total Downloads 402

DPO-zh-en-emoji/
- README.md
  1.58 KB
- README.txt
  3.16 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

CHOCLO Latin American Cultural Benchmark Dataset

2 months ago

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

2 months ago

Open-RL Inference Problem Dataset

3 months ago

GroundingME Complex Scene Understanding Evaluation Dataset

5 months ago

MCIF Multimodal Cross-Language Instruction Following Dataset

5 months ago
