HyperAIHyperAI

Command Palette

Search for a command to run...

zh-meme-sft-8k Chinese Internet Meme Culture Dataset

Date

4 hours ago

License

MIT

Tags

zh-meme-sft-8k is a Chinese internet meme culture instruction fine-tuning dataset, primarily used to train dialogue models to understand and use trending internet memes. The dataset is constructed from comment interactions on social media platforms such as Douyin, Xiaohongshu, and Bilibili, and has undergone multiple rounds of cleaning and enhancement. Its features include authentic dialogue structures, high-quality retention of trending memes after multiple rounds of cleaning, and standardization using the ChatML format.

Dataset composition:

  • Training set: 7,377 samples, accounting for 851 TP3T
  • Validation set: 868 samples, accounting for 101 TP3T
  • Test set: 435 samples, accounting for 51% of TP3T

Dialogue hierarchy distribution:

  • Level 1 conversation (posts - comments): Approximately 401 TP 3T
  • Level 2 dialogue (comments-replies): Approximately 601 TP3T

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing

HyperAI Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp