Command Palette
Search for a command to run...
zh-meme-sft-8k Chinese Internet Meme Culture Dataset
zh-meme-sft-8k is a Chinese internet meme culture instruction fine-tuning dataset, primarily used to train dialogue models to understand and use trending internet memes. The dataset is constructed from comment interactions on social media platforms such as Douyin, Xiaohongshu, and Bilibili, and has undergone multiple rounds of cleaning and enhancement. Its features include authentic dialogue structures, high-quality retention of trending memes after multiple rounds of cleaning, and standardization using the ChatML format.
Dataset composition:
- Training set: 7,377 samples, accounting for 851 TP3T
- Validation set: 868 samples, accounting for 101 TP3T
- Test set: 435 samples, accounting for 51% of TP3T
Dialogue hierarchy distribution:
- Level 1 conversation (posts - comments): Approximately 401 TP 3T
- Level 2 dialogue (comments-replies): Approximately 601 TP3T
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.