Chinese-LiPS Multimodal Speech Recognition Dataset
Date
17 days ago
Size
86.64 GB
Chinese-LiPS is a multimodal speech recognition dataset released in 2025 by Zhiyuan Research Institute and Nankai University. The accompanying paper is "Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides". As the first Chinese multimodal speech recognition dataset to combine lip-reading information with slide semantic information, Chinese-LiPS targets complex contexts such as Chinese-language explanation, popular science, teaching, and knowledge dissemination, and aims to advance Chinese multimodal speech recognition technology.
Dataset features:
- Large data size: Chinese-LiPS totals about 100 hours and contains 36,208 high-quality speech clips recorded by 207 professional speakers, giving it good representativeness and diversity.
- Wide range of topics: The content covers 9 popular fields, including science and technology, health and wellness, culture and history, tourism and exploration, the automobile industry, and sports events. Topics are evenly distributed and reflect the expression style and terminology density of real teaching and explanation settings.
- High-quality slide production: Domain experts design the content and take part in annotation to ensure that the slide text and images are accurate and professional. The slides are clearly structured and well designed, with rich images and visual semantic information rather than walls of text.
- High-quality video recording: Professional speakers record the videos in a quiet environment in high definition. Two video streams are provided, lip-reading video (720P) and slide video (1080P), with speech precisely aligned to lip movements and consistent, reliable data quality.
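
As a rough sanity check on the headline figures, the short Python sketch below derives per-clip and per-speaker averages from the three totals quoted in the feature list. The function name `dataset_averages` and its keys are hypothetical, chosen only for illustration, and since the total length is stated as "about 100 hours", the results are approximate.

```python
def dataset_averages(total_hours: float, num_clips: int, num_speakers: int) -> dict:
    """Derive rough per-clip and per-speaker averages from headline totals."""
    total_seconds = total_hours * 3600
    return {
        "seconds_per_clip": total_seconds / num_clips,
        "clips_per_speaker": num_clips / num_speakers,
        "minutes_per_speaker": total_hours * 60 / num_speakers,
    }


# Figures quoted in the description; "about 100 hours" is an approximation,
# so treat the derived averages as ballpark values only.
stats = dataset_averages(total_hours=100, num_clips=36_208, num_speakers=207)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, the averages come out to roughly 10 seconds per clip, and about 175 clips and 29 minutes of speech per speaker.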
