Date

3 years ago

Organization

Publish URL

vipl.ict.ac.cn

Paper URL

arxiv.org

License

Non-Commercial

Tags

Image Recognition

Multimodal

Natural Language Processing

Audio Recognition

CAS-VSR-W1k, formerly known as LRW-1000, is the largest publicly available word-level Mandarin lip-reading dataset. It contains 1,000 word classes and roughly 700,000 samples from more than 2,000 speakers, totaling more than 1,000,000 Chinese character instances.

Each class corresponds to the syllables of a Mandarin word composed of one or more Chinese characters. The dataset is designed to cover the natural variability across speech modes and imaging conditions, so that it reflects the challenges encountered in real-world applications.
