Date

4 years ago

Size

1.65 GB

Organization

Publish URL

www.robots.ox.ac.uk

Paper URL

arxiv.org

License

CC BY 4.0

Tags

Multimodal

Audio and Speech Processing

Object Recognition

Audio Recognition

Image Recognition

VoxCeleb2 is a large-scale speaker recognition dataset extracted from publicly available online media (YouTube videos). It contains more than one million utterances from over 6,000 speakers. Because the clips were collected in the wild, they contain real-world interference such as laughter, overlapping speech, channel effects, and background music. The corpus is multilingual: its speakers come from 145 countries and cover a wide range of accents, ages, ethnicities, and languages. Since the dataset includes both audio and video, it is also suitable for tasks such as visual speech synthesis, speech separation, cross-modal transfer between faces and voices, and face recognition in video. Dataset details:

VoxCeleb2.torrent


VoxCeleb2/
- README.md (1.41 KB)
- README.txt (2.82 KB)
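The description counts the corpus in speakers and utterances, but this listing shows only the README files, so the on-disk layout is not given here. The sketch below assumes the layout used by the official VoxCeleb2 audio release as we understand it, `<speaker_id>/<video_id>/<clip>.m4a` (speaker folders named like `id00012`). Under that assumption, it groups audio clips by speaker, which is the usual first step when building speaker-recognition training lists. The function name `index_by_speaker` is hypothetical. Check the bundled README for the actual structure before relying on it.

```python
from collections import defaultdict
from pathlib import Path


def index_by_speaker(root):
    """Map each speaker ID to its sorted list of audio clip paths.

    ASSUMPTION: clips live at <root>/<speaker_id>/<video_id>/<clip>.m4a,
    e.g. <root>/id00012/21Uxsk56VDQ/00001.m4a. Verify against the README.
    """
    index = defaultdict(list)
    # Match exactly three levels below root: speaker / video / clip file.
    for clip in sorted(Path(root).glob("*/*/*.m4a")):
        speaker_id = clip.parent.parent.name  # grandparent folder = speaker
        index[speaker_id].append(clip)
    return dict(index)


if __name__ == "__main__":
    import sys

    # Usage: python index_voxceleb2.py /path/to/aac
    speakers = index_by_speaker(sys.argv[1])
    total = sum(len(clips) for clips in speakers.values())
    print(f"{len(speakers)} speakers, {total} clips")
```

Files with other extensions (for example the README files) are skipped because the glob pattern only matches `.m4a` clips three directory levels deep.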

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.