Dataset Summary | 18 Movie/music Datasets, Covering Movie/song Recommendations, Movie Reviews, Lyrics Recognition, Music Genres...

Whether we are traveling or staying at home to recharge, movies and music accompany us in one form or another and have become a "condiment" that spices up our lives.
Every year, the National Day holiday brings a peak in cinema attendance. According to reports, the film market's total box office during last year's National Day holiday reached 2.734 billion yuan, 83% higher than the same period in 2022, and the number of moviegoers exceeded 65.114 million.
At the same time, statistics from the China Performing Arts Industry Association show that from September 29 to October 6, 2023, there were a total of 44,200 commercial performances (excluding performances in entertainment venues) nationwide, including 121 large-scale concerts and music festivals, with box office revenue of 541 million yuan and 836,600 viewers.
Clearly, movies and music play a big part in our lives! HyperAI has compiled movie- and music-related datasets for you, covering movie and music recommendation, movie review prediction, lyrics recognition, and more. Download them as needed to make your holiday more exciting.

Movie Dataset Summary
1. Movie recommendation dataset
Publishing Platform: Kaggle
Estimated size: 8.89 MB
Download address: https://go.hyper.ai/2uTxh
This dataset contains data on 5,000 movies from TMDB, including each movie's plot, cast, crew, budget, and revenue. It is suited to applications such as movie recommendation systems and movie market analysis.
2. TMDB movie dataset
Publishing Platform: Kaggle
Release time: 2024
Estimated size: 199.09 MB
Download address: https://go.hyper.ai/4uTYb
TMDB is a comprehensive movie database. This dataset contains a collection of 1 million movies from TMDB, with details such as title, rating, release date, revenue, and genre.
3. AclImdb – v1 Large Movie Review Dataset
Publishing Agency: Stanford University
Release time: 2011
Estimated size: 80.23 MB
Download address: https://go.hyper.ai/CdpFg
AclImdb – v1 Dataset is a large-scale movie review dataset for binary sentiment classification, with 25,000 movie reviews for training, 25,000 for testing, and additional unlabeled data available.
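As a minimal sketch of how a binary sentiment corpus like this might be read, the snippet below assumes the archive has been extracted to a folder whose `pos` and `neg` subdirectories each hold one review per `.txt` file. That layout and the helper name `load_labeled_reviews` are assumptions made for illustration, so check the README shipped with the download before relying on them.

```python
from pathlib import Path


def load_labeled_reviews(split_dir):
    """Return (text, label) pairs from <split_dir>/pos and <split_dir>/neg.

    Label 1 marks a positive review and 0 a negative one. The directory
    layout is an assumption for illustration, not taken from the article.
    """
    samples = []
    for label_name, label in (("neg", 0), ("pos", 1)):
        folder = Path(split_dir) / label_name
        # Sort file names so the result order is reproducible across runs.
        for path in sorted(folder.glob("*.txt")):
            samples.append((path.read_text(encoding="utf-8"), label))
    return samples
```

A call such as `load_labeled_reviews("aclImdb/train")` would then yield the training pairs, assuming that path matches where the archive was unpacked.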
4. Netflix movie review dataset
Publishing Platform: Netflix Prize
Estimated size: 665.24 MB
Download address: https://go.hyper.ai/nWG97
The Netflix movie review dataset contains ratings from 480,000 users on 17,000 movies, more than 100 million ratings in total. The data was collected from October 1998 to November 2005. Ratings use a 5-point scale, and user information has been anonymized.
5. MovieLens movie recommendation dataset
Publishing Agency: GroupLens Research Team at the University of Minnesota
Release time: 2018
Download address: https://go.hyper.ai/RFNqY
This dataset supports research and development of movie recommendation systems. It comes in multiple versions, including MovieLens 100K, MovieLens 1M, MovieLens 10M, and MovieLens 20M, and is widely used in research on machine learning, data mining, and personalized recommendation.
6. Movie review sentiment classification dataset
Publishing Agency: Stanford University
Release time: 2011
Estimated size: 137.77 MB
Download address: https://go.hyper.ai/n247h
This dataset is designed for binary sentiment classification and is intended as a benchmark for that task. It contains 50,000 labeled, polarized movie reviews and 50,000 unlabeled reviews.
7. Wikipedia Movie Plots Dataset
Publishing Agency: Massachusetts Institute of Technology
Release time: 2018
Estimated size: 29.55 MB
Download address: https://go.hyper.ai/CnrF2
The Wikipedia Movie Plots dataset covers 34,886 movies from around the world. Each entry includes the release year, title, country of origin, director, main cast, plot summary, and more. The dataset can be used for tasks such as predicting movie genres and recommending related movies.
8. MovieNet movie understanding dataset
Publishing Agency: The Chinese University of Hong Kong
Release time: 2020
Estimated size: 263.58 GB
Download address: https://go.hyper.ai/tfoDz
MovieNet is a dataset for movie understanding. It contains 1,100 movies with a large amount of multimodal data, such as trailers, photos, and plot descriptions, along with manual annotations covering different aspects.
9. Movie data and ratings dataset
Publishing Platform: Kaggle
Estimated size: 227.8 MB
Download address: https://go.hyper.ai/s5DFC
This dataset contains detailed metadata for 45,000 movies in the full MovieLens dataset, covering basic information as well as details such as release date and language. It also contains 26 million ratings on a scale of 1 to 5 from 270,000 users, providing valuable data for studying movie popularity.
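One simple way to study popularity with ratings like these is to average the scores each movie receives. The sketch below works on plain `(user_id, movie_id, rating)` tuples; that tuple shape, the function name `mean_rating_per_movie`, and the `min_count` filter are all assumptions chosen for illustration, not part of the dataset's documented format.

```python
from collections import defaultdict


def mean_rating_per_movie(ratings, min_count=1):
    """Map each movie id to its mean rating.

    `ratings` is an iterable of (user_id, movie_id, rating) tuples.
    Movies with fewer than `min_count` ratings are dropped, since a mean
    over a handful of votes says little about popularity.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user_id, movie_id, rating in ratings:
        totals[movie_id] += rating
        counts[movie_id] += 1
    return {
        movie_id: totals[movie_id] / counts[movie_id]
        for movie_id in totals
        if counts[movie_id] >= min_count
    }


# Hypothetical toy rows on the 1-to-5 scale, not taken from the dataset.
sample = [(1, 10, 4.0), (2, 10, 5.0), (1, 20, 3.0)]
print(mean_rating_per_movie(sample))
print(mean_rating_per_movie(sample, min_count=2))
```

With 26 million rows, a dataframe library would likely be more practical; the plain-Python version is only meant to make the calculation explicit.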
Music Dataset Summary
1. Online Music System Information Dataset
Publishing Agency: Information Retrieval Group of the Autonomous University of Madrid
Release time: 2011
Estimated size: 2.47 MB
Download address: https://go.hyper.ai/Ig3WD
This dataset contains the interaction data between 2,000 users and the Last.fm music platform, including the user's friend relationships, tags, music artists, and the tag information of these artists. It helps researchers study how to use social network data, user tags, and other information to improve recommendation algorithms.
2. OpenMIIR Music Listening EEG Dataset
Publishing Agency: Owen Lab, The University of Western Ontario
Release time: 2016
Estimated size: 5.88 GB
Download address: https://go.hyper.ai/0qG3t
OpenMIIR is a public domain dataset based on electroencephalogram (EEG) recordings taken during music perception and imagination. It contains EEG data of participants while listening to 12 music clips, as well as the corresponding music stimuli, and is mainly used to analyze the changes in brain waves during music listening.
3. NetEase Cloud Music Sentiment Classification Dataset
Publishing Platform: Huggingface
Estimated size: 4.05 MB
Download address: https://go.hyper.ai/OKA4L
The NetEase Cloud Music sentiment classification dataset contains about 395,000 music sentiment records, each with three main columns: song ID, playlist ID, and song sentiment label. It is suited to building sentiment analysis models, data mining, and studying the relationship between music and sentiment.
4. MusicNet music dataset
Publishing Agency: University of Washington
Release time: 2017
Estimated size: 10.34 GB
Download address: https://go.hyper.ai/ZPuMa
MusicNet is a large music dataset for training and evaluating supervised machine learning methods in music research. It consists of 330 copyright-free classical music recordings and over 1 million annotation labels. The labels were evaluated and verified by musicians and have an error rate of only 4%.
5. URMP Music Performance Audiovisual Analysis Dataset
Publishing Agency: Institute of Electrical and Electronics Engineers
Estimated size: 11.27 GB
Download address: https://go.hyper.ai/0sjUP
URMP is a dataset for audiovisual analysis of music performances. It consists of 44 simple multi-instrument pieces, each assembled from separately recorded performances of the individual tracks. For each piece, the dataset provides a score in MIDI format, high-quality recordings of the individual instruments, and a video of the assembled piece.
6. CCMUSIC Music Genre Dataset
Publishing Agency: Institute of Automation, Chinese Academy of Sciences
Release time: 2017
Estimated size: 16.93 GB
Download address: https://go.hyper.ai/mBXI6
The database contains about 1,700 music pieces (in mp3 format) from NetEase Cloud Music. The duration of these music pieces ranges from 270 to 300 seconds and they are divided into 16 genres.
7. Music21 music video dataset
Publishing Agency: Massachusetts Institute of Technology
Release time: 2009
Estimated size: 42.29 MB
Download address: https://go.hyper.ai/U4qDT
Music21 is an untrimmed video dataset crawled from YouTube by keyword. It contains 21 categories of music performances with high data quality, which can be used to train and evaluate visual sound source separation models.
8. MusicPile Large Music Dataset
Publishing Platform: Huggingface
Release time: 2023
Estimated size: 6.33 GB
Download address: https://go.hyper.ai/tuVEy
The dataset contains 5.17 million samples and about 4.16 billion tokens. Each sample has three fields: id, text, and src, and each text is no longer than 2,048 tokens. MusicPile covers a wide range of general music knowledge, question-and-answer material, and typical music theory content, which plays a key role in improving the music understanding and composition abilities of large models.
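The record shape described above (three fields and a cap of 2,048 tokens per text) can be checked with a small validator. The sketch below is illustrative only: the helper `check_record` is hypothetical, and its default token counter simply splits on whitespace, which is a stand-in for whichever tokenizer was actually used to build the dataset.

```python
REQUIRED_FIELDS = ("id", "text", "src")
MAX_TOKENS = 2048


def check_record(record, count_tokens=lambda text: len(text.split())):
    """Return a list of problems found in one record (empty if none).

    `count_tokens` defaults to whitespace splitting, which is only a
    stand-in: the real limit depends on the tokenizer used to build
    the dataset.
    """
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if name not in record
    ]
    text = record.get("text", "")
    if count_tokens(text) > MAX_TOKENS:
        problems.append("text exceeds token limit")
    return problems
```

Passing a different `count_tokens` callable lets the same check run against a real tokenizer once it is known.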
9. Top 5,000 albums of all time dataset
Publishing Platform: Kaggle
Release time: 2021
Estimated size: 302 KB
Download address: https://go.hyper.ai/SGAHV
This dataset contains the top 5,000 albums on http://rateyourmusic.com as determined by users, including rank, album title, artist name, release date, genre, descriptors, average rating, number of ratings, and number of reviews.
These are the movie and music datasets compiled by HyperAI. If you have resources you would like to see included on the hyper.ai official website, you are welcome to leave a message or submit a contribution to let us know!

About HyperAI
HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:
* Provided domestic accelerated download nodes for 1,200+ public datasets
* Included 300+ classic and popular online tutorials
* Interpreted 100+ AI4Science paper cases
* Supported search for 500+ related terms
* Hosted the first complete Chinese documentation for Apache TVM in China
Visit the official website, hyper.ai, to start your learning journey.