HyperAI

Dataset Summary | 18 Movie/music Datasets, Covering Movie/song Recommendations, Movie Reviews, Lyrics Recognition, Music Genres...

7 months ago
Information
zhaorui
Whether we are traveling or relaxing at home, movies and music are always with us in one form or another, and have become a "condiment" that spices up our lives.

Every year during the National Day holiday, cinemas see a surge in moviegoing. According to reports, the box office during last year's National Day holiday totaled 2.734 billion yuan, 83% higher than the same period in 2022, and the total number of moviegoers exceeded 65.114 million.

At the same time, statistics from the China Performing Arts Industry Association show that from September 29 to October 6, 2023, there were a total of 44,200 commercial performances (excluding performances in entertainment venues) nationwide, including 121 large-scale concerts and music festivals, with box office revenue of 541 million yuan and 836,600 viewers.

Clearly, movies and music play a big role in our lives! HyperAI has compiled movie- and music-related datasets for you, covering movie/music recommendation, movie review prediction, lyrics recognition, and more. You can download them as needed to make your holiday more exciting.

Click to view more open source datasets:

https://go.hyper.ai/E1jBL

Scan the QR code and add the note "dataset" to join the discussion group.

Movie Dataset Summary

1. Movie recommendation dataset

Publishing Platform: Kaggle

Estimated size: 8.89 MB

Download address: https://go.hyper.ai/2uTxh

This dataset contains data on 5,000 movies from TMDB, including plots, cast, crew, budgets, and revenues. It suits application scenarios such as movie recommendation systems and movie market analysis.

2. TMDB movie dataset

Publishing Platform: Kaggle

Release time: 2024

Estimated size: 199.09 MB

Download address: https://go.hyper.ai/4uTYb

This dataset contains a collection of 1 million movies from TMDB, a comprehensive movie database, with details such as title, rating, release date, revenue, and genre.

3. AclImdb – v1 Large Movie Review Dataset

Publishing Agency: Stanford University

Release time: 2011

Estimated size: 80.23 MB

Download address: https://go.hyper.ai/CdpFg

The AclImdb – v1 dataset is a large-scale movie review dataset for binary sentiment classification, with 25,000 movie reviews for training and 25,000 for testing, plus additional unlabeled data.

4. Netflix movie review dataset

Publishing Platform: Netflix Prize

Estimated size: 665.24 MB

Download address: https://go.hyper.ai/nWG97

The Netflix movie review dataset contains ratings from 480,000 users on 17,000 movies, totaling more than 100 million ratings. The data was collected from October 1998 to November 2005. Ratings use a 5-point scale, and user information has been anonymized.

5. MovieLens movie recommendation dataset

Publishing Agency: GroupLens Research Team at the University of Minnesota

Release time: 2018

Download address: https://go.hyper.ai/RFNqY

This dataset can be used for the research and development of movie recommendation systems. It comes in multiple versions, including MovieLens 100K, MovieLens 1M, MovieLens 10M, and MovieLens 20M, and is widely used in research on machine learning, data mining, and personalized recommendation systems.

6. IMDB Movie Review Dataset

Publishing Agency: Stanford University

Release time: 2011

Estimated size: 137.77 MB

Download address: https://go.hyper.ai/n247h

This dataset is designed for binary sentiment classification and is intended as a benchmark for that task. It contains 50,000 labeled, highly polarized movie reviews and 50,000 unlabeled reviews.

7. Wikipedia Movie Plots Dataset

Publishing Agency: Massachusetts Institute of Technology

Release time: 2018

Estimated size: 29.55 MB

Download address: https://go.hyper.ai/CnrF2

The Wikipedia Movie Plots dataset contains 34,886 movies from around the world. Each entry includes the release year, title, country of origin, director, lead actors, and a plot summary. The dataset can be used for tasks such as predicting movie genres and recommending related movies.

8. MovieNet movie understanding dataset

Publishing Agency: The Chinese University of Hong Kong

Release time: 2020

Estimated size: 263.58 GB

Download address: https://go.hyper.ai/tfoDz

MovieNet is a dataset for movie understanding. It contains 1,100 movies with a large amount of multimodal data, such as trailers, photos, and plot descriptions. MovieNet also provides manual annotations covering different aspects of the movies.

9. Movie data and ratings dataset

Publishing Platform: Kaggle

Estimated size: 227.8 MB

Download address: https://go.hyper.ai/s5DFC

This dataset contains detailed metadata for the 45,000 movies in the full MovieLens dataset. Beyond basic movie information, it includes details such as release date and language. It also contains 26 million ratings from 270,000 users, on a scale of 1 to 5, which provide valuable data for studying movie popularity.
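As a rough illustration of how ratings on a 1-to-5 scale can be summarized per movie, here is a minimal Python sketch. The `(user_id, movie_id, rating)` row layout and the sample values are assumptions made for illustration; the actual files and column names in the dataset may differ.

```python
from collections import defaultdict

# Hypothetical sample rows in (user_id, movie_id, rating) form.
# The real dataset's files and column names may differ.
ratings = [
    (1, 10, 4.0),
    (1, 20, 5.0),
    (2, 10, 3.0),
    (3, 10, 5.0),
    (3, 20, 4.0),
]


def movie_stats(rows):
    """Return {movie_id: (rating_count, mean_rating)} for the given rows."""
    totals = defaultdict(lambda: [0, 0.0])  # movie_id -> [count, sum]
    for _user, movie, rating in rows:
        # Ratings are described as falling between 1 and 5 points.
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        totals[movie][0] += 1
        totals[movie][1] += rating
    return {movie: (count, total / count)
            for movie, (count, total) in totals.items()}


stats = movie_stats(ratings)
for movie_id, (count, mean) in sorted(stats.items()):
    print(f"movie {movie_id}: {count} ratings, mean {mean:.2f}")
```

With 26 million rows, a dataframe library or streaming the file in chunks would probably be more practical than an in-memory list, but the aggregation logic stays the same.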

Music Dataset Summary

1. Online Music System Information Dataset

Publishing Agency: Information Retrieval Group of the Autonomous University of Madrid

Release time: 2011

Estimated size: 2.47 MB

Download address: https://go.hyper.ai/Ig3WD

This dataset contains interaction data from 2,000 users of the Last.fm music platform, including their friend relationships, tags, the artists they listen to, and the tags applied to those artists. It helps researchers study how social network data, user tags, and similar information can improve recommendation algorithms.

2. OpenMIIR Music Listening EEG Dataset

Publishing Agency: Owen Lab, The University of Western Ontario

Release time: 2016

Estimated size: 5.88 GB

Download address: https://go.hyper.ai/0qG3t

OpenMIIR is a public domain dataset of electroencephalogram (EEG) recordings taken during music perception and imagination. It contains EEG data from participants listening to 12 music clips, along with the corresponding music stimuli, and is mainly used to analyze changes in brain waves during music listening.

3. NetEase Cloud Music Sentiment Classification Dataset

Publishing Platform: Huggingface

Estimated size: 4.05 MB

Download address: https://go.hyper.ai/OKA4L

The NetEase Cloud Music sentiment classification dataset contains about 395,000 music sentiment label records, each consisting of three main columns: song ID, playlist ID, and song sentiment label. It is suitable for building sentiment analysis models, data mining, and exploring the relationship between music and sentiment.
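Before training a sentiment model on a three-column table like this, it is common to check how the labels are distributed. The sketch below shows one way to do that with the Python standard library. The column names (`song_id`, `playlist_id`, `sentiment`) and the label values are hypothetical; check the actual file for the real ones.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV snippet. The column names and label values are
# assumptions, not taken from the actual dataset.
SAMPLE = """song_id,playlist_id,sentiment
101,9001,happy
102,9001,sad
103,9002,happy
"""


def label_distribution(csv_text, label_column="sentiment"):
    """Count how many rows carry each sentiment label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


dist = label_distribution(SAMPLE)
for label, count in dist.most_common():
    print(label, count)
```

A heavily skewed distribution would suggest using class weights or resampling when building a classifier.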

4. MusicNet music dataset

Publishing Agency: University of Washington

Release time: 2017

Estimated size: 10.34 GB

Download address: https://go.hyper.ai/ZPuMa

MusicNet is a large music dataset for training and evaluating machine learning methods in music research. It consists of 330 copyright-free classical music recordings and over 1 million annotation labels. The labels were evaluated and verified by musicians, with an error rate of only 4%.

5. URMP Music Performance Audiovisual Analysis Dataset

Publishing Agency: Institute of Electrical and Electronics Engineers

Estimated size: 11.27 GB

Download address: https://go.hyper.ai/0sjUP

URMP is a dataset for audiovisual analysis of music performances. It consists of 44 simple multi-instrument pieces, each assembled from separately recorded performances of the individual tracks. For each piece, the dataset provides a score in MIDI format, high-quality recordings of the individual instruments, and a video of the assembled piece.

6. CCMUSIC Music Genre Dataset

Publishing Agency: Institute of Automation, Chinese Academy of Sciences

Release time: 2017

Estimated size: 16.93 GB

Download address: https://go.hyper.ai/mBXI6

The database contains about 1,700 music pieces (in mp3 format) from NetEase Cloud Music. The pieces range from 270 to 300 seconds in duration and are divided into 16 genres.

7. Music21 music video dataset

Publishing Agency: Massachusetts Institute of Technology

Release time: 2009

Estimated size: 42.29 MB

Download address: https://go.hyper.ai/U4qDT

Music21 is an untrimmed video dataset crawled from YouTube by keyword. It contains 21 categories of music performances with high data quality, which can be used to train and evaluate visual sound source separation models.

8. MusicPile Large Music Dataset

Publishing Platform: Huggingface

Release time: 2023

Estimated size: 6.33 GB

Download address: https://go.hyper.ai/tuVEy

The dataset contains 5.17 million samples and about 4.16 billion tokens. Each sample has three fields: id, text, and src, and each text has no more than 2,048 tokens. MusicPile covers general music knowledge, question-and-answer pairs, and typical music theory content, and plays a key role in improving the music understanding and composition abilities of large models.
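To make the three-field layout concrete, the following Python sketch groups records by their `src` field and checks them against the 2,048-token limit. The sample records and `src` values are made up for illustration, and the whitespace tokenizer is only a stand-in: the real limit refers to whatever tokenizer the dataset authors used, which is not specified here.

```python
from collections import defaultdict

MAX_TOKENS = 2048  # per-text limit stated in the dataset description

# Hypothetical records using the three fields id, text, and src.
# The last record is deliberately oversized to exercise the check.
records = [
    {"id": 0, "text": "A major scale has seven notes.", "src": "theory"},
    {"id": 1, "text": "Q: What is a triad? A: A three-note chord.", "src": "qa"},
    {"id": 2, "text": "word " * 3000, "src": "qa"},
]


def count_tokens(text):
    """Approximate token count; a whitespace split stands in for the real tokenizer."""
    return len(text.split())


def group_valid_by_src(rows, limit=MAX_TOKENS):
    """Map each src value to the ids of records within the token limit."""
    groups = defaultdict(list)
    for row in rows:
        if count_tokens(row["text"]) <= limit:
            groups[row["src"]].append(row["id"])
    return dict(groups)


by_src = group_valid_by_src(records)
print(by_src)
```

Grouping by `src` like this is one simple way to sample a balanced mix of knowledge, Q&A, and theory content for fine-tuning.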

9. Top 5,000 Albums of All Time dataset

Publishing Platform: Kaggle

Release time: 2021

Estimated size: 302 KB

Download address: https://go.hyper.ai/SGAHV

This dataset contains the top 5,000 albums as determined by users of http://rateyourmusic.com, including rank, album title, artist name, release date, genre, descriptors, average rating, number of ratings, and number of reviews.
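Because the dataset pairs each album's average rating with its number of ratings, a typical first step is to rank albums by rating while ignoring those with too few votes. Here is a minimal Python sketch of that idea. The `Album` class, its field names, and the placeholder rows are hypothetical and only mirror some of the columns listed above.

```python
from dataclasses import dataclass


@dataclass
class Album:
    # Field names mirror a subset of the described columns; they are assumptions.
    rank: int
    title: str
    artist: str
    average_rating: float
    number_of_ratings: int


# Placeholder rows, not real entries from the dataset.
albums = [
    Album(1, "Album A", "Artist X", 4.2, 50000),
    Album(2, "Album B", "Artist Y", 4.3, 300),
    Album(3, "Album C", "Artist Z", 4.1, 42000),
]


def top_rated(rows, min_ratings):
    """Return albums with at least min_ratings votes, best average first."""
    eligible = [a for a in rows if a.number_of_ratings >= min_ratings]
    return sorted(eligible, key=lambda a: a.average_rating, reverse=True)


for album in top_rated(albums, min_ratings=1000):
    print(album.title, album.average_rating)
```

The `min_ratings` threshold is a design choice: a higher value favors widely rated albums over niche ones with a handful of enthusiastic votes.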

These are the movie and music datasets compiled by HyperAI. If you have resources you would like to see included on the hyper.ai official website, you are welcome to leave a message or submit a contribution to let us know!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided domestic accelerated download nodes for 1,200+ public datasets

* Included 300+ classic and popular online tutorials

* Interpreted 100+ AI4Science paper cases

* Supported search for 500+ related terms

* Hosted the first complete Chinese-language Apache TVM documentation in China

Visit the official website to start your learning journey:

https://hyper.ai