The Movies Dataset (Movie Information Dataset)
License: CC BY 4.0
This dataset contains detailed metadata for the 45,000 movies in the full MovieLens dataset, all of which were released before July 2017. It covers posters, backdrops, budgets, and revenues, as well as release dates, languages, production countries, and production companies. It also contains 26 million ratings from 270,000 users on a scale of 1 to 5, which are useful for studying movie popularity.
This dataset grew out of the author's capstone project for the Springboard Data Science Career Track, which aimed to build different types of recommendation systems after an exploratory analysis of movie data. The author's notebooks, including "The Story of Movies" and "Movie Recommender Systems", are published as kernels alongside the dataset and give researchers and developers a practical starting point for understanding the movie data in more depth.
Data content:
- movies_metadata.csv: The main movie metadata file. Contains information about the 45,000 movies in the full MovieLens dataset, including posters, backdrops, budgets, revenues, release dates, languages, production countries, and production companies.
- keywords.csv: Contains the plot keywords for the MovieLens movies. Provided as a stringified JSON object.
- credits.csv: Contains cast and crew information for all movies. Provided as a stringified JSON object.
- links.csv: Contains the TMDB and IMDB IDs for all movies in the full MovieLens dataset.
- links_small.csv: Contains the TMDB and IMDB IDs for a small subset of 9,000 movies from the full dataset.
- ratings_small.csv: A subset of 100,000 ratings from 700 users on 9,000 movies.
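Because keywords.csv and credits.csv store their nested data as strings, each value has to be parsed before it can be used. The following is a minimal sketch, not the author's code. It assumes an `id` column and a `keywords` column holding a list of objects with a `name` field, and it runs on a made-up two-row sample instead of the real file. The helper names `parse_stringified` and `load_keyword_names` are hypothetical. Since stringified values of this kind are sometimes written as Python literals with single quotes rather than strict JSON, the sketch tries `json.loads` first and falls back to `ast.literal_eval`.

```python
import ast
import csv
import io
import json

# Hypothetical sample in the assumed shape of keywords.csv: an `id` column and
# a stringified `keywords` column. The values are made up for illustration.
SAMPLE_KEYWORDS_CSV = '''id,keywords
1,"[{'id': 10, 'name': 'friendship'}, {'id': 11, 'name': 'toys'}]"
2,"[]"
'''


def parse_stringified(value):
    """Turn a stringified list of objects into Python lists and dicts."""
    if not value:
        return []
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        # Fall back for Python-literal style strings (single-quoted keys).
        return ast.literal_eval(value)


def load_keyword_names(csv_text):
    """Map each movie id to the list of its keyword names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        int(row["id"]): [kw["name"] for kw in parse_stringified(row["keywords"])]
        for row in reader
    }


keyword_names = load_keyword_names(SAMPLE_KEYWORDS_CSV)
print(keyword_names)
```

To apply the same idea to the real file, the sample text could be replaced by the contents of keywords.csv, after checking that the column names match the assumptions above.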
With this dataset, researchers can perform a variety of analyses, such as predicting a movie's revenue or likelihood of success, or building content-based and collaborative filtering recommendation systems. These analyses help explain the dynamics of the film industry and provide an empirical basis for designing movie recommendation systems.
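As an illustration of the collaborative filtering idea, the sketch below computes item-to-item cosine similarity from rating triples. It is one simple variant under stated assumptions, not the method used in the author's notebooks. The `(userId, movieId, rating)` layout is assumed to mirror ratings_small.csv, the sample ratings are invented, and the function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical (userId, movieId, rating) triples; the values are invented and
# only mimic the assumed layout of ratings_small.csv.
RATINGS = [
    (1, 101, 5.0), (1, 102, 4.0), (1, 103, 1.0),
    (2, 101, 4.0), (2, 102, 5.0),
    (3, 101, 1.0), (3, 103, 5.0),
    (4, 102, 4.0), (4, 103, 2.0),
]


def item_vectors(ratings):
    """Group ratings by movie: {movieId: {userId: rating}}."""
    vectors = defaultdict(dict)
    for user, movie, rating in ratings:
        vectors[movie][user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def most_similar(movie_id, ratings, top_n=2):
    """Rank the other movies by similarity of their rating patterns."""
    vectors = item_vectors(ratings)
    target = vectors[movie_id]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != movie_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(most_similar(101, RATINGS))
```

In this sample, movie 102 ranks above movie 103 as a neighbor of movie 101, because users 1 and 2 rated both 101 and 102 highly. A content-based variant could apply the same similarity function to keyword or genre vectors instead of rating vectors.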