The Unsplash Lite Dataset: A Simplified Version of the Image Retrieval Dataset
Unsplash is one of the largest photography websites in the world, with more than 200,000 photographers worldwide contributing millions of high-definition, high-quality photographs.

Unsplash opened its image API in 2016. The API now serves millions of calls per month across a variety of usage scenarios. In August 2020, Unsplash announced the release of two photo retrieval datasets to provide research material for scholars and research institutions.
The Unsplash dataset comes in two versions:
Lite version of the dataset (the download link points to this version): Available for both commercial and non-commercial use. Contains search information for 25,000 nature-themed Unsplash photos, with a total of 25,000 keywords.
Full version of the dataset: Limited to non-commercial use. Contains search information for 2 million high-quality Unsplash photos, with a total of 5 million keywords.
This dataset is the Lite version. The archive is 190 MB compressed and 550 MB decompressed, and it contains four separate TSV files (note: TSV files can be loaded into a PostgreSQL database or a Python environment):
- Collections: 82 MB
  - Contains information about photo collections created by Unsplash users, including the photo ID (photo_id), collection ID (collection_id), collection title (collection_title), and the timestamp at which the photo was added to the collection (photo_collected_at).
- Conversions: 349 MB
  - Contains information about the photo a user selected after a search, including the conversion timestamp (converted_at), the search keyword (keyword), the photo ID (photo_id), the anonymous user ID (anonymous_user_id), and the user's country (conversion_country).
- Keywords: 104 MB
  - Contains the photo ID (photo_id), the keyword associated with the photo (keyword), and the confidence value between the keyword and the image (ai_service_1_confidence).
- Photos: 6.5 MB
  - Contains the photo ID (photo_id), image URL (photo_image_url), photographer information (photographer_username), camera information (exif_camera), ISO setting (exif_iso), total number of platform views (stats_views), total number of downloads (stats_downloads), and the name of the primary landmark at the shooting location (ai_primary_landmark_name).
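As an illustration of loading one of these TSV files in Python, the following is a minimal sketch that uses only the standard library. It reads rows from a tab-separated source and groups keywords by photo ID, optionally filtering on the confidence column. The helper names `read_tsv` and `keywords_by_photo` are hypothetical, the sample rows and their confidence values are invented for the demonstration, and the file path mentioned in the comments is an assumption rather than the actual file name in the archive.

```python
import csv
import io
from collections import defaultdict


def read_tsv(source):
    """Yield each row of a tab-separated source as a dict keyed by the header row."""
    yield from csv.DictReader(source, delimiter="\t")


def keywords_by_photo(keyword_rows, min_confidence=None):
    """Group keywords by photo_id.

    If min_confidence is given, keep only rows whose ai_service_1_confidence
    is present and at least that value.
    """
    grouped = defaultdict(list)
    for row in keyword_rows:
        raw = row.get("ai_service_1_confidence") or ""
        if min_confidence is not None:
            # Rows without a confidence value cannot pass a threshold.
            if not raw or float(raw) < min_confidence:
                continue
        grouped[row["photo_id"]].append(row["keyword"])
    return dict(grouped)


# Invented sample data that mimics the Keywords columns described above.
sample = io.StringIO(
    "photo_id\tkeyword\tai_service_1_confidence\n"
    "abc123\tforest\t92.5\n"
    "abc123\tfog\t41.0\n"
    "xyz789\tmountain\t\n"
)
result = keywords_by_photo(read_tsv(sample), min_confidence=50)
print(result)

# With the real data, pass an open file instead (hypothetical path):
# with open("keywords.tsv", newline="", encoding="utf-8") as f:
#     result = keywords_by_photo(read_tsv(f), min_confidence=50)
```

Because `read_tsv` streams rows one at a time, the same approach should work for the larger Conversions file without loading it fully into memory. Only the per-photo keyword lists are kept.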