HyperAI

The Unsplash Lite Dataset: A Simplified Version of the Image Retrieval Dataset

Date

2 years ago

Size

194.59 MB

Organization

Unsplash

Publish URL

unsplash.com

License

Other

Unsplash is one of the largest photography websites in the world: more than 200,000 photographers worldwide have contributed millions of high-resolution, high-quality photographs.

Unsplash opened its image API in 2016. The API now handles millions of calls per month across a wide variety of usage scenarios. In August 2020, Unsplash announced the release of two photo retrieval datasets to provide research material for scholars and research institutions.

The Unsplash dataset contains two versions:

Lite version of the dataset (the download on this page is this version): Available for both commercial and non-commercial use. Contains search information for 25,000 nature-themed Unsplash photos, with a total of 25,000 keywords.

Full version of the dataset: Limited to non-commercial use. Contains search information for 2 million high-quality Unsplash photos, with a total of 5 million keywords.

The download on this page is the Lite version. The compressed archive is about 190 MB and expands to about 550 MB. It contains four separate TSV files (note: TSV files can be loaded into a PostgreSQL database or a Python environment):

  • Collections: 82 MB
    • Contains information about photo collections created by Unsplash users, including the photo ID (photo_id), collection ID (collection_id), collection title (collection_title), and the time the photo was added to the collection (photo_collected_at).

  • Conversions: 349 MB
    • Contains information about the photos users selected after a search, including the conversion timestamp (converted_at), search keyword (keyword), photo ID (photo_id), anonymous user ID (anonymous_user_id), and user country (conversion_country).

  • Keywords: 104 MB
    • Contains the keywords associated with each photo, including the photo ID (photo_id), the keyword (keyword), and the confidence value linking the keyword to the image (ai_service_1_confidence).

  • Photos: 6.5 MB
    • Contains the photo ID (photo_id), image URL (photo_image_url), photographer information (photographer_username), camera information (exif_camera), ISO setting (exif_iso), total number of platform views (stats_views), total number of downloads (stats_downloads), and the primary landmark identified in the photo (ai_primary_landmark_name).
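
Since the files are plain TSV, they can be read in Python with the standard library alone. The sketch below is one possible approach, not an official loader. It assumes each file starts with a header row that uses the column names listed above, and that fields are not wrapped in quotes. The directory and file name in the usage section (`unsplash-research-dataset-lite-latest`, `keywords.tsv000`) are assumptions and may differ in your copy of the archive. The helper names `load_tsv` and `keywords_by_photo` are hypothetical.

```python
import csv
from pathlib import Path


def load_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Assumption: fields are not quoted, so quote characters are kept as text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def keywords_by_photo(keyword_rows, min_confidence=0.0):
    """Group keywords by photo_id, keeping rows at or above min_confidence."""
    grouped = {}
    for row in keyword_rows:
        raw = row.get("ai_service_1_confidence") or ""
        try:
            confidence = float(raw)
        except ValueError:
            continue  # skip rows without a usable confidence value
        if confidence >= min_confidence:
            grouped.setdefault(row["photo_id"], []).append(row["keyword"])
    return grouped


if __name__ == "__main__":
    # Hypothetical location of the extracted archive; adjust to your setup.
    data_dir = Path("unsplash-research-dataset-lite-latest")
    keywords_path = data_dir / "keywords.tsv000"  # assumed file name
    if keywords_path.exists():
        rows = load_tsv(keywords_path)
        print(len(rows), "keyword rows loaded")
```

Reading a whole file into a list is fine for the smaller files. For the Conversions file (349 MB), iterating over the `csv.DictReader` row by row instead of calling `list()` would keep memory use lower.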
Unsplash_Lite.torrent
  • Unsplash_Lite/
    • README.md (1.26 KB)
    • README.txt (2.52 KB)
    • data/
      • unsplash-research-dataset-lite-latest.zip (194.59 MB)
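
Once the torrent has finished, the TSV files need to be unpacked from the zip archive in `data/`. The following is a minimal sketch using Python's standard `zipfile` module. It assumes the TSV members have `.tsv` somewhere in their file names; the helper name `extract_tsv_members` and the output directory are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_tsv_members(zip_path, dest_dir):
    """Extract archive members whose file name contains '.tsv'; return their paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if ".tsv" in Path(info.filename).name:
                # extract() returns the path of the file it created
                extracted.append(Path(archive.extract(info, dest)))
    return extracted


if __name__ == "__main__":
    archive_path = Path("Unsplash_Lite/data/unsplash-research-dataset-lite-latest.zip")
    if archive_path.exists():
        for path in extract_tsv_members(archive_path, "unsplash_lite_tsv"):
            print(path)
```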