ActivityNet Entities Captions Dataset

ActivityNet-Entities adds 158k bounding box annotations to the ActivityNet Captions dataset. Each bounding box is linked to a noun phrase in the corresponding caption. This data can be used to train grounded video description models, which generate a description and also localize the objects that its noun phrases refer to. The accompanying work demonstrates the effectiveness of such a model in generating descriptions of videos, and also shows how the approach applies to image description on the Flickr30k dataset.
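To make the box-to-phrase link concrete, here is a minimal sketch of how a grounded caption could be represented in memory. The class names, field names, the `(x1, y1, x2, y2)` box convention, and the per-frame index are all hypothetical illustrations, not the official ActivityNet-Entities file format. The sketch groups boxes by the noun phrase they ground and checks that every linked phrase actually appears in the caption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical schema for illustration only; NOT the official
# ActivityNet-Entities annotation format.
BBox = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


@dataclass
class BoxAnnotation:
    frame_index: int  # hypothetical: which sampled frame the box is drawn on
    bbox: BBox
    phrase: str       # noun phrase from the caption that this box grounds


@dataclass
class GroundedCaption:
    video_id: str
    caption: str
    boxes: List[BoxAnnotation] = field(default_factory=list)

    def boxes_by_phrase(self) -> Dict[str, List[BBox]]:
        """Group bounding boxes by the noun phrase they are linked to."""
        grouped: Dict[str, List[BBox]] = {}
        for box in self.boxes:
            grouped.setdefault(box.phrase, []).append(box.bbox)
        return grouped

    def validate(self) -> bool:
        """Check that every linked phrase occurs in the caption (case-insensitive)."""
        text = self.caption.lower()
        return all(box.phrase.lower() in text for box in self.boxes)


# Usage with made-up values.
example = GroundedCaption(
    video_id="v_example",  # hypothetical video ID
    caption="A man throws a ball to a dog.",
    boxes=[
        BoxAnnotation(0, (10, 20, 110, 220), "a man"),
        BoxAnnotation(0, (150, 180, 170, 200), "a ball"),
        BoxAnnotation(2, (200, 150, 320, 260), "a dog"),
    ],
)
print(example.boxes_by_phrase()["a man"])  # [(10, 20, 110, 220)]
print(example.validate())  # True
```

Keeping the phrase as a plain string makes grouping simple; a real pipeline would more likely store token offsets into the caption so that repeated phrases (for example, two occurrences of "a dog") can be told apart.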