A video dataset of unprecedented scale and diversity.

Google has just announced the release of YouTube-8M – a video dataset that comprises eight million YouTube video IDs and associated labels from a diverse vocabulary of 4800 visual entities.

The dataset is the largest publicly known labelled video dataset, surpassing the previous record holder, Sports-1M, which contains one million videos and roughly 500 sports-specific classes.


Image: The scale of data contained in YouTube-8M.

But why does the magnitude of YouTube-8M matter? Algorithms that detect and classify the contents of video need large amounts of labelled data to learn from. Yet, while there are numerous large-scale image datasets, the same could not previously be said for video.


Image: Number of entities contained in YouTube-8M by topic.


Image: Number of videos contained in YouTube-8M by topic.

It is hoped that the release of YouTube-8M will therefore accelerate new research on video modelling architectures and representation learning, especially approaches that deal effectively with noisy or incomplete labels, transfer learning and domain adaptation.

Using it

YouTube-8M can be downloaded as TensorFlow Record (TFRecord) files. A downloader script is also provided; it fetches the dataset in shards and stores them in the current directory.
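Once some shards are on disk, a quick sanity check is to count the records they contain. The following is a minimal sketch rather than an official tool. It relies only on the TFRecord framing (an 8-byte little-endian length, a 4-byte checksum of the length, the payload, and a 4-byte checksum of the payload) and skips checksum verification. The function names and the `*.tfrecord` file pattern are assumptions made for illustration, so adjust the pattern to match the files you actually downloaded.

```python
import struct
from pathlib import Path
from typing import Iterator


def iter_tfrecord_payloads(path: str) -> Iterator[bytes]:
    """Yield the raw payload bytes of each record in one TFRecord file.

    The two checksums in each record are read and discarded, not verified.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte checksum of the length
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError(f"truncated record header in {path}")
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # 4-byte checksum of the payload
            if len(payload) < length or len(footer) < 4:
                raise ValueError(f"truncated record body in {path}")
            yield payload


def count_records(directory: str = ".", pattern: str = "*.tfrecord") -> int:
    """Count records across all shards in a directory.

    The default pattern is an assumption, not a documented naming scheme.
    """
    total = 0
    for shard in sorted(Path(directory).glob(pattern)):
        total += sum(1 for _ in iter_tfrecord_payloads(str(shard)))
    return total
```

Calling `count_records()` with no arguments scans the current directory, which is where the downloader script stores the shards. Each payload is typically a serialized protocol buffer, so decoding the labels and features themselves would still require TensorFlow or a compatible parser.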

Alternatively, the dataset can be perused through an in-browser explore feature, which allows you to search the full vocabulary of Knowledge Graph entities, grouped in 24 top-level verticals, along with corresponding videos.


Image: YouTube-8M’s in-browser explore function. This screenshot depicts a subset of dataset videos annotated with the entity “puppy”.

Visit the YouTube-8M webpage here.