AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

The AVA dataset densely annotates 80 atomic visual actions, localized in space and time, resulting in 210k action labels, with multiple labels per human occurring frequently.
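
Since each annotation ties an action label to a person box at a specific time, "multiple labels per human" means several rows sharing the same box. As a minimal sketch of working with such annotations, the snippet below groups labels by person box; the CSV column layout (video id, timestamp, normalized box corners, action id) and the sample rows are assumptions for illustration, not the official release format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed AVA-style layout:
# video_id, timestamp, x1, y1, x2, y2, action_id (box coords normalized to [0, 1])
SAMPLE = """\
vid001,902,0.077,0.151,0.283,0.811,80
vid001,902,0.077,0.151,0.283,0.811,12
vid001,903,0.226,0.032,0.366,0.497,17
"""

def labels_per_person(csv_text):
    """Group action labels by (video, timestamp, box): one person may carry several."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        video_id, ts, x1, y1, x2, y2, action_id = row
        groups[(video_id, ts, x1, y1, x2, y2)].append(int(action_id))
    return groups

groups = labels_per_person(SAMPLE)
multi = sum(1 for acts in groups.values() if len(acts) > 1)
print(f"{len(groups)} person boxes, {multi} with multiple labels")
```

Here the first two rows describe the same person box at the same timestamp, so they collapse into one box with two labels.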

The main differences from existing video datasets are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations, with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) the use of movies to gather a varied set of action representations.

If you use this dataset, please cite the following paper (https://arxiv.org/abs/1705.08421):

@article{gu2017,
  title={{AVA}: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions},
  author={Gu, Chunhui and Sun, Chen and Vijayanarasimhan, Sudheendra and
Pantofaru, Caroline and Ross, David A. and Toderici, George and Li, Yeqing and
Ricco, Susanna and Sukthankar, Rahul and Schmid, Cordelia and Malik, Jitendra},
  journal={arXiv preprint arXiv:1705.08421},
  year={2017}
}

Below are some example frames taken from the dataset.

AVA example images