Temporal detection of actions

Abstract

We address the problem of detecting actions, such as drinking or opening a door, in hours of challenging video data. We propose a model based on a sequence of atomic action units, termed ''actoms'', that are characteristic for the action. Our model represents the temporal structure of actions as a sequence of histograms of actom-anchored visual features. Our representation, which can be seen as a temporally structured extension of the bag-of-features, is flexible, sparse and discriminative. We refer to our model as Actom Sequence Model (ASM). Training requires the annotation of actoms for action clips. At test time, actoms are detected automatically, based on a non-parametric model of the distribution of actoms, which also acts as a prior on an action's temporal structure. We present experimental results on two recent benchmarks for temporal action detection. We show that our ASM method outperforms the current state of the art in temporal action detection.

Related publications

The results are published at the CVPR 2011 conference in the paper:
Actom Sequence Models for Efficient Action Detection (poster).

An extended version is available as a research report here.

Data

Our annotations can be download from the data webpage of the LEAR team.

Example results

Below are some examples of actions correctly detected with our ASM approach. The videos are rendered using the HTML5 video tag. To view the sequence of the central frames of the actoms, just put your mouse pointer over the images (uses Javascript).

Detection window

Central frames of actoms

An automatic "action summary" of the whole "Coffee and Cigarettes" test sequence for "Drinking" is available below and here.



© 2008-2012   Adrien GAIDON