Contact: potapovdanila((AT))mail((DOT))ru
I successfully defended my PhD thesis on 22 July 2015.
Thesis title: "Supervised Learning Approaches for Automatic Structuring of Videos".
Download: [Manuscript] [Slides]
I did my PhD studies in the
LEAR Team,
Inria Rhone-Alpes, Grenoble, France,
where I was supervised by Matthijs Douze, Zaid Harchaoui, and Cordelia Schmid.
The thesis addresses several learning tasks for video and audio data.
Before that I was with the Graphics and Media Lab,
CMC faculty of Lomonosov Moscow State University.
Recent talks
Category-specific video summarization
Christmas Colloquium on Computer Vision, Skolkovo
Dec. 28, 2015
Automatic summarization of video data
Spotlight at Khronos-Persyvact Spring School
Mar. 31–Apr. 1, 2015
Abstract of the PhD manuscript
Automatic interpretation and understanding of videos remain at the
frontier of computer vision. The core challenge is to lift the expressive
power of the current visual features (as well as features from other
modalities, such as audio or text)
so that typical video sections with low temporal saliency yet high semantic
expression can be recognized automatically. Examples of such long events include
video sections where someone is fishing (TRECVID Multimedia Event Detection),
or where the hero argues with a villain in a Hollywood action movie (Action Movie Franchises).
In this manuscript, we present several contributions towards
this goal, focusing on three video analysis tasks: summarization,
classification, and localization.
First, we propose an automatic video summarization method that yields a short,
highly informative summary of a potentially long video and is tailored to
specified video categories. We also introduce a new dataset for evaluation
of video summarization methods, called
MED-Summaries,
which contains complete
importance-scoring annotations of the videos, along with a complete set of
evaluation tools.
Second, we introduce a new dataset, called
Action Movie Franchises,
consisting
of long movies, and annotated with non-exclusive semantic categories (called
beat-categories), whose definition is broad enough to cover most of the movie
footage. Categories such as "pursuit" or "romance" in action movies are
examples of beat-categories. We propose an approach for localizing beat-events
based on classifying shots into beat-categories and learning the temporal
constraints between shots.
Third, we give an overview of the Inria event classification system developed within
the TRECVID Multimedia Event Detection competition and highlight
the contributions made during the work on this thesis from 2011 to 2014.
Projects
Category-specific video summarization (MED-Summaries dataset)
Beat-event detection in action movie franchises (Action Movie Franchises dataset)