PhD student in Computer Vision Graduated!

I am now a Research Scientist at Xerox: see my new webpage at XRCE.

Short bio

I am a former PhD student and Post-Doc of the LEAR team, in the LJK laboratory and INRIA Rhône-Alpes research center. My doctoral school was the Grenoble INP institute of technology. I was funded by the Microsoft Research-INRIA joint center. I mainly work in the fields of computer vision and machine learning, with a keen interest in automatic video content interpretation.

I come from the ENSIMAG engineering school of IT and applied Mathematics. I also have a Master's degree in Machine Learning from the Université Joseph Fourier (UJF) of Grenoble.

I successfully graduated on October 25th, 2012 :-)

PhD thesis

On the 1st of October 2008, I started my PhD thesis under the supervision of the head of the LEAR team, Cordelia Schmid, and Zaid Harchaoui. The title of my PhD subject is:

Structured Models for Action Recognition in Real-world Videos

This PhD thesis (click here for my manuscript and presentation) addresses visual event and action recognition in natural videos, a problem which receives increasing attention due to the widespread availability of personal video capture devices, low cost of multimedia storage, and ease of content sharing. The goal is to develop video representations and recognition techniques that can be applied to a wide range of visual events in real-world multimedia data such as movies and internet videos. In particular, I investigate how an action can be decomposed, what is its discriminative structure, and how to use this information to accurately represent video content.

The main challenge I address lies in how to build models of actions that are simultaneously information-rich --- in order to correctly differentiate between different action categories --- and robust to the large variations in actors, actions, and videos present in real-world data.

Projects

Actom Sequence Models for the temporal detection of actions

The goal of this work is to find when a short human action (like opening a door) is performed in large real-world video databases. We describe an action as a sequence of atomic action units, termed actoms, which are semantically meaningful temporal parts that are characteristic for the action. We learn both the content and the temporal structure of actions. The results are published at the CVPR 2011 conference in the paper Actom Sequence Models for Efficient Action Detection.

More details are available on the project page.

Modeling complex activities with trees of motion components

Although simple actions can be represented as sequences of short temporal parts, longer activities, e.g. pole vaulting, are composed of a variable number of sub-events connected by complex spatio-temporal relations. In this project, we learn how to automatically represent activities as a hierarchy of mid-level motion components. This hierarchy is a data-driven decomposition that is specific to each video. The results are published at the BMVC 2012 conference in the paper Recognizing activities with cluster-trees of tracklets A one page summary is also available here.

Some minimalistic slides and videos are available here, with more coming soon...

Time series kernel for action recognition

In this work, we address the problem of action recognition by describing actions as time series, and introduce a kernel to compare their dynamical aspects. We propose a new kernel specific to videos, which compares the temporal dependencies of actions. Our time series kernel (called DACO, for Difference of Auto-Correlation Operators) is shown to provide complementary information to standard orderless approaches like bag-of-features. The results are published at the BMVC 2011 conference in the paper A time series kernel for action recognition A one page summary is also available here.

More details coming soon...

Mining visual actions from movies

During this project, we aimed at using textual and visual information to extract some video examples of a query action from large movie databases. The paper Mining visual actions from movies published at the BMVC 2009 conference, presents our approach and gives some results on the TV-show Buffy the vampire slayer.
More details are available on the project page.

Challenges

I participated in two challenges during the 2008 summer period:

The TRECVID 2008 High level feature extraction competition.
The goal is to search for high-level semantic features, concepts
such as "Cityscape", "People", "Speech" etc., in a large video database.
Details can be found in LEAR's TRECVID 2008 notebook paper.
The PASCAL VOC challenge 2008 (classification task), which is about recognizing objects (such as "Car", "Bird" or "People") in images.
You will find details on my submissions here.

Master thesis

Between February and June 2008, I did my Master's internship in the LEAR team with Marcin Marszalek and Cordelia Schmid. My work was about combining text and video for action recognition in videos.