Seminars

Contact person: Alberto Bietti, Thomas Lucas
Access information Montbonnot site.

2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004

Recent Advances in Large-Scale Convex Optimization: Algorithms, Complexities, and Applications

Niao He

INRIA Rhône-Alpes, A104

Monday, January 19, 12:00

Abstract:

In the modern era of large-scale machine learning and high-dimensional statistics, mixing regularization and kernelization have become increasingly popular and important modeling strategies. However, they often lead to very complex optimization models with extremely large scale and nonsmooth objective functions, which bring new challenges to traditional first-order methods because of the expensive computation or memory cost of proximity operators and even gradients. In this talk, I will discuss some recent algorithmic advances that cope with these challenges by taking advantage of the underlying structures and using randomization techniques. I will present (i) my work on the composite mirror prox algorithm for a broad class of variational inequalities, which covers the composite minimization problem with multiple nonsmooth regularization terms, and (ii) my work on the doubly stochastic gradient descent algorithm for stochastic optimization problems over reproducing kernel Hilbert spaces. These algorithms exhibit the optimal convergence rates and make it practical to handle problems with extremely large dimensions and large datasets. Besides their theoretical efficiency, the algorithms have also proven useful in a wide range of interesting applications in machine learning, image processing, and statistical inference.

Many computer vision problems have an asymmetric distribution of information, i.e. less or more information about a problem is available at training time than at test time. In my talk I will discuss our recent work on both situations: 1) the LUPI framework for the case when we have additional data modalities available for the training data, and 2) a label propagation approach for the case when an additional similarity measure is available at test time (both published at ICCV 2013).

After decades of use of multispectral remote sensing, most of the major space agencies now have new programs to launch hyperspectral sensors, recording the reflectance information of each point on the ground in hundreds of narrow and contiguous spectral bands. The spectral information is instrumental for the accurate analysis of the physical components present in a scene. But every rose has its thorns: most traditional signal and image processing algorithms fail when confronted with such high-dimensional data (each pixel is represented by a vector with several hundred dimensions).

In this talk, we focus on the extension to hyperspectral data of a very powerful image processing analysis tool: the Binary Partition Tree (BPT). It provides a generic hierarchical representation of images and consists of the two following steps:

construction of the tree: one starts from the pixel level and merges pixels/regions progressively until the top of the hierarchy (the whole image considered as one single region) is reached. To proceed, one needs to define a model to represent the regions (for instance the average spectrum, although this is not a good idea) and a similarity measure between neighbouring regions to decide which ones should be merged first (for instance the Euclidean distance between the models of the two regions, although this is not a good idea either). This step (construction of the tree) is closely tied to the data.
pruning of the tree: this step is closely tied to the considered application. The pruning of the tree leads to one segmentation, which need not be any of the results obtained during the iterative construction of the tree. This is where this representation outperforms the standard approaches. One may also perform classification or object detection (assuming an object of interest appears somewhere as one node of the tree, the game is to define a suitable criterion, related to the application, to find this node).

Results are presented on various hyperspectral images.
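
The tree construction step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: pixels lie on a single row (so each region has at most two neighbours), the region model is the average spectrum, and the similarity measure is the Euclidean distance. The abstract itself warns that these two choices are poor in practice; they are used here only because they are the simplest. The function name build_bpt is hypothetical.

```python
import math


def build_bpt(spectra):
    """Greedily merge neighbouring regions of a 1-D pixel row into a binary tree.

    spectra: list of per-pixel spectra (lists of floats).
    Returns the merge sequence as (child_a, child_b, parent) node ids.
    Leaves are numbered 0..n-1, internal nodes n, n+1, ...
    """
    # Each region is (node_id, mean_spectrum, pixel_count).
    regions = [(i, list(s), 1) for i, s in enumerate(spectra)]
    merges = []
    next_id = len(spectra)
    while len(regions) > 1:
        # Find the most similar pair of neighbouring regions
        # (assumption: Euclidean distance between mean spectra).
        best = min(
            range(len(regions) - 1),
            key=lambda k: math.dist(regions[k][1], regions[k + 1][1]),
        )
        (ia, ma, na), (ib, mb, nb) = regions[best], regions[best + 1]
        # Region model of the merged node: pixel-weighted mean spectrum.
        merged_mean = [(na * x + nb * y) / (na + nb) for x, y in zip(ma, mb)]
        merges.append((ia, ib, next_id))
        regions[best:best + 2] = [(next_id, merged_mean, na + nb)]
        next_id += 1
    return merges
```

For n pixels the loop performs n - 1 merges, and the last parent id is the root of the tree. Pruning would then select a subset of these nodes according to an application-specific criterion.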

Large-scale learning from interaction data

Miro Dudík

INRIA Rhone-Alpes, F107

Thursday, January 17 2013, 10:30

Abstract:

In many important applications, we need to make decisions in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. Examples include user content optimization, Internet advertising and health-care policy. In the first part of the talk, I will discuss the problem of evaluation of a new policy (e.g., a user serving policy) given historic data. The key statistical challenge is properly accounting for the fact that the past policy and the proposed policy differ. I will present an accurate technique that solves this without collecting any new data. In the second part of the talk, I will focus on a computational challenge of learning from massive interaction data sets. I will describe a distributed optimization technique that allows solving tera-scale problems in 1 hour (using 1000 machines/cores).

Based on joint work with John Langford, Lihong Li, Alekh Agarwal and Olivier Chapelle.
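
To make the policy evaluation problem from the first part of this abstract concrete, here is a minimal sketch of inverse propensity scoring, a standard baseline for estimating the value of a new policy from logged data. It is not claimed to be the technique presented in the talk. The log format (context, action, reward, logging probability) and the names ips_value and new_policy are assumptions made for the example.

```python
def ips_value(logs, new_policy):
    """Estimate the average reward of new_policy from logged interactions.

    logs: list of (context, action, reward, logging_prob), where logging_prob
          is the probability with which the past policy chose the logged action.
    new_policy: function (context, action) -> probability that the new policy
          would choose that action in that context.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward to correct for the mismatch
        # between the past policy and the proposed policy.
        weight = new_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)
```

The estimate is unbiased when the logging probabilities are correct and non-zero for every action the new policy can take, but its variance grows when the two policies differ strongly.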

Seminars in 2012

Towards efficient video representations for action recognition

Heng Wang

INRIA Rhone-Alpes, A104

Friday, November 30 2012, 12:00

Abstract:

In this talk, we first review some popular spatial-temporal features for video, and compare their performance in action recognition. In total, we consider four different feature detectors and six local feature descriptors. We demonstrate that dense sampling at regular positions consistently outperforms all tested space-time interest point detectors in real-world videos.

The second part will introduce our recent video features based on dense trajectories and motion boundary descriptors. Dense trajectories capture the local motion patterns in the video and guarantee a good coverage of the context information. Additionally, motion boundary descriptors are shown to consistently outperform other state-of-the-art descriptors, in particular on real-world videos that contain a significant amount of camera motion. We will also discuss some drawbacks of the current methods and possible further extensions.

The Role of V4 During Natural Vision

Julien Mairal

INRIA Rhone-Alpes, F107

Monday, November 26 2012, 11:00

Abstract:

The functional organization of area V4 in the mammalian ventral visual pathway is far from being well understood. V4 is believed to play an important role in the recognition of shapes and objects and in visual attention, but its complexity makes it hard to analyze. Individual cells in V4 have been shown to exhibit a large diversity of preferences to visual stimuli characteristics, including orientation, curvature, motion, color and texture. Such observations were for a large part obtained from electrophysiological and imaging studies, when a subject (monkey or human) is shown a sequence of artificial stimuli during data acquisition. In our study, we intend to go beyond such an approach and analyze a population of V4 neurons in naturalistic conditions. More precisely, we record responses from V4 neurons to grayscale still natural images (that is, discarding color and motion content). We propose a new computational model for V4 that does not rely on any pre-defined image features but only on invariance and sparse coding principles. Our approach is the first to achieve comparable prediction performance for V4 as for V1 cells on responses to natural images. Our model is also interpretable using sparse principal component analysis. In the neuron population observed and based on our computational model, we discover as our main finding two groups of neurons: those selective to texture versus those selective to contours. This supports the thesis that one primary role of V4 is to extract objects from background in the visual field. Moreover, our study also confirms the diversity of V4 neurons. Among those selective to contours, some of them are selective to orientation, others to acute curvature features.

This is joint work with Yuval Benjamini, Ben Willmore, Michael Oliver, Jack Gallant and Bin Yu. This work was performed at UC Berkeley.

Refresher on neural networks and overview of libraries for deep learning

Adrien Gaidon

LEAR-XRCE reading group

INRIA Rhone-Alpes, A104

Friday, November 23 2012, 11:30

Abstract:

Recent results [1] highlighted the excellent performance of deep learning architectures for complex high-level computer vision tasks. This talk aims at providing some basic practical knowledge in order to start playing around with these algorithms.

We will begin with a brief refresher on neural networks and the back-propagation algorithm. We will then provide an overview of two Open Source libraries that can be used to learn deep architectures: Theano [2] (python) and EBLearn [3] (C++).

Reading material:
Chapter 11 (on Neural Networks) from The Elements of Statistical Learning

Presentation: http://lear.inrialpes.fr/people/gaidon/lear_xrce_deep_learning_01.html

Block-Coordinate Frank-Wolfe for Structural SVMs

Martin Jaggi

INRIA Rhone-Alpes, F107

Monday, November 12 2012, 14:00

Abstract:

We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves the same convergence rate as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this algorithm has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, the stochastic Frank-Wolfe algorithm allows us to compute the optimal step-size and yields a computable duality gap guarantee. Our experiments indicate that this simple algorithm outperforms competing structural SVM solvers.
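
The idea of a block-coordinate Frank-Wolfe step can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the structural SVM solver from the talk: it minimizes a simple quadratic, 0.5 * sum of squared differences to given targets, over a product of probability simplices (one simplex per block). Each iteration picks one random block, calls a linear minimization oracle on that block only, and takes an exact line-search step, which is available in closed form for a quadratic. All function names are hypothetical.

```python
import random


def lmo_simplex(grad):
    """Linear minimization oracle: simplex vertex minimizing <grad, s>."""
    j = min(range(len(grad)), key=lambda i: grad[i])
    return [1.0 if i == j else 0.0 for i in range(len(grad))]


def objective(blocks, targets):
    """Toy objective: 0.5 * squared distance between blocks and targets."""
    return 0.5 * sum(
        (x - t) ** 2
        for xb, tb in zip(blocks, targets)
        for x, t in zip(xb, tb)
    )


def block_coordinate_fw(targets, iters=200, seed=0):
    """Randomized block-coordinate Frank-Wolfe on a product of simplices."""
    rng = random.Random(seed)
    # Start every block at the first vertex of its simplex.
    blocks = [[1.0] + [0.0] * (len(t) - 1) for t in targets]
    for _ in range(iters):
        i = rng.randrange(len(blocks))          # pick one block at random
        x, t = blocks[i], targets[i]
        grad = [xj - tj for xj, tj in zip(x, t)]  # gradient w.r.t. block i
        s = lmo_simplex(grad)
        direction = [sj - xj for sj, xj in zip(s, x)]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            continue                            # already at the chosen vertex
        # Exact line search for the quadratic, clipped to [0, 1].
        gamma = -sum(g * d for g, d in zip(grad, direction)) / denom
        gamma = max(0.0, min(1.0, gamma))
        blocks[i] = [xj + gamma * d for xj, d in zip(x, direction)]
    return blocks
```

Because every update is a convex combination of the current block and a simplex vertex, the iterates stay feasible without any projection, which is the main appeal of Frank-Wolfe methods on structured constraint sets.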

Using Machine Learning to Predict Protein-Protein and Protein-Ligand Interactions

Sergei Grudinin

INRIA Rhone-Alpes, F107

Friday, November 9 2012, 10:30

Abstract:

Protein-protein and protein-ligand interactions are crucial for many biological processes such as signal transduction, DNA replication, etc. Such interactions are also fundamental in many diseases (e.g. cancers). In this talk, I will describe our recent work on machine learning techniques that predict these interactions.

Due to the difficulties, time and cost of the experimental methods for determining the structures and binding affinities of molecular complexes, efficient computational methods are usually used in this field. However, the accuracy of these computational methods is often rather low due to the crude approximations of the interactions within the complex and also due to insufficient sampling of the configurational space for the molecules that form the complex.

I will describe a new machine learning algorithm that very precisely reconstructs the interactions between the molecules based on the structural information currently available in the databases. These databases contain three-dimensional molecular structures determined by experimental techniques and have been growing very rapidly. In 2012, the PDB (Protein Data Bank) contained about 80,000 protein structures. The CSD (Cambridge Structural Database), a database for small molecules, contained about 500,000 entries at the beginning of 2012. We trained our interaction model with some 60,000 parameters on structures from these databases and verified the results on several standard benchmarks as well as in blind docking prediction competitions. The success rates of our model, according to the benchmarks, rank it among the top-3 methods currently available.

Predicting Binary Features for Attribute-Based and Multi-Label Classification

Christoph Lampert

INRIA Rhone-Alpes, Grand Amphi

Friday, October 26 2012, 15:30

Abstract:

The prediction of attributes, i.e. semantic properties of objects or scenes, has recently received a lot of attention in the computer vision community. In their simplest form, one can interpret attributes simply as a layer of binary mid-level features that can be computed from the image contents. In my talk I will discuss two recent works in this area: the automatic learning of additional, non-semantic, binary features that augment an existing set of attributes (ECCV 2012), and a method for more efficiently predicting binary outputs in highly connected graphical models, where inference has to be performed by sampling (NIPS 2012).

Multi-step flow fusion: towards accurate and dense correspondences in long video shots

Patrick Pérez

INRIA Rhone-Alpes, F107

Thursday, October 25 2012, 10:00

A Few Machine Learning-Friendly Optimization and Algorithmic Properties

Pierre Machart

INRIA Rhone-Alpes, F107

Thursday, July 4 2012, 15:00

Abstract:

I will introduce some of the main results of my PhD. First, the so-called "proximal" methods have drawn a lot of attention, lately, for solving non-smooth optimization problems that naturally arise for Machine Learning and Signal Processing, among others. The efficiency of those methods relies on the computation of the proximity operator, which, in a lot of problems, can't be obtained in closed form. In those situations, the proximity operator is approximated through the use of iterative procedures. We will see how some finite-time analysis can lead to unexpected strategies where the precision of the approximations can be chosen so that the global procedure has: a) good theoretical properties of the quality of the solution, b) a minimal computational cost.

Then, we will investigate the use of a non-standard performance measure of interest for (multi-class) machine learning problems, namely the Confusion Matrix. We advocate that in several cases, this quantity could be "minimized", instead of the more standard "risk" that is usually considered in ML problems. Along with this framework, we provide some of its theoretical grounds with generalization bounds, which can be obtained through a generalization of the "stability" analysis, which consists in leveraging algorithmic properties to provide statistical guarantees for the classifiers.
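
For reference, the proximity operator mentioned in the first part of this abstract does have a closed form in the simplest cases. A minimal sketch, assuming the non-smooth term is the L1 norm scaled by lam (an illustrative choice, not taken from the talk): the operator then reduces to componentwise soft-thresholding. The name soft_threshold is hypothetical. The talk concerns the harder situation where no such closed form exists and the operator must be approximated iteratively.

```python
def soft_threshold(v, lam):
    """Proximity operator of lam * ||.||_1: shrink each entry towards zero."""
    out = []
    for x in v:
        if x > lam:
            out.append(x - lam)
        elif x < -lam:
            out.append(x + lam)
        else:
            out.append(0.0)   # entries within [-lam, lam] are set to zero
    return out
```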

Hypothesis Testing and Bayesian Inference: New Applications of Kernel Methods

Arthur Gretton

INRIA Rhone-Alpes, Grand Amphi

Monday, June 11 2012, 11:00

Abstract:

In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear learning algorithms from linear ones, by applying the linear algorithms to feature space mappings of the original data. Recently, it has become clear that a potentially more far reaching use of kernels is as a linear way of dealing with higher order statistics, by mapping probabilities to a suitable reproducing kernel Hilbert space (i.e., the feature space is an RKHS).

I will describe how probabilities can be mapped to reproducing kernel Hilbert spaces, and how to compute distances between these mappings. A measure of strength of dependence between two random variables follows naturally from this distance. Applications that make use of kernel probability embeddings include:

* Nonparametric two-sample testing and independence testing in complex (high dimensional) domains. As an application, we find whether text in English is translated from the French, as opposed to being random extracts on the same topic.

* Bayesian inference, in which the prior and likelihood are represented as feature space mappings, and a posterior feature space mapping is obtained. In this case, Bayesian inference can be undertaken even in the absence of a model, by learning the prior and likelihood mappings from samples.
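
The distance between kernel embeddings of two samples, which underlies the two-sample test listed above, can be sketched as follows. This is a minimal example under stated assumptions: scalar data, a Gaussian kernel with a fixed bandwidth, and the simple biased estimator of the squared distance (commonly called the maximum mean discrepancy, a name the abstract does not use). The function names are hypothetical, and an actual test would also need a significance threshold, which is omitted here.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of the squared RKHS distance between the
    mean embeddings of the samples xs and ys."""
    def mean_k(a, b):
        # Average kernel value over all pairs (u in a, v in b).
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
```

The estimate is zero when both samples are identical and grows as the two empirical distributions move apart, which is what makes it usable as a test statistic.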

Helping each other to see: Humans and machines

Larry Zitnick

INRIA Rhone-Alpes, Grand Amphi

Tuesday, April 24 2012, 11:00

Abstract:

Humans and machines see the world differently, each having their own strengths and weaknesses. In this talk, I describe two projects exploring how they may help each other.

Visual object recognition by machines is notoriously difficult. To help in the learning process, humans are typically used to gather large hand-labeled training datasets from which the machines may learn. However, humans may also be used to "debug" the machine's recognition pipeline to learn what aspects are lacking. Specifically, we explore the various stages of part-based person detectors. We perform human studies in which subjects perform the same sub-tasks as their machine counterparts, and accuracies are compared.

Current state-of-the-art models of human actions in realistic videos, e.g. the bag of spatio-temporal visual words, are often based on the aggregation of local features in an orderless fashion. However, actions are by essence temporal phenomena and some actions, like "sitting down" and "getting up", can only be reliably classified if their models incorporate some temporal structure. We present two recent results on incorporating temporal information in state-of-the-art recognition methods. First, we describe a simple action model, called the Actom Sequence Model (ASM), encoding global ordering constraints between temporal parts. We explain how we learn the temporal structure of an action and perform efficient action detection on large video databases. Then, we introduce a new kernel between multivariate time series, called the Difference between Auto-Correlation Operators (DACO) kernel, and demonstrate its applicability to videos. This kernel compares two actions based on their dynamics, represented by the auto-correlation operator in the Reproducing Kernel Hilbert Space (RKHS) associated with a "base" kernel between frames. We show that it leverages useful temporal dependency information, that complements traditional kernels on bag-of-words. Finally, we illustrate the performance of our algorithms on challenging action recognition benchmarks and show improvements w.r.t. the state of the art. Joint work with Zaid Harchaoui and Cordelia Schmid

We propose a new approach to the problem of robust estimation for some inverse problems arising in multiview geometry. Inspired by recent advances in the statistical theory of recovering sparse vectors, we define our estimator as a Bayesian maximum a posteriori with multivariate Laplace prior on the vector describing the outliers. This leads to an estimator in which the fidelity to the data is measured by the $L_\infty$-norm while the regularization is done by the $L_1$-norm. The proposed procedure is fairly fast since the outlier removal is done by solving one linear program (LP). An important difference compared to existing algorithms is that for our estimator it is not necessary to specify either the number or the proportion of the outliers; only an upper bound on the maximal measurement error for the inliers should be specified. We present theoretical results assessing the accuracy of our procedure, as well as numerical examples illustrating its efficiency on synthetic and real data. This is joint work with Renaud Keriven.

In this talk, I will present our two recent approaches to human action recognition in uncontrolled videos. The first approach deals with the case where there are not enough training sequences to learn the action classifiers directly from videos. In this case, we show how we can make use of the images collected from the Web to learn representations of actions and use this knowledge to automatically annotate actions in videos. Our approach is unsupervised, in the sense that it requires no human intervention other than the text querying. The benefits are two-fold: first, we show that we can improve retrieval of action images, and second, we can collect a large generic database of action poses, which can then be used in tagging videos. We present experimental evidence that using action images collected from the Web, annotating actions is possible. In the second part of the talk, I will present our approach which uses the scene and object information in the videos together with the pose and motion information to infer human actions. Here, our observation is that human actions can be identified not only by the singular observation of the human body in motion, but also properties of the surrounding scene and the related objects. We propose an approach that integrates multiple feature channels from several entities and formulate the problem in a multiple instance learning (MIL) framework. Our experimental results show that scene and object information can be effectively used to complement person features for human action recognition.

Faster Algorithms for Max-Product Message-Passing

Tiberio Caetano

NICTA/Australian National University

INRIA Rhone-Alpes, F107

Thursday, October 14th 2010, 16h00

Abstract:

Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm, or loopy belief-propagation. The exact solution to this problem is well known to be exponential in the size of the model's maximal cliques after it is triangulated, while approximate inference is typically exponential in the size of the model's factors. In this presentation, I'll show recent work from our lab in which we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist entirely of latent variables (i.e., they do not depend on an observation). This is a common case for several practical models, including many models on grids, trees, ring-structured models and skip-chain models. In such cases, we are able to decrease the exponent of complexity for message-passing for both exact and approximate inference. We illustrate the practical advantages of the improved algorithm in a number of tasks, such as protein design, text and image denoising, optical flow inference, stereo disparity estimation, and graph matching.
Joint work with Julian McAuley.
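
As background for this abstract, here is a minimal sketch of plain max-product message passing on the simplest possible model, a chain with one shared pairwise potential. It is the textbook baseline (equivalent to the Viterbi algorithm), not the faster variant presented in the talk. The potential layout (unary[t][s], pairwise[s][s2]) and the name max_product_chain are assumptions made for the example.

```python
def max_product_chain(unary, pairwise):
    """MAP assignment for a chain-structured model.

    unary[t][s]: non-negative potential of state s at position t.
    pairwise[s][s2]: non-negative potential of moving from s to s2.
    Returns (best_path, best_score).
    """
    n, k = len(unary), len(unary[0])
    msg = list(unary[0])   # best score of any prefix ending in each state
    back = []              # back-pointers, one list per position t >= 1
    for t in range(1, n):
        new_msg, ptr = [], []
        for s2 in range(k):
            # Maximize over the previous state (the "max" in max-product).
            best = max(range(k), key=lambda s: msg[s] * pairwise[s][s2])
            new_msg.append(msg[best] * pairwise[best][s2] * unary[t][s2])
            ptr.append(best)
        msg = new_msg
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(k), key=lambda s: msg[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1], max(msg)
```

Each message update costs on the order of k squared operations for k states per variable; the work described in the abstract is about reducing this kind of exponent when factors do not depend on an observation.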

Set Based Modeling Of Objects And Their Context

Gokberk Cinbis

LEAR

INRIA Rhone-Alpes, F107

Friday, October 8th 2010, 14h00

Abstract:

In computer vision, many image entities can be represented as sets of high-dimensional items. For example, an object in an image can be represented as a set of image patches, where each image patch has a feature vector encoding the local appearance. Training classification models directly on sets of unordered items, where each set can have varying cardinality, can be difficult. In this talk, I will introduce a new boosting-based supervised learning algorithm, called SetBoost, for building set classifiers. In the second part of the talk, I will give details about our novel contextual object detection model that uses SetBoost. In natural images, objects tend to appear in certain arrangements with respect to the other objects (object context) and the scene (scene context). The aim of our proposed model is to improve localization and recognition accuracy of object detection algorithms using object context and scene context. Our approach outperforms existing state-of-the-art methods in challenging object detection benchmark datasets.

Scene and object recognition with lots of categories

Antonio Torralba

CSAIL, MIT

INRIA Rhone-Alpes, Grand Amphithéâtre

I will present two different transformations that can be applied to images before further processing. The first transformation is called DAISY, and was originally developed for wide baseline dense reconstruction. DAISY computes dense local descriptors in an efficient way; we then use a simple graph-cut technique to match the images based on these descriptors. The second transformation was developed for fast object detection and reduces the image to local dominant orientations. This yields a compact but discriminative binary representation, which can be parsed using SSE instructions to detect objects in real-time.

Learning Distinguishing Marks for Image Classification

Zaïd Harchaoui

Robotics Institute, Carnegie Mellon University

INRIA Rhone-Alpes, F107

Monday, January 4th 2010, 16h

Abstract:

We tackle here the problem of multi-class image classification from few training examples, where only small parts of the image help discriminate between classes. Such problems arise when classifying images of objects/persons in the wild. In such settings, standard kernel-based classifiers perform well only when combined with strong prior knowledge and efficient discriminative part detectors. We propose here a convex sparsity-enforced kernel-based method for this task, introducing a pool-L1 penalty which automatically singles out discriminant "distinguishing marks" to leverage classification performance. We report experimental results on a horses in the wild dataset and on several benchmark datasets.

Query formulation and efficient navigation through data to reach relevant results are undoubtedly major challenges for image or video retrieval. Queries of good quality are typically not available and the search process needs to rely on relevance feedback given by the user, which makes the search process iterative and laborious. A key question then is: Is it possible to replace or complement scarce explicit feedback with implicit feedback (IF)? IF can be inferred from various sensors not specifically designed for the retrieval task. In this talk, I will present preliminary results on inferring the relevance of images based on IF about users' attention, measured using an eye tracking device. We have shown that, in reasonably controlled setups at least, fairly simple features and classifiers are already capable of detecting relevance based on eye movements alone, without using any explicit feedback. This work is one of the outcomes of PinView, an EU FP7 collaborative project. It was done in collaboration with A. Klami, C. Saunders and S. Kaski.

Probabilistic Models of Textual Collections for Information Access

Eric Gaussier

Université Joseph Fourier

INRIA Rhone-Alpes, F107

Wednesday, January 14th 2009, 10h

Abstract:

Several probabilistic models of text collections have recently been introduced in the text processing community. These models are often defined from a statistical learning perspective. Over the years, however, several empirical findings on how words behave in documents have been reported (from the work of G. Zipf in 1949 to more recent studies). In this presentation, we study the links between probabilistic models of text collections and empirical observations concerning word frequency distributions. In the first part, we will introduce formal characterizations of several empirical observations. We will then review retrieval heuristics and propose an analytical characterization of them which can be used to design IR (Information Retrieval) models. We will then review standard probabilistic models in light of our characterizations and finally introduce new models (based on the beta negative binomial and log-logistic distributions) compatible with empirical observations. We will finally illustrate the behavior of our models on standard text collections.

Abstract:

I will present two approaches for hybrid text-image information processing that can be straightforwardly generalized to more general multimodal scenarios. Both approaches fall in the trans-media pseudo-relevance feedback category. The first method proposes to use a mixture model of the aggregate components, considering them as a single relevance concept. The second approach determines trans-media similarities between a new multimedia document and the objects of some repository by defining these similarities as an aggregation of mono-modal similarities between the elements of the aggregate and the new multimodal object. I further show how one can frame a large variety of problems in order to address them with the proposed techniques: image annotation or captioning, text illustration and multimedia retrieval and clustering. As an example scenario, the travel blog assistant system is used to illustrate some of the experimental results.

Towards a Theory of Cascaded Detectors

Jim Rehg

INRIA Rhone-Alpes, F107

Wednesday, May 7th 2008, 11h30

Abstract:

Cascades of boosted ensembles have become popular in the object detection community following their introduction in the face detector of Viola and Jones. Since then, researchers have sought to improve upon the original approach by exploring alternative boosting methods, feature sets, etc. Nevertheless, key decisions about the most basic aspects of the original cascade classifier, such as how many hypotheses to include in an ensemble and the appropriate balance of detection and false positive rates in the individual stages, have not been studied systematically. Choices which have a significant effect on the cascade's performance are usually made with heuristics or through trial and error.

We propose a novel method for training cascade classifiers, which exploits the shape of the ROC curve for a cascade in ways that have been previously overlooked. We present a new mathematical characterization of the space of possible cascade operating points. The results of our approach are cascade detectors with significantly-improved testing speeds in comparison to other automatic training methods. We automatically produce cascades whose detection speeds match those of the best hand-tuned detectors.
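
The trade-off between per-stage detection and false positive rates mentioned in this abstract can be made concrete with a small calculation. The sketch below is illustrative only and rests on a simplifying assumption that is not from the talk: stages are treated as independent, so the rates of the whole cascade are the products of the per-stage rates. The name cascade_rates is hypothetical.

```python
def cascade_rates(stages):
    """Overall (detection_rate, false_positive_rate) of a cascade.

    stages: list of (detection_rate, false_positive_rate) per stage.
    Assumes stages act independently, so a window passes the cascade
    only if it passes every stage.
    """
    detection, false_positive = 1.0, 1.0
    for stage_detection, stage_false_positive in stages:
        detection *= stage_detection
        false_positive *= stage_false_positive
    return detection, false_positive
```

For example, ten stages that each keep 99% of true objects and reject half of the background windows give an overall detection rate of roughly 90% with a false positive rate below 0.1%, which shows why each stage must have a very high detection rate even if its false positive rate is modest.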

Improving fast nearest neighbour search in large database for visual recognition

Kristian Mikolajczyk

University of Surrey

INRIA Rhone-Alpes, F107

Tuesday, April 8th 2008, 16h00

Abstract:

Local feature detectors and descriptors of local image structures are used in many state-of-the-art vision methods that require local image-to-image correspondences. In this talk I will discuss an approach for linear discriminant projection of high-dimensional image descriptors to reduce the number of dimensions and to improve their matching performance. The method is based on Fisher Analysis and global statistics which can be estimated from real or simulated training data. The projected descriptors are more discriminative than the original ones, 3-4 times more memory efficient, and require only a small computational overhead. I will show experimental results in the context of fast search for visual correspondence using different tree data structures and approximate nearest neighbour search. Finally, a recognition system based on a vocabulary forest of local features will be presented. The system is capable of simultaneous categorization and localization of scenes, objects and actions.

The talk will consist of two parts. It will start with a broad overview of text mining, its main goals, tasks, and problems. Several common tasks will be described in some detail, including building and preprocessing of text collections, text categorization, extraction of terms, entities and relations, and document summarization. Known well-performing techniques for solving these problems will be briefly discussed. In the second part, several complete information extraction and text mining systems will be presented in more detail, their strengths and shortcomings demonstrated and contrasted.

The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to the learning and classification of challenging action classes in movies and show promising results.

This talk will address the detection of object and action classes in unconstrained scenes. We first consider object class recognition and localisation in still images. Building upon recent advances in the field, we show how histogram-based descriptors combined with a boosting classifier provide a state-of-the-art object detector. Among the improvements, we introduce a Fisher weak learner for multi-valued histogram features and address training from limited sets of examples. We also address computational aspects and analyse the tradeoff between the speed and the accuracy of the detector. Validation of the method on the VOC05 and VOC06 benchmarks for object recognition shows its superior performance. In particular, the approach outperforms all the methods reported in the VOC05 Challenge for 7 out of 8 detection tasks while using a single set of parameters and providing close to real-time performance.

We next consider recognition and localisation of "atomic" actions in video. We treat such actions similarly to the objects in images and extend the boosted histogram detector to action detection in space-time. Using this approach, we address recognition and localisation of human actions in realistic scenarios with substantial variation in subject appearance, motion, surrounding scenes, viewing angles and spatio-temporal extents. In contrast to previous works that study action recognition in controlled settings, here we train and test the algorithms on real movies. We in particular investigate the combination of shape and motion information for action understanding. To this end we introduce "keyframe priming", which combines discriminative models of human appearance and motion in action. Keyframe priming is shown to significantly improve the performance of action detection. We present detection results for the action class "drinking" evaluated on two episodes of the movie "Coffee and Cigarettes" with 36,000 frames in total.

Penalized least squares with nonquadratic penalties

I will present recent research on using nearest-neighbor vector quantization for estimating intrinsic dimensionality of high-dimensional datasets and for learning informative partitions of labeled data.
In the first part of the talk, I will discuss a technique for intrinsic dimensionality estimation based on the theoretical notion of quantization dimension. This technique works by quantizing the dataset at increasing rates (in practice, we use k-means to learn the quantizer) and by fitting a parametric form to the plot of the empirical quantizer distortion as a function of rate. By using tree-structured quantization, we can simultaneously estimate dimensionality and partition the dataset into subsets having different intrinsic dimensions.
In the second part of the talk, I will discuss an information-theoretic method for learning a nearest-neighbor quantizer from labeled continuous data such that the index of the nearest prototype of a given data point approximates a sufficient statistic for its class label. I will demonstrate applications of this method to learning discriminative visual vocabularies for bag-of-features image classification and to image segmentation.

Seminars in 2006

Inverse chronological order.

Learning a similarity measure to compare never seen objects

Eric Nowak	15 december 2006, 16h30
Lear Project, INRIA Rhône-Alpes	C207, INRIA Rhône-Alpes

Human character recognition in TV-style movies

Alexander Klaeser	6 december 2006, 16h00
Lear Project, INRIA Rhône-Alpes	C207, INRIA Rhône-Alpes

Sensor Synchronization and Localization for Meeting Scene Analysis

David Demirdjian	17 october 2006, 16h
MIT Artificial Intelligence Laboratory	F107, INRIA Rhône-Alpes

Presentation of an appearance model for small targets tracking

Julien Bohné	11 october 2006, 17h
Lear Project, INRIA Rhône-Alpes	C207, INRIA Rhône-Alpes

Contribution to aerial image mosaicking

Christophe Simler	25 september 2006, 14h00
Université de Haute-Alsace, composante Label	C208, INRIA Rhône-Alpes

Efficient MAP approximation for dense energy functions

Martial Hebert	18 july 2006, 14h30
The Robotics Institute, Carnegie Mellon University	F107, INRIA Rhône-Alpes

Blind Vision

Shai Avidan	17 july 2006, 17h00
Mitsubishi Electric Research Laboratories	F107, INRIA Rhône-Alpes

Latent Mixture Vocabularies for Object Categorization

Diane Larlus	12 july 2006, 14h00
LEAR Group	C207, INRIA Rhône-Alpes

Statistical models to address the problem of object recognition

Thomas Deselaers	4 july 2006, 14h00
Computer Science Department, Aachen University	Grand Amphi, INRIA Rhône-Alpes

Conservative Learning and On-line Boosting for Vision

Horst Bischof	5 june 2006, 14h00
Institute for Computer Graphics and Vision, TU Graz	Grand Amphi, INRIA Rhône-Alpes

Multiple Object Class Detection with a Generative Model

Bernt Schiele	9 june 2006, 14h30
Department of Computer Science, Darmstadt University of Technology	F107, INRIA Rhône-Alpes

Extremely randomized trees applied to image quantization combined with a visual attention process for object categorization

Frank Moosmann	15 may 2006, 16h
Lear Project, INRIA Rhône-Alpes	C207, INRIA Rhône-Alpes

Brain Computer Interfaces

Vincent Guigue	14 april 2006, 11h
Lab. LITIS - INSA de Rouen	F107, INRIA Rhône-Alpes

Filtering methods for tracking in image sequences: application to feature point tracking

Elise Arnaud	4 april 2006, 16h30
University of Genoa, Italy, and IRISA Rennes	F107, INRIA Rhône-Alpes

Error-resilient source codes and joint source/channel codes

Herve Jegou	3 april 2006, 16h
IRISA/University of Rennes	A104, INRIA Rhône-Alpes

Object Detection in Crowded Scenes

Bastian Leibe	20 march 2006, 11h00
Multimodal Interactive Systems group, Darmstadt	F107, INRIA Rhône-Alpes

Beyond bag-of-words: recent research developments on visual categorization at XRCE

Florent Perronnin and Gabriela Csurka	16 march 2006, 16h00
Xerox Research Centre Europe, Image Processing Group	C207, INRIA Rhône-Alpes

Modelling Scenes with Local Descriptors and Latent Aspects

Tinne Tuytelaars	16 february 2006, 15h00
K.U.Leuven, VISICS Group	F107, INRIA Rhône-Alpes

The TRECVID program: experiments in content-based retrieval in video document collections

Georges Quenot	9 february 2006, 14h00
CLIPS-IMAG	F107, INRIA Rhône-Alpes

Geometric Context from a Single Image

Derek Hoiem	6 february 2006, 15h00
Robotics Institute of Carnegie Mellon University	F107, INRIA Rhône-Alpes

Computer vision using local binary patterns

Matti Pietikäinen	12 january 2006, 14h30
Information Processing Laboratory, University of Oulu, Finland	F107, INRIA Rhône-Alpes

Evaluation of interest point detectors and descriptors on infrared images

Julien Bohné	11 january 2006, 16h
Lear Project, INRIA Rhône-Alpes	C207, INRIA Rhône-Alpes

older seminars : go to 2005

Details of 2006 seminars

Learning a similarity measure to compare never seen objects

Presenter: Eric Nowak

15 December, at 16h30

C207, INRIA Rhône-Alpes

Affiliation: Lear Project, INRIA Rhône-Alpes

Abstract:
We propose a similarity measure that predicts how similar two images of never seen objects are, given a training set of similar and different object pairs. This similarity measure is used for visual identification from *one image*. It does not model any a priori deformation, nor does it expect a linear or quadratic transformation of the input space to be relevant; instead, it clusters local image representations and weights these clusters for the same/different prediction. An ensemble of extremely randomized decision trees is used as the clusterer. These trees are particularly well suited to the clustering since they are very fast to learn and they produce redundant information, which brings robustness. We evaluate our similarity measure on three datasets and outperform competitive state-of-the-art methods.

Human character recognition in TV-style movies

Presenter: Alexander Klaeser

6 December, at 16h00

C207, INRIA Rhône-Alpes

Affiliation: Lear Project, INRIA Rhône-Alpes

Abstract:
This master's thesis describes a supervised approach to the detection and identification of humans in TV-style video sequences. In still images and video sequences, humans appear in different poses and views, fully visible and partly occluded, at varying distances from the camera, at different places, under different illumination conditions, etc. This diversity in appearance makes human detection and identification a particularly challenging problem. A solution to this problem is of interest for a wide range of applications such as video surveillance and content-based image and video processing. In order to detect humans in views ranging from full to close-up view and in the presence of clutter and occlusion, they are modeled by an assembly of several upper body parts. For each body part, a detector is trained based on a Support Vector Machine and on densely sampled, SIFT-like feature points in a detection window. For more robust human detection, localized body parts are assembled using a learned model for geometric relations based on Gaussians. For flexible human identification, the outward appearance of humans is captured and learned using the Bag-of-Features approach and non-linear Support Vector Machines. Probabilistic votes for each body part are combined to improve classification results. The combined votes yield an identification accuracy of about 80% in our experiments on episodes of the TV series "Buffy the Vampire Slayer". The Bag-of-Features approach has been used in previous work mainly for object classification tasks. Our results show that this approach can also be applied to the identification of humans in video sequences. Despite the difficulty of the given problem, the overall results are good and encourage future work in this direction.

Sensor Synchronization and Localization for Meeting Scene Analysis

Presenter: David Demirdjian

17 October, at 16h00

F107, INRIA Rhône-Alpes

Affiliation: MIT Artificial Intelligence Laboratory

Abstract:
In this talk we tackle the problems of automatically i) synchronizing audio-visual streams and ii) localizing a set of cameras in a meeting analysis setting. More exactly, we consider a conference meeting setup where each participant wears a close-talking microphone and is recorded by a personal video camera. The multiple audio and video streams are recorded in an unsynchronized manner and the location and orientation of the cameras are unknown. We propose here some techniques for automatically estimating the time discrepancy between all audio and video streams and recovering the location and orientation of the cameras. First we show how the mutual information between the estimated motion energy of the lips and the audio energy can be used to recover the time discrepancy between the video and audio streams corresponding to the same participant. Then we show how the same technique can be used to synchronize the audio-visual streams corresponding to different participants. Finally we describe a probabilistic Bayesian framework for estimating the location and orientation of a set of cameras. We show how the head direction of the users can be used as a constraint by exploiting gaze patterns in multiparty conversational settings. In order to evaluate the performance of our algorithms, we show some synchronization and calibration results on real meetings.

Presentation of an appearance model for small targets tracking

Presenter: Julien Bohné

11 October, at 17h00

C207, INRIA Rhône-Alpes

Affiliation: Lear Project, INRIA Rhône-Alpes

Abstract:
Our method combines a statistical appearance model of the target with an accurate model of the background in its neighborhood. Both models are updated over the image sequence to adapt to appearance changes. We pay particular attention to the algorithm's ability to provide a good estimate of the confidence in its position estimates.

Contribution to aerial image mosaicking

Presenter: Christophe Simler

25 September, at 14h00

C208, INRIA Rhône-Alpes

Affiliation: Université de Haute-Alsace, composante Label

Abstract:
This talk, entitled "Contribution to aerial image mosaicking", presents the work of a thesis. We describe our experimental setup and the characteristics of the image sequences it produces. We then review the state of the art in mosaicking techniques and study the algorithms in depth. In the last part we discuss our contributions: the design of a descriptor vector invariant to rotations about the optical axis for matching specific points, the implementation of a subpixel registration technique for the correspondences, and the design of a method for compensating the accumulation of errors in a mosaic.

Efficient MAP approximation for dense energy functions

Presenter: Martial Hebert

18 July, at 14h30

F107, INRIA Rhône-Alpes

Affiliation: The Robotics Institute, Carnegie Mellon University

Abstract:
We present an efficient method for maximizing energy functions with first and second order potentials, suitable for MAP labeling estimation problems that arise in undirected graphical models. Our approach is to relax the integer constraints on the solution in two steps. First we efficiently obtain the relaxed global optimum following a procedure similar to the iterative power method for finding the principal eigenvector of a matrix. Next, we map the relaxed optimum onto a simplex and show that the new energy obtained has a certain optimal bound. Starting from this energy we follow an efficient coordinate ascent procedure that is guaranteed to increase the energy at every step and converge to a solution that obeys the initial integer constraints. We also present a sufficient condition for ascent procedures that guarantees the increase in energy at every step.
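
For readers unfamiliar with the power method mentioned in the abstract, here is a minimal sketch of the textbook iteration. It is not the speaker's relaxation procedure: the function name `power_iteration`, the example matrix, and the stopping rule are all illustrative assumptions.

```python
import numpy as np


def power_iteration(matrix, num_iters=200, tol=1e-10):
    """Textbook power method for a symmetric matrix (illustrative helper).

    Returns an estimate of the dominant eigenvalue and a unit-norm
    eigenvector. This is the generic method, not the talk's algorithm.
    """
    n = matrix.shape[0]
    v = np.ones(n) / np.sqrt(n)  # arbitrary starting vector (assumption)
    for _ in range(num_iters):
        w = matrix @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break  # the start vector lies in the null space; give up
        w = w / norm
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    eigenvalue = float(v @ matrix @ v)  # Rayleigh quotient of the estimate
    return eigenvalue, v


# Small symmetric example whose eigenvalues are 1, 2 and 4.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
value, vector = power_iteration(M)
```

Each step costs one matrix-vector product, which is why iterations of this kind scale to large, sparse problems.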

Blind Vision

Presenter: Shai Avidan

17 July, at 17h00

F107, INRIA Rhône-Alpes

Affiliation: Mitsubishi Electric Research Laboratories

Abstract:
We have developed a general framework for secure image and video analysis that allows a client to have his data analyzed by a server, privately. For example, the client might submit his images to the server for face detection, without letting the server learn anything about the content of the images. Or, more generally, the client might use a query image to query an image database stored on the server, without revealing the content of the query image to the server. In the last year, we have implemented a secure face detector as a proof-of-concept, presented our work at a scientific conference and extended the method to work with different types of machine learning technologies.

Latent Mixture Vocabularies for Object Categorization

Presenter: Diane Larlus

12 July, at 14h00

C207, INRIA Rhône-Alpes

Affiliation: LEAR Group

Abstract:
The visual vocabulary is an intermediate-level representation that has proven to be very powerful for addressing object categorization problems. It is generally built by vector quantizing a set of local image descriptors, independently of the object model used for categorizing images. We propose here to embed the visual vocabulary creation within the object model construction, making it better suited to object class discrimination. We show experimentally that the proposed model outperforms approaches that do not learn such an adapted visual vocabulary.

Statistical models to address the problem of object recognition

Presenter: Thomas Deselaers

4 July, at 14h00

Grand Amphi, INRIA Rhône-Alpes

Affiliation: Computer Science Department, Aachen University

Abstract:
Object recognition in images, that is, deciding whether an object is contained in an image and, if so, where it is located, is an active field of research. A promising approach to this problem is to model objects as a collection of parts whose relationships can be modeled flexibly.

We present a set of methods following this approach where image patches extracted from certain points in the images are used as features.

Starting from approaches inspired by nearest neighbor classification, we develop various statistical models to address the problem of object recognition. Though most of the models developed are strongly connected, the training method and the representation of the data have a strong impact on the performance of a system. Some of the methods offer interesting insights into the way computers might be able to learn the visual appearance of certain object categories. For example, an object recognition system trained to recognize faces learns that the most discriminative, i.e. the most relevant, part is the eyes.
Using the methods presented, very interesting and promising results for different tasks can be achieved.

Conservative Learning and On-line Boosting for Vision

Presenter: Horst Bischof

5 June, at 14h00

Grand Amphi, INRIA Rhône-Alpes

Affiliation: Institute for Computer Graphics and Vision, TU Graz

Abstract:
I will present two recently developed visual learning methods:

1. The conservative learning framework allows object detectors to be learned with minimal or no supervision by exploiting the redundancy of the video stream of cameras. Conservative learning exploits generative and discriminative learning in a co-training fashion to obtain powerful object detectors. We demonstrate the framework on a surveillance task where we learn person and car detectors in an on-line fashion.

2. One method in the on-line conservative learning framework is a novel on-line Adaboost feature selection algorithm. Together with efficiently computable features (Haar wavelets, integral orientation histograms, etc.), training the classifier on-line and incrementally as new data arrives has several advantages and opens new application areas for boosting in computer vision. We will demonstrate on-line learning of detection, background modeling and tracking tasks based on on-line boosting; all algorithms are real-time capable. All approaches benefit significantly from the on-line training.

Multiple Object Class Detection with a Generative Model

Presenter: Bernt Schiele

9 June, at 14h30

F 107, INRIA Rhône-Alpes

Affiliation: Department of Computer Science, Darmstadt University of Technology

Abstract:
In this talk we propose an approach capable of simultaneous recognition and localization of multiple object classes using a generative model. A novel hierarchical representation makes it possible to represent individual images as well as various object classes in a single, scale- and rotation-invariant model. The recognition method is based on a codebook representation where appearance clusters built from edge-based features are shared among several object classes. A probabilistic model allows for reliable detection of various objects in the same image. The approach is highly efficient due to fast clustering and matching methods capable of dealing with millions of high-dimensional features. The system shows excellent performance on several object categories over a wide range of scales, in-plane rotations, background clutter, and partial occlusions. The performance of the proposed multi-object class detection approach is comparable with state-of-the-art approaches dedicated to a single object class recognition problem.

Extremely randomized trees applied to image quantization combined with a visual attention process for object categorization

Presenter: Frank Moosmann

15 May, at 16h00

C 207, INRIA Rhône-Alpes

Affiliation: Lear Project

Abstract:
Lately, the bag-of-features approach has become very popular for image categorization. However, there are several areas where it can be improved. The first is feature selection, which so far is done either densely or with detector functions. While the dense approach achieves better results than detector-based approaches, it also has a higher complexity. The second area of possible improvement is the creation of visual codebooks. The standard clustering method, k-means, is not only slow, it also does not create codebooks suited to discriminating between classes. The associated nearest-neighbor routine for assigning clusters is also slow.
We propose improvements in both areas: Extremely Randomized Trees are used to create a codebook efficiently and in a discriminative manner. In addition, a combined bottom-up/top-down process is introduced to bias the random selection of features, so that fewer features are needed to obtain the same or even better results.

Brain Computer Interfaces

Presenter: Vincent Guigue

14 April, at 11h00

F 107, INRIA Rhône-Alpes

Affiliation: Lab. LITIS - INSA de Rouen

Abstract:
A lot of research has been carried out to design Brain Computer Interfaces (BCI), especially in the field of supervised classification of non-stationary signals.
EEG signals require particular processing and we propose to tackle those problems according to three approaches: building a denoised compact representation for raw signals, introducing translation invariance in the procedure and dealing with the variability of EEG signals.
In all our approaches we keep two threads: non-parametric tools with kernel machines and a tripolar strategy including the representation of raw signals, the building of similarities between representations and the classification machine.

First, we face the problem of describing the raw signals.
We aim at constructing a denoised and compact representation of the raw signals.
We designed the Kernel Basis Pursuit (KBP) algorithm which combines multiple kernels, sparse regularization and very efficient solving of regression problems.
We add some heuristics to make this method parameter-free thus enabling us to deal with large amounts of data.

Then we make the assumption that one difficulty resides in the variable time position of the discriminant patterns.
We develop a translation invariant approach to classify non-stationary signals.
Such a method relies on a graph model of shift-covariant representation (wavelet transform or time-frequency) where all the time information becomes comparative.

Finally, the variability of EEG signals turned out to be the main difficulty in BCI problems.
We show that combining multiple classifiers and variable selection is an efficient strategy to identify evoked potential in EEG.

Key words: L1 regularization, Kernel methods, Multiple kernel, Graph kernel, Translation invariance, Multiple classifiers, Brain Computer Interface.

Filtering methods for tracking in image sequences: application to feature point tracking

Presenter: Elise Arnaud

4 April, at 16h30

F 107, INRIA Rhône-Alpes

Affiliation: University of Genoa, Italy, and IRISA Rennes

Abstract:
This work addresses the use of filtering methods (Kalman filtering, sequential Monte Carlo methods) for tracking in image sequences. These algorithms rely on a representation of the dynamic system as a hidden Markov chain, described by a dynamic law and a data likelihood. To build a general method, we consider a dynamic law estimated from the images. This choice highlights the limitations of the simple hidden Markov chain model, which does not describe the dependence of the system's elements on the images. We first propose an original model of the problem. It makes the images explicit and allows algorithms to be built without prior information. The filters associated with this new representation are derived from the classical filters by conditioning on the sequence. We also show how this new scheme makes it possible to consider simple models for which the optimal importance function is available.

Next, we turn to the practical validation of the proposed model on a feature point tracking application. The systems we implement are estimated entirely from the sequence. They combine similarity measures with a dynamic defined from an instantaneous motion estimated by a robust differential method. The resulting algorithms are validated on many real sequences and used for different applications (medical imaging, object recognition).

Error-resilient source codes and joint source/channel codes

Presenter: Herve Jegou

3 April, at 16h00

A 104, INRIA Rhône-Alpes

Affiliation: IRISA, Rennes

Abstract:
The talk will be given in two distinct parts.

First, two contributions on joint source-channel coding will be presented. The first concerns the decoding of variable-length codes. An aggregation technique for the optimal decoding trellis will be described. It reduces the complexity of Bayesian decoding by an order of magnitude. Its optimality for jointly typical source/channel realizations is motivated by computing the amount of information contained in the termination constraint. The second contribution is the introduction of codes based on rewriting rules and implemented by sequential transducers. A few properties will illustrate the value of this class of codes.

The second part of this talk will deal with similarity search, and more specifically with approximate nearest-neighbor search in high-dimensional spaces. After introducing the problem, we will point out the limitations of a state-of-the-art algorithm, Omedrank, before turning to improvements to this algorithm. We will show in particular that significant gains can be obtained by modifying the voting strategy used. Finally, we will give some research perspectives on this topic.

Object Detection in Crowded Scenes

Presenter: Bastian Leibe

20 March, at 11h00

F 107, INRIA Rhône-Alpes

Affiliation: Multimodal Interactive Systems group, Darmstadt

Abstract:
The detection of object classes in real-world images is a challenging problem which is further complicated by the effects of overlaps and partial occlusions. We present a novel algorithm which addresses this problem by considering object categorization and top-down segmentation as two interleaved processes that closely collaborate towards a common goal. As we will show, the close coupling between those two processes allows our method to accumulate additional evidence about object hypotheses and resolve ambiguities caused by overlaps and partial visibility.

The core part of our approach is a flexible formulation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a top-down segmentation from the recognition result. The segmentation is then used to again improve recognition by allowing the system to focus on object pixels and discard misleading influences from the background. Moreover, the information from where in the image a hypothesis draws its support is used in an MDL based verification stage to resolve ambiguities between overlapping hypotheses and factor out the effects of partial occlusion.

As an application, we address the problem of detecting objects such as cars, motorbikes, and pedestrians in real-world street scenes. Qualitative and quantitative results on several challenging data sets confirm that our method is able to reliably detect objects in crowded scenes, even when they overlap and partially occlude each other. In addition, the flexible nature of our approach allows it to operate on very small training sets.

Beyond bag-of-words: recent research developments on visual categorization at XRCE

Presenters: Florent Perronnin and Gabriela Csurka

16 March, at 16h00

C 207, INRIA Rhône-Alpes

Affiliation: Xerox Research Centre Europe, Image Processing Group

Abstract:
Generic Visual Categorization (GVC) is the pattern classification problem which consists in assigning one or multiple labels to an image based on its semantic content. Several state-of-the-art GVC systems were inspired by the bag-of-words (BOW) approach to text categorization. In the BOW representation, a text document is encoded as a histogram of the number of occurrences of each word. Similarly, one can characterize an image by a histogram of "visual word" counts. This is sometimes referred to as the bag-of-keypatches or bag-of-visterms. During this talk, we will discuss recent developments at the Xerox Research Centre Europe (XRCE) to improve on such representations.

We first present a novel and practical approach to GVC based on a universal vocabulary, which describes the content of all the considered classes of images, and class vocabularies obtained through the adaptation of the universal vocabulary using class-specific data. An image is characterized by a set of histograms - one per class - where each histogram describes whether the image content is best modeled by the universal vocabulary or the corresponding class vocabulary. It is shown experimentally on three very different databases that this novel representation outperforms those approaches which characterize an image with a single histogram.

In the second part we improve the categorizer by incorporating geometrical information. Based on scale, orientation or closeness of the keypatches we can consider a large number of simple geometrical relationships, each of which can be considered as a simplistic classifier. We select from this multitude of classifiers (several millions in our case) and combine them effectively with the original classifier. An improvement is demonstrated on a challenging 10 class dataset.
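
As an illustration of the baseline bag-of-visual-words histogram described at the start of this abstract, here is a minimal sketch. It covers only the generic single-histogram representation, not the XRCE universal/class vocabulary method; the function name `bow_histogram`, the toy vocabulary, and the 2-D descriptors are assumptions made for the example.

```python
import numpy as np


def bow_histogram(descriptors, vocabulary):
    """Count how many local descriptors fall on each visual word.

    descriptors: (n_descriptors, dim) array of local image descriptors.
    vocabulary:  (n_words, dim) array of visual word centers.
    Each descriptor is assigned to its nearest word (Euclidean distance).
    """
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)  # shape (n_descriptors, n_words)
    nearest = np.argmin(sq_dists, axis=1)  # index of the closest word
    return np.bincount(nearest, minlength=vocabulary.shape[0])


# Toy data: three visual words and four 2-D descriptors (hypothetical values).
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]])
histogram = bow_histogram(descriptors, vocabulary)
```

In practice the vocabulary would be learned from training descriptors (for example by clustering) and the histogram would typically be normalized before classification.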

Modelling Scenes with Local Descriptors and Latent Aspects

Presenter: Tinne Tuytelaars

16 February, at 15h00

F 107, INRIA Rhône-Alpes

Affiliation: K.U.Leuven, VISICS Group

Abstract:
A new approach to model visual scenes in image collections is presented, based on local invariant features and probabilistic latent space models. We provide answers to the following three open questions: 1) whether invariant local features are suited for scene (rather than object) classification; 2) whether unsupervised latent space models can be used for feature extraction in the classification task; and 3) whether the latent space formulation can discover visual co-occurrence patterns, motivating novel approaches to image organization and segmentation. Using a dataset of 9500 images, our approach is validated on each of these issues. First, we show with extensive experiments on binary and multiclass scene classification tasks that the bag-of-words representation derived from local invariant descriptors consistently outperforms state-of-the-art approaches. Second, we show that Probabilistic Latent Semantic Analysis (PLSA) generates a compact scene representation, discriminative for accurate classification, and significantly more robust when less training data are available. Third, we have exploited the ability of PLSA to automatically extract visually meaningful aspects to propose new algorithms for aspect-based image ranking and context-sensitive image segmentation.

Additionally, I'll discuss some planned future work, exploiting a similar scheme based on latent aspects and local invariant features for the integration of visual and textual data.

The TRECVID program: experiments in content-based retrieval in video document collections

Presenter: Georges Quénot

9 February, at 14h00

F 107, INRIA Rhône-Alpes

Affiliation: CLIPS-IMAG

Abstract:
The US National Institute of Standards and Technology (NIST) and DARPA have launched an annual evaluation campaign for content-based retrieval systems for video document collections (TRECVID). The systems are evaluated as a whole on a search task made as realistic as possible. Components or techniques needed by these systems are also evaluated independently, such as shot segmentation, story segmentation, concept detection, and camera motion detection. We will describe the general principles of the campaign, the different tasks, and the results obtained, which are representative of the state of the art in the field. We will also present the various works carried out in the MRIM team and evaluated within TRECVID.

Geometric Context from a Single Image

Presenter: Derek Hoiem

6 February, at 15h00

F 107, INRIA Rhône-Alpes

Affiliation: Robotics Institute of Carnegie Mellon University

Abstract:
Humans have an amazing ability to instantly grasp the overall 3D structure of a scene -- ground orientation, relative positions of major landmarks, etc. -- even from a single image. This ability is completely missing in most popular recognition algorithms, which pretend that the world is flat and/or view it through a patch-sized peephole. Yet it seems very likely that having a grasp of this "geometric context" of a scene should be of great assistance for many tasks, including recognition, navigation, and novel view synthesis. In this talk, I will describe our first steps toward the goal of estimating a 3D scene context from a single image. We propose to estimate the coarse geometric properties of a scene by learning appearance-based models of geometric classes. Geometric classes describe the 3D orientation of an image region with respect to the camera. We provide a multiple-hypothesis segmentation framework for robustly estimating scene structure from a single image and obtaining confidences for each geometric label. These confidences can then (hopefully) be used to improve the performance of many other applications. We provide a quantitative evaluation of our algorithm on a dataset of challenging outdoor images.
We also demonstrate its usefulness in two applications:
1) improving object detection, and
2) automatic single-view reconstruction ("Automatic Photo Pop-up", SIGGRAPH'05).
Joint work with Alexei Efros and Martial Hebert at CMU.

Computer vision using local binary patterns

Presenter: Matti Pietikäinen

12 January, at 14h30

F 107, INRIA Rhône-Alpes

Affiliation: Information Processing Laboratory, University of Oulu, Finland

Abstract:
The local binary pattern (LBP) operator is defined as a gray-scale invariant texture measure, derived from a general definition of texture in a local neighborhood. Through its recent extensions, the LBP operator has been made into a powerful measure of image texture, showing excellent results in many empirical studies. The LBP operator can be seen as a unifying approach to the traditionally divergent statistical and structural models of texture analysis. Perhaps the most important property of the LBP operator in real-world applications is its invariance against monotonic gray level changes. Another equally important property is its computational simplicity, which makes it possible to analyze images in challenging real-time settings. The LBP method has already been used in a large number of applications all over the world. This talk presents an overview of the LBP approach, emphasizing our recent research results. Theoretical foundations of the LBP and examples of applying it to various computer vision problems are presented, including classification of 3D textured surfaces, face recognition, face detection, facial expression recognition, content-based retrieval, modeling the background and detecting moving objects, and recognition of dynamic textures.
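As a rough illustration of the operator described in this abstract, the sketch below computes the basic 3x3 LBP code with NumPy. It is a minimal version written for this page, not the speaker's implementation: the function name `lbp_8neighbors`, the neighbor ordering, and the "greater than or equal" threshold are assumptions, and the extensions mentioned in the abstract (circular neighborhoods, uniform patterns, and so on) are left out.

```python
import numpy as np

def lbp_8neighbors(image):
    """Basic 3x3 LBP code for every interior pixel of a 2-D gray-scale array.

    Hypothetical helper for illustration; border pixels are skipped.
    """
    img = np.asarray(image)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Offsets (dy, dx) of the 8 neighbors, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbor is at least as bright as the center.
        mask = (neighbor >= center).astype(np.uint8)
        codes |= mask * np.uint8(1 << bit)
    return codes
```

Because each bit depends only on the ordering of a neighbor and the center pixel, any strictly increasing gray-level transform (for example `3 * img + 7`) leaves the codes unchanged, which is the invariance the abstract refers to. A texture descriptor is then typically a histogram of these codes over a region.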

Evaluation of Interest Point Detectors and Descriptors on Infrared Images

Presenter: Julien Bohné

11 January, at 16h00

C 207, INRIA Rhône-Alpes

Affiliation: Lear, INRIA Rhône-Alpes

Abstract:
An evaluation of various interest point detectors and descriptors applied to low-resolution infrared images will be presented. After a brief presentation of the test method, the results of the different algorithms will be discussed in order to highlight the advantages and drawbacks of each technique.

Seminars in 2005

Inverse chronological order.

Discriminative Regions for Semi-Supervised Object Class Localization

Caroline Pantofaru	7 December, 2005 at 16h00
Vision and Mobile Robotics Lab, Carnegie Mellon University	C207, INRIA Rhône-Alpes

Discovering objects and their location in images

Andrew Zisserman	5 December, 2005 at 16h00
Department of Engineering Science, University of Oxford	Grand Amphi, INRIA Rhône-Alpes

Hyperfeatures - Multilevel Local Coding for Visual Recognition

Ankur Agarwal	23 November, 2005 at 16h00
Lear Project, INRIA Rhône-Alpes	C207, INRIA Rhône-Alpes

Manifold Learning and Image Segmentation

Jakob Verbeek	24 August, 2005 at 16h00
Intelligent Autonomous Systems, University of Amsterdam	C207, INRIA Rhône-Alpes

Dynamic Scene Analysis using Non-Parametric Statistics

Yoni Wexler	30 June, 2005 at 16h00
Weizmann Institute, Israel	F107, INRIA Rhône-Alpes

Infra-red image classification

Eric Nowak	14 June, 2005 at 14h00
Lear Project, INRIA Rhône-Alpes	C207, INRIA Rhône-Alpes

Object Detection with Line Segment Networks

Vittorio Ferrari	30 May, 2005 at 14h00
Weizmann Institute, Israel	F107, INRIA Rhône-Alpes

Creating Efficient Codebooks for Visual Recognition

Frédéric Jurie	27 April, 2005 at 11h00
INRIA Rhône-Alpes, Project LEAR	C207, INRIA Rhône-Alpes

Feature Detection in Color Images

Joost van de Weijer	13 April, 2005 at 16h00
Lear, INRIA Rhône-Alpes	C207, INRIA Rhône-Alpes

Semi-Local Parts and Adjacency Relations for Object Recognition

Svetlana Lazebnik	21 Feb, 2005 at 1600hrs
Beckman Institute (University of Illinois at Urbana-Champaign)	F 107, INRIA Rhône-Alpes

High Dimensional Discriminant Analysis

Charles Bouveyron	09 February, 2005 at 16h00
INRIA Rhône-Alpes - Project LEAR	C 207, INRIA Rhône-Alpes

Strike a Pose: Tracking People by Finding Stylized Poses

Deva Ramanan	04 February, 2005 at 1400hrs
University of California, Berkeley, Computer Vision Group	C207, INRIA Rhône-Alpes

Fast Image Retrieval using SIFT descriptors

Michaël Sdika	21 January, 2005 at 1400hrs
INRIA Rhône-Alpes, Project LEAR	C207, INRIA Rhône-Alpes

Monocular Human Motion Capture with a Mixture of Regressors

Ankur Agarwal	05 January, 2005 at 1600hrs
INRIA Rhône-Alpes, Project LEAR	C207, INRIA Rhône-Alpes

Previous seminars: go to 2004

Details

Discriminative Regions for Semi-Supervised Object Class Localization

Presenter: Caroline Pantofaru

7 December, 2005 at 16h00

C207, INRIA Rhône-Alpes

Affiliation: Vision and Mobile Robotics Lab, Carnegie Mellon University

Abstract:
I will present a method for object class localization using image regions. Image regions are extracted using unsupervised image segmentation, and provide a natural spatial support for detection results. Each region can be classified using both its texture content, as well as local interest points in and around it. Our framework allows selection of the most discriminative features for a given object class in a semi-supervised manner, where image labels are given but not the pixelwise delineation of training objects. Despite the semi-supervised training, this method allows pixelwise localization where the actual object mask is determined, not simply a bounding box or object centre.

Discovering objects and their location in images

Presenter: Andrew Zisserman

5 December, 2005 at 16h00

Grand Amphi, INRIA Rhône-Alpes

Affiliation: Department of Engineering Science, University of Oxford

Abstract:
This is joint work with Josef Sivic, Bryan Russell, Alexei Efros, and William Freeman.
There has been much recent research activity in recognizing object categories (such as cars, faces, motorbikes) in images. Most approaches start by learning a category model from a set of labelled training images for each category. The level of supervision of these training images can vary from segmenting in detail each object instance, through to simply labelling the image as containing that object category.
In this work we explore unsupervised training - we seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic Latent Semantic Analysis (pLSA). In text analysis this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics.
The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to previous supervised approaches, and we show applications to category based retrieval in image databases and films.
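To make the topic model concrete, here is a minimal sketch of pLSA fitted by EM on a document-by-word count matrix, where a "document" is an image and a "word" is a quantized region descriptor. It is an illustration under assumptions, not the authors' code: the function name `plsa`, the random initialization, the fixed iteration count, and the small `eps` guard are all choices made for this example.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM. counts[d, w] = occurrences of visual word w in image d.

    Returns P(z|d) with shape (docs, topics) and P(w|z) with shape (topics, words).
    """
    counts = np.asarray(counts, dtype=float)
    n_docs, n_words = counts.shape
    rng = np.random.default_rng(seed)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
        weighted = counts[:, None, :] * resp
        # M-step: re-estimate both distributions from the expected counts.
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z
```

Each row of `P(z|d)` gives the topic mixture of one image, which is how an image containing several object categories is represented in this model. The dense three-dimensional responsibility array keeps the code short but only suits small vocabularies.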

Hyperfeatures - Multilevel Local Coding for Visual Recognition

Presenter: Ankur Agarwal

23 November, 2005 at 16h00

C207, INRIA Rhône-Alpes

Affiliation: Lear Project, INRIA Rhône-Alpes

Abstract:
Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant and they have good resistance to local occlusions and to geometric and photometric variations, but they are not able to exploit spatial co-occurrence statistics of features at scales larger than their local input patches. We present a new multilevel visual representation, 'hyperfeatures', that is designed to remedy this. The basis of the work is the familiar notion that to detect object parts, in practice it often suffices to detect co-occurrences of more local object fragments, a process that can be formalized as comparison (vector quantization) of image patches against a codebook of known fragments, followed by local aggregation of the resulting codebook membership vectors to detect co-occurrences. This process converts collections of local image descriptor vectors into slightly less local histogram vectors, that is, higher-level but spatially coarser descriptors. Our central observation is that it can therefore be iterated, and that doing so captures and codes ever larger assemblies of object parts and increasingly abstract or 'semantic' image properties. This repeated nonlinear 'folding' is essentially different from that of hierarchical models such as Convolutional Neural Networks and HMAX, being based on repeated comparison to local prototypes and accumulation of co-occurrence statistics rather than on repeated convolution and rectification. We formulate the hyperfeatures model and study its performance under several different image coding methods including clustering based Vector Quantization, Gaussian Mixtures, and combinations of these with Latent Dirichlet Allocation. We find that the resulting high-level features provide improved performance in several object image and texture image classification tasks. Reference: Technical Report RR-5655, INRIA - Aug. 2005
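The quantize-then-aggregate step described in this abstract can be sketched as follows. This is a simplified illustration, not the method of the technical report: it uses hard vector quantization against a given codebook and a uniform square window, and the names `quantize`, `local_histograms`, and `hyperfeature_level` are hypothetical. In practice the codebooks would be learned (for example by clustering), and the softer codings mentioned in the abstract are not shown.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Index of the nearest codebook entry for each descriptor (hard VQ)."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def local_histograms(labels_grid, n_codes, window):
    """Normalized codeword histograms over every window x window neighborhood."""
    one_hot = np.eye(n_codes)[labels_grid]  # shape (H, W, n_codes)
    rows = labels_grid.shape[0] - window + 1
    cols = labels_grid.shape[1] - window + 1
    out = np.zeros((rows, cols, n_codes))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = one_hot[i:i + window, j:j + window].sum(axis=(0, 1))
    return out / (window * window)

def hyperfeature_level(features, codebook, window):
    """One level: code a grid of descriptors (H, W, D), then aggregate locally."""
    h, w, d = features.shape
    labels = quantize(features.reshape(-1, d), codebook).reshape(h, w)
    return local_histograms(labels, len(codebook), window)
```

The output of one level is again a grid of vectors, so it can be passed back into `hyperfeature_level` with a second codebook whose entries have dimension `n_codes`. That repetition is the iteration the abstract describes, with each level covering a larger image area at coarser resolution.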

Monocular Human Motion Capture with a Mixture of Regressors

Presenter: Ankur Agarwal

05 January, 2005 at 1600hrs

C 207, INRIA Rhône-Alpes

Affiliation: INRIA Rhône-Alpes, LEAR

Abstract:
We address 3D human motion capture from monocular images, taking a learning based approach to construct a probabilistic pose estimation model from a set of labelled human silhouettes. To compensate for ambiguities in the pose reconstruction problem, our model explicitly calculates several possible pose hypotheses. It uses locality on a manifold in the input space and connectivity in the output space to identify regions of multi-valuedness in the mapping from silhouette to 3D pose. This information is used to fit a mixture of regressors on the input manifold, giving us a global model capable of predicting the possible poses with corresponding probabilities. These are then used in a dynamical-model based tracker that automatically detects tracking failures and re-initializes in a probabilistically correct manner. The system is trained on optical sensor based motion capture data, using the corresponding real human silhouettes supplemented with silhouettes synthesized artificially from several different models for improved robustness to inter-person variations. Static pose estimation is illustrated on a variety of silhouettes. The robustness of the method is demonstrated by tracking on a real image sequence requiring multiple automatic re-initializations.

Seminars in 2004

Titles

Color Constancy from local invariant regions

Tijmen Moerland	25 November, 2004 at 1600hrs
INRIA Rhône-Alpes, Project LEAR	C207, INRIA Rhône-Alpes

Summary of Summer school on Machine Learning

Ankur Agarwal	04 November, 2004 at 1600hrs
INRIA Rhône-Alpes, Project LEAR	C207, INRIA Rhône-Alpes

Summary of International Workshop on Object Recognition

Frédéric Jurie	28 October, 2004 at 1600hrs
INRIA Rhône-Alpes, Project LEAR	C207, INRIA Rhône-Alpes

Detecting Keypoints with Stable Position, Orientation and Scale under Illumination Changes

Bill Triggs	17 June, 2004 at 1700hrs
INRIA Rhône-Alpes, Project LEAR	C207, INRIA Rhône-Alpes

Title Unknown

Michel Dhome	28 April, 2004 at 14h30
LASMEA, Université Blaise Pascal	F 107, INRIA Rhône-Alpes

Learning 3D Human Pose from Silhouettes

Ankur Agarwal	24 March, 2004 at 1530hrs
INRIA Rhône-Alpes, Project LEAR	C207, INRIA Rhône-Alpes

Bandelets and Geometric Image Representation

Erwan Le Pennec	03 March, 2004 at 11h00
CMAP, Ecole Polytechnique	F 107, INRIA Rhône-Alpes

Reading of: New Algorithms for Efficient High-Dimensional Nonparametric Classification

Salil Jain and Peter Carbonetto	19 February, 2004 at 1600hrs
INRIA Rhône-Alpes, Project LEAR	C207, INRIA Rhône-Alpes

Kernel Fisher discriminant for texture segmentation

Jianguo Zhang	05 February, 2004 at 1700hrs
INRIA Rhône-Alpes	C 207, INRIA Rhône-Alpes

Improving KD Trees. L-infinity distance for Triangulation

Richard Hartley	21 January, 2004 at 1600hrs
The Australian National University	Grand Amphi, INRIA Rhône-Alpes

Human detection based on a probabilistic assembly of robust part detectors

Krystian Mikolajczyk	15 January, 2004 at 1600hrs
Robotics Research Group, University of Oxford	F 107, INRIA Rhône-Alpes

Human detection based on a probabilistic assembly of robust part detectors

Presenter: Krystian Mikolajczyk

15 January, 2004 at 1600hrs

F 107, INRIA Rhône-Alpes

Affiliation: Robotics Research Group, University of Oxford

Abstract:
I will present a novel method for human detection which can detect pedestrians as well as close-up views of humans in the presence of clutter and occlusion. Humans are modeled as flexible assemblies of parts. The key point of the approach is a robust part detection. The part detectors are based on gradient and Laplacian based local features which efficiently capture the shape information. Using the probabilistic co-occurrence of these features increases their distinctiveness while the robustness remains the same. Learning with AdaBoost combines features with the highest co-occurrence probabilities.
Furthermore, the parts include a larger local context than in previous part-based work [Forsyth'97,Ronfard02] and they are therefore more distinctive. They are also not global (cf. previous work on pedestrian detectors [Papageorgiou'00]) and they therefore allow for occlusion and the detection of close-up views. The detection results are further improved by computing a probabilistic score for the assembly of parts which takes into account their relative position. The approach is also very efficient as (i) all part detectors use the same initial features, (ii) a coarse-to-fine cascade approach is used for part detection, (iii) an assembly strategy reduces the number of spurious detections and the search space. The results are very promising and outperform existing human detectors.

Seminars in 2003

Titles

Transductive Learning for Scene Classification

Bill Triggs	18 December, 2003 at 1700hrs
INRIA Rhône-Alpes - Project LEAR	C 208, INRIA Rhône-Alpes

Scale-Invariant Shape Features for Recognition of Object Categories

Frédéric Jurie	04 December, 2003 at 1600hrs
INRIA Rhône-Alpes - Project LEAR	C 208, INRIA Rhône-Alpes

Unsupervised Statistical Models for General Object Recognition

Peter Carbonetto	27 November, 2003 at 1530hrs
INRIA Rhône-Alpes - Project LEAR	C 207, INRIA Rhône-Alpes

Direct Learning of the Inverse Jacobian Matrix of a Function

Frédéric Jurie	06 November, 2003 at 1600hrs
INRIA Rhône-Alpes - Project LEAR	F 107, INRIA Rhône-Alpes

Texture Recognition Using Affine-Invariant Regions

Svetlana Lazebnik	23 October, 2003 at 1600hrs
Beckman Institute (University of Illinois at Urbana-Champaign)	F 107, INRIA Rhône-Alpes

Dimensionality Reduction Methods for Unfolding the Cortical Ribbon

Charles Bouveyron	01 October, 2003 at 1600hrs
INRIA Rhône-Alpes - Project LEAR	C 207, INRIA Rhône-Alpes

Learning Dynamical Models for Tracking Complex Motion

Ankur Agarwal	18 September, 2003 at 1600hrs
INRIA Rhône-Alpes - Project LEAR	C 207, INRIA Rhône-Alpes

The Trade-off Between Generative and Discriminative Classifiers

Guillaume Bouchard	04 September, 2003 at 1600hrs
INRIA Rhône-Alpes - Project LEAR	C 207, INRIA Rhône-Alpes

Abstracts

Transductive Learning for Scene Classification

Presenter: Bill Triggs

18 December, 2003 at 1700hrs

C 208, INRIA Rhône-Alpes

Affiliation: INRIA Rhône-Alpes - Project LEAR

Scale-Invariant Shape Features for Recognition of Object Categories

Presenter: Frédéric Jurie

04 December, 2003 at 1600hrs

C 208, INRIA Rhône-Alpes

Affiliation: INRIA Rhône-Alpes - Project LEAR

Abstract:
In this talk we introduce a new method for extracting shape interest regions which capture the local structure of the contour image. They are in spirit similar to local interest points extracted from grey-level images, but describe the shape instead of the texture. Our approach detects local shape convexities in scale-space. The detection is based on a robust measure, the entropy of the gradient orientations in the neighborhood of a circle defined by the scale. The detected regions allow for clutter, occlusions as well as spurious detections and are invariant to scale changes and rotations. Experimental results show a very good performance for shape matching and recognition of object categories.

Summary:
We present a new method for detecting shape-based regions of interest that captures the local structure of image contours. It is designed in the same spirit as local interest point detectors that operate on grey-level images, but it describes shape rather than texture. Our approach describes local shape convexities in scale space. The regions are detected robustly despite occlusions, image noise, and scale changes. Experimental results show very good performance in shape matching and in recognition of object categories.

Unsupervised Statistical Models for General Object Recognition

Presenter: Peter Carbonetto

27 November, 2003 at 1530hrs

C 207, INRIA Rhône-Alpes

Affiliation: INRIA Rhône-Alpes - Project LEAR

Abstract:
I will present an overview of the work I did for my Master's thesis at the University of British Columbia. I will also touch upon some major issues I uncovered in my work and discuss some future directions for research.
We approach the object recognition problem as the process of attaching meaningful labels to specific regions of an image. Given a set of images and their captions, we segment the images, then learn the proper associations between words and regions. Previous models are limited by the scope of the representation, and performance is constrained by noise from poor initial clusterings of the image features. We propose three improvements that address these issues.

Related papers:
1. Bayesian feature weighting for unsupervised learning, with application to object recognition. P. Carbonetto, N. de Freitas, P. Gustafson and N. Thompson. AI-Stats, 2003. PDF
2. Why can't Jose read? The problem of learning semantic associations in a robot environment. P. Carbonetto and N. de Freitas. HLT Conference Workshop on Learning Word Meaning from Non-Linguistic Data, 2003. PDF
3. A Statistical Model for General Contextual Object Recognition. P. Carbonetto, N. de Freitas and K. Barnard. Submitted to ECCV 2004. (local intranet access -- /home/albireo/carbonet/eccv2004.pdf)

Apprentissage Direct de la Matrice Jacobienne Inverse d'une Fonction

Presenter: Frédéric Jurie

6 November, 2003 at 1600hrs

F 107, INRIA Rhône-Alpes

Affiliation:
INRIA Rhône-Alpes - Project LEAR
Also Université Blaise Pascal, Project LASMEA

Abstract:
A method is presented for estimating the inverse Jacobian matrix of a function without computing the direct Jacobian matrix. This kind of inverse Jacobian matrix proves to perform much better in modeling a relation $\theta = f^{-1}(x)$ (where parameters $\theta$ are to be computed from observations $x$) than the traditional computation of the Moore-Penrose inverse.

Theoretical insight, as well as comparisons in domains such as visual servoing and tracking, will be provided to support this claim.

Summary:
A method will be presented for estimating the inverse Jacobian matrix of a function without computing the Jacobian matrix itself. In inversion problems (computing the parameters of a model from measurements), this kind of inverse Jacobian matrix has better properties than the Moore-Penrose method.

In addition, some ideas on the theoretical aspects will be presented, along with comparisons in several vision application domains such as visual servoing and object tracking.
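One plausible reading of "learning the inverse Jacobian directly" is to regress parameter perturbations on the observation changes they cause, instead of estimating the Jacobian and pseudo-inverting it. The sketch below follows that reading and is not taken from the talk: the function name `learn_inverse_jacobian`, the uniform sampling of perturbations, and the small ridge term are all assumptions made for this example.

```python
import numpy as np

def learn_inverse_jacobian(f, theta0, n_samples=200, scale=0.1, ridge=1e-6, seed=0):
    """Fit a matrix A such that d_theta is approximately A @ d_x near theta0.

    f maps a parameter vector to an observation vector (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    theta0 = np.asarray(theta0, dtype=float)
    x0 = np.asarray(f(theta0), dtype=float)
    # Sample small parameter perturbations and record the observation changes.
    d_theta = rng.uniform(-scale, scale, size=(n_samples, theta0.size))
    d_x = np.array([np.asarray(f(theta0 + dt), dtype=float) - x0 for dt in d_theta])
    # Ridge-regularized least squares for d_theta = d_x @ A.T
    # (the ridge term is an assumption that keeps the system well conditioned).
    gram = d_x.T @ d_x + ridge * np.eye(d_x.shape[1])
    return np.linalg.solve(gram, d_x.T @ d_theta).T
```

For an exactly linear `f` this regression recovers the Moore-Penrose inverse of its matrix up to the ridge term, so any difference between the two approaches only appears when `f` is nonlinear over the sampled range or the observations are noisy.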

Texture Recognition Using Affine-Invariant Regions

Presenter: Svetlana Lazebnik

23 October, 2003 at 1600hrs

F 107, INRIA Rhône-Alpes

Affiliation: Beckman Institute (University of Illinois at Urbana-Champaign)

Abstract:
This talk will discuss texture representations using affine-invariant interest points. A model of a texture is constructed from a sparse set of image locations characterized by local appearance and affine shape. For more descriptive power, it is possible to incorporate neighborhood constraints based on co-occurrence statistics. Applications include retrieval, classification, and segmentation of images of textured surfaces under a wide range of transformations, including viewpoint changes and non-rigid deformations.

Other links:
Related papers
Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce, "Affine-Invariant Local Descriptors and Neighborhood Statistics for Texture Recognition," ICCV 2003.
Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce, "A Sparse Texture Representation Using Affine-Invariant Regions," CVPR 2003, vol. II, pp. 319-324.

Dimensionality Reduction Methods for Unfolding the Cortical Ribbon

Presenter: Charles Bouveyron

01 October, 2003 at 1600hrs

C 207, INRIA Rhône-Alpes

Affiliation: INRIA Rhône-Alpes - Project LEAR

Other links:
Presentation slides (pdf)
Related article (pdf)

Learning Dynamical Models for Tracking Complex Motion

Presenter: Ankur Agarwal

18 September, 2003 at 1600hrs

C 207, INRIA Rhône-Alpes

Affiliation: INRIA Rhône-Alpes - Project LEAR

Abstract:
I will address the problem of tracking complex human motions in monocular video sequences. Mainly, I will describe a new approach to modelling the non-linear and time-varying dynamics of generic human motions, using statistical methods to exploit structured motion patterns that exist in typical human activities. The method receives, as input, a set of hand-labelled motion sequences and it learns a piecewise dynamical model based on Gaussian autoregressive processes by automatically constructing connected regions in parameter space that exhibit similar dynamical characteristics. It also automatically partitions the state space into a number of classes corresponding to different motion patterns, making it useful for activity recognition.

The Trade-off Between Generative and Discriminative Classifiers

Presenter: Guillaume Bouchard

04 September, 2003 at 1600hrs

C 207, INRIA Rhône-Alpes

Affiliation: INRIA Rhône-Alpes - Project LEAR

Abstract:
Given any generative classifier based on an inexact density model, we can define a discriminative counterpart that reduces its asymptotic error rate. We introduce a family of parameter estimation problems that interpolates the two approaches, thus providing a new way to compare them and giving an estimation procedure whose classification performance is well balanced between the bias of generative classifiers and the variance of discriminative ones. We show that an intermediate trade-off between the two strategies is often preferable, both theoretically and in experiments on real data.