SOLARIS workshop and PhD defenses of Alberto Bietti and Nikita Dvornik

November 26th and 27th, 2019


Grand Amphithéâtre, Inria Grenoble - Rhône-Alpes (Montbonnot/Inovallée site)


No registration. Entrance is free.


Nathalie Gillot and Julien Mairal (

Program - Nov 26th

9:15 - 9:45   Room open + coffee
9:45 - 10:30   Diane Larlus (Naver Labs)
Self-supervised learning for category-level geometry estimation
Abstract [slides]
Self-supervision can dramatically cut back the amount of manually labelled data required to train deep neural networks. While self-supervision has usually been considered for tasks such as image classification, it has seldom been used for geometry estimation. The first part of this presentation focuses on learning geometrically stable features for semantic object matching. In this work, we aim at extending self-supervised learning to geometry-oriented tasks such as semantic matching and part detection. Our approach learns dense distinctive visual descriptors from an unlabeled dataset of images using synthetic image transformations. It does so by means of a robust probabilistic formulation that can introspectively determine which image regions are likely to result in stable image matching. The second part focuses on learning 3D object categories. Unlike traditional approaches which use either synthetic data or manual supervision, we propose a method which is cued by observing objects from a moving vantage point. Our system builds on two innovations: a Siamese viewpoint factorization network that robustly aligns different videos together without explicitly comparing 3D shapes; and a 3D shape completion network that can extract the full shape of an object from partial observations. Once learned, the full network can predict i) the viewpoint, ii) the depth, and iii) a point cloud, all from a single image of a new object instance.
10:30 - 11:15   Vincent Lepetit (Univ. Bordeaux)
3D Scene Understanding from a Single Image
Abstract [slides]
3D scene understanding is a long-standing, fundamental problem in Computer Vision. In this talk, I will present several works we recently developed for 3D scene understanding. The first work is a method for 3D object recognition and pose estimation based on a feedback loop inspired by biological mechanisms, which provides very accurate and reliable results. The second work is a method for understanding the 3D layout (walls, floor, ceiling, etc.) of an indoor environment from a single image despite possible occlusions by furniture. I will then discuss the challenges in creating training and evaluation data for 3D registration problems, and present the direction we are currently exploring.
11:15 - 12:00   Stefanie Wuhrer (Inria)
Building Decoupled Generative 3D Face Models
Abstract [slides]
Data-driven generative 3D face models are used to compactly encode facial shape data into meaningful parametric representations. A desirable property of these models is their ability to effectively decouple natural sources of variation, in particular identity and expression. I will present two of our recent works for this task. The first one builds a multilinear autoencoder that combines a convolutional neural network-based encoder with a multilinear model-based decoder, thereby taking advantage of both the convolutional network's robustness to corrupted and incomplete data and the multilinear model's capacity to effectively model and decouple shape variations. The second one explores the use of adversarial training to learn decoupled 3D face shape models. To train this model, we propose an architecture that combines a 3D generator with a 2D discriminator that leverages conventional CNNs, where the two components are bridged by a geometry mapping layer.
12:00 - 14:00   Lunch on your own
14:00 - 17:00   PhD defense of Nikita Dvornik (Inria)
Learning with Limited Annotated Data for Visual Understanding

Reviewers: Martial Hebert (CMU) and Andrea Vedaldi (Univ. Oxford)
Examiners: Naila Murray (Naver Labs) and Vincent Lepetit (Univ. Bordeaux)
PhD advisors: Cordelia Schmid (Inria) and Julien Mairal (Inria)
Abstract [slides]

Program - Nov 27th

9:15 - 9:45   Room open + coffee
9:45 - 10:30   Lorenzo Rosasco (Univ. Genova/MIT)
Implicit regularization and acceleration in machine learning
Abstract [slides]
The focus on optimization is a major trend in modern machine learning. However, most optimization guarantees focus on the training error, ignoring the performance at test time, which is the real goal in machine learning. In this talk, we take steps to fill this gap in the context of least squares learning and analyze the learning (test) performance of accelerated gradient methods. In particular, we discuss the influence of different learning assumptions on the learning curves.
10:30 - 11:15   Jean-Philippe Vert (Google/Mines ParisTech)
Learning from ranks, learning to rank
Abstract [slides]
Permutations and sorting operators are ubiquitous in data science, e.g., when one wants to analyze or predict preferences. As discrete combinatorial objects, permutations do not lend themselves easily to differential calculus, which underpins much of modern machine learning. In this talk I will present several approaches to embed permutations into a continuous space, on the one hand, and to relax the ranking operator to be differentiable, on the other hand, in order to integrate permutations, sorting and ranking operators into differentiable architectures for machine learning.
11:15 - 12:00   Stephane Mallat (ENS/College de France)
Multiscale Models and Dictionary Learning for Image Classification and Synthesis with CNN
Abstract [slides]
Deep neural networks outperform predefined representations or kernels for complex image classification or image synthesis. For image classification as well as image synthesis, we will show that learning can be reduced to learning a single dictionary over a multiscale scattering representation which linearizes the action of geometric groups including translations, rotations and deformations. The resulting networks outperform AlexNet on ImageNet. The same principles apply to model autoencoders, to synthesize complex images. This opens the door to much simpler models to analyze the performance of deep neural networks.
12:00 - 13:30   Lunch on your own
13:30 - 16:30   PhD defense of Alberto Bietti (Inria)
Foundations of deep convolutional models through kernel methods

Reviewers: Stephane Mallat (ENS/College de France) and Lorenzo Rosasco (Univ. Genova/MIT)
Examiners: Florence D'Alche-Buc (Telecom ParisTech), Jean-Philippe Vert (Google/Mines ParisTech), Joan Bruna (NYU)
PhD advisor: Julien Mairal (Inria)
Abstract [slides]