Course information
Statistical learning is about the construction and study of systems that can automatically learn from data. With the massive datasets commonly encountered today, the need for powerful machine learning methods is acute. Examples of successful applications include effective web search, anti-spam software, computer vision, robotics, practical speech recognition, and a deeper understanding of the human genome. This course gives an introduction to this exciting field, with a strong focus on kernels as a versatile tool to represent data, in combination with (un)supervised learning techniques that are agnostic to the type of data they learn from. The learning techniques covered include regression, classification, clustering, and dimension reduction. We will cover both the theoretical underpinnings of kernels and a series of kernels that are important in practical applications.
Evaluation
 For UJF: homeworks 1, 2, 3 (1/2) + project (1/2)
 For ENSIMAG: homework 1 (1/4) + project (3/4)
Course outline
Introduction
 Motivating example applications
 Empirical risk minimization
 Bias-variance tradeoff and risk bounds
Supervised learning with linear models and kernels
 Risk convexification and regularization
 Ridge regression
 Logistic regression
 Support vector machines
 Kernels for nonlinear models
Unsupervised learning
 Principal component analysis
 Data clustering
 Other methods: canonical correlation analysis, sparse coding, etc.
Kernels for probabilistic models
 Fisher kernels
 Probability product kernels
Reading material
Machine Learning and Statistics
 Vapnik, The nature of statistical learning theory. Springer
 Hastie, Tibshirani, Friedman, The elements of statistical learning. (free online)
 Devroye, Gyorfi, Lugosi, A probabilistic theory of pattern recognition. Springer
 J. Shawe-Taylor, N. Cristianini. Kernel methods for pattern analysis. 2004.
 Bishop, Pattern recognition & machine learning. 2006.
 Slides by Jean-Philippe Vert on kernel methods.
Optimization
 S. Boyd and L. Vandenberghe. Convex Optimization. 2004. (free online)
 D. Bertsekas. Nonlinear Programming. 2003.
Calendar
Date  Room  Lecturer  Topic  Homework
07/10  H104  JV  Introduction + bias-variance tradeoff. slides
14/10  H201  JV  Penalized empirical risk minimization, linear classifiers, introduction to kernels. slides  Homework 1
21/10  H201  JM  Reproducing kernel Hilbert spaces (RKHS)
04/11  H201  JV  The kernel trick, supervised kernel methods, and Fisher kernels. slides  Homework 2
25/11  H201  JM
02/12  H201  JM
Homeworks
Three homeworks will be given during the course, at lectures 2, 4, and 6. Each should be returned within three weeks. Either use LaTeX or make sure you write very clearly. Homework has to be done individually. ENSIMAG students only have to hand in the first homework, since they get fewer credits for the course.
Projects
The project consists of implementing the method from an article, doing some experiments, and writing a short report (less than 10 pages). It is also possible to study a theoretical paper instead of implementing a method. All reports should be written in LaTeX, and a PDF should be sent to the lecturers before January 5th. Projects can be done alone or in groups of two people. You can either come up with your own idea and discuss it with us, or we can give you some suggestions. To give you an idea, here are projects from a related course.
Project  Student(s)  Coach

Supervised classification of text documents. material  Vera Shalaeva and Manon Lukas
Predicting Molecular Activity with Graph Kernels. material  Phivos Valougeorgis
Speaker Recognition. material  Li Liu
Supervised classification of Flickr images. material  Leonardo Gutierrez Gomez
Fast string kernels using inexact matching for protein sequences. material
Semigroup kernels on measures. material  Julien Alapetite
Kernel change-point analysis. material
Fast global alignment kernels. material
Multiple kernel learning, conic duality, and the SMO algorithm. material
Predictive low-rank decomposition for kernel methods. material
Image classification with segmentation graph kernels. material
Image Classification with the Fisher Vector: Theory and Practice. material  Jerome Lesaint