Course information
Statistical learning is about the construction and study of systems that can automatically learn from data. With the massive datasets commonly encountered today, the need for powerful machine learning methods is acute. Examples of successful applications include effective web search, anti-spam software, computer vision, robotics, practical speech recognition, and a deeper understanding of the human genome. This course gives an introduction to this exciting field, with a strong focus on kernels as a versatile tool to represent data, in combination with (un)supervised learning techniques that are agnostic to the type of data they learn from. The learning techniques covered include regression, classification, clustering, and dimension reduction. We will cover both the theoretical underpinnings of kernels and a series of kernels that are important in practical applications.
Evaluation
- For UJF: homeworks 1, 2, 3 (1/2) + project (1/2)
- For ENSIMAG: homework 1 (1/4) + project (3/4)
Course outline
Introduction
- Motivating example applications
- Empirical risk minimization
- Bias-variance trade-off, and risk bounds
Supervised learning with linear models and kernels
- Risk convexification and regularization
- Ridge regression
- Logistic regression
- Support vector machines
- Kernels for non-linear models
Unsupervised learning
- Principal component analysis
- Data clustering
- Other methods: canonical correlation analysis, sparse coding, etc.
Kernels for probabilistic models
- Fisher kernels
- Probability product kernels
Reading material
Machine Learning and Statistics
- V. Vapnik. The Nature of Statistical Learning Theory. Springer.
- T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning. (free online)
- L. Devroye, L. Györfi, G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer.
- J. Shawe-Taylor, N. Cristianini. Kernel Methods for Pattern Analysis. 2004.
- C. Bishop. Pattern Recognition and Machine Learning. 2006.
- Slides by Jean-Philippe Vert on kernel methods.
Optimization
- S. Boyd and L. Vandenberghe. Convex Optimization. 2004. (free online)
- D. Bertsekas. Nonlinear Programming. 2003.
Calendar
Date | Room | Lecturer | Topic | Homework |
---|---|---|---|---|
07/10 | H104 | JV | Introduction + Bias-variance trade-off (slides) | |
14/10 | H201 | JV | Penalized empirical risk minimization, linear classifiers, introduction to kernels (slides) | Homework 1 |
21/10 | H201 | JM | Reproducing kernel Hilbert spaces (RKHS) | |
04/11 | H201 | JV | The kernel trick, supervised kernel methods, and Fisher kernels (slides) | Homework 2 |
25/11 | H201 | JM | | |
02/12 | H201 | JM | | |
Homeworks
Three homeworks will be given during the course, at lectures 2, 4, and 6. Each should be returned within three weeks. Either use LaTeX or make sure you write very clearly. Homework must be done individually. ENSIMAG students only have to hand in the first homework, since they get fewer credits for the course.
Projects
The project consists of implementing a method from an article, running some experiments, and writing a short report (fewer than 10 pages). It is also possible to study a theoretical paper instead of implementing a method. All reports should be written in LaTeX, and a PDF should be sent to the lecturers before January 5th. Projects can be done alone or in groups of two. You can either come up with your own idea and discuss it with us, or we can give you some suggestions. To give you an idea, here are projects from a related course.
Project | Student(s) | Coach |
---|---|---|
Supervised classification of text documents (material) | Vera Shalaeva and Manon Lukas | |
Predicting Molecular Activity with Graph Kernels (material) | Phivos Valougeorgis | |
Speaker Recognition (material) | Li Liu | |
Supervised classification of Flickr images (material) | Leonardo Gutierrez Gomez | |
Fast string kernels using inexact matching for protein sequences (material) | | |
Semigroup kernels on measures (material) | Julien Alapetite | |
Kernel change-point analysis (material) | | |
Fast global alignment kernels (material) | | |
Multiple kernel learning, conic duality, and the SMO algorithm (material) | | |
Predictive low-rank decomposition for kernel methods (material) | | |
Image classification with segmentation graph kernels (material) | | |
Image Classification with the Fisher Vector: Theory and Practice (material) | Jerome Lesaint | |