Update: The website of the 2016-2017 course is available here.
Update 2: see the schedule of the MVA master for 2016-2017.
The information below is only relevant for the previous year 2015-2016.
Lecturers
- Julien Mairal Inria
- Nino Shervashidze Mines ParisTech, as a guest lecturer on graph kernels
- Jean-Philippe Vert Mines ParisTech (on sabbatical at UC Berkeley in 2016)
Course information
Many problems in real-world applications of machine learning can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, with the caveat that the data are often not vectors of numbers. For example, protein sequences and structures in computational biology, text and XML documents in web mining, segmented pictures in image processing, or time series in speech recognition and finance, have particular structures which contain relevant information for the statistical problem but can hardly be encoded into finite-dimensional vector representations.
Kernel methods are a class of algorithms well suited for such problems. Indeed they extend the applicability of many statistical methods initially designed for vectors to virtually any type of data, without the need for explicit vectorization of the data. The price to pay for this extension to non-vectors is the need to define a so-called positive definite kernel function between the objects, formally equivalent to an implicit vectorization of the data. The "art" of kernel design for various objects have witnessed important advances in recent years, resulting in many state-of-the-art algorithms and successful applications in many domains.
The goal of this course is to present the mathematical foundations of kernel methods, as well as the main approaches that have emerged so far in kernel design. We will start with a presentation of the theory of positive definite kernels and reproducing kernel Hilbert spaces, which will allow us to introduce several kernel methods including kernel principal component analysis and support vector machines. Then we will come back to the problem of defining the kernel. We will present the main results about Mercer kernels and semigroup kernels, as well as a few examples of kernel for strings and graphs, taken from applications in computational biology, text processing and image analysis.
See the presentation of the course in these slides.
Course material
The slides of the course are available here. They will be regularly updated until the end of the course.References
- N. Aronszajn, "Theory of reproducing kernels", Transactions of the American Mathematical Society, 68:337-404, 1950.
- N. Cristianini and J. Shawe-Taylor, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
- B. Scholkopf et A. Smola, "Learning with kernels", MIT Press, 2002.
- B. Scholkopf, K. Tsuda et J.-P. Vert, "Kernel methods in computational biology", MIT Press, 2004.
- V. Vapnik, "Statistical Learning Theory", Wiley, 1998.
- C. Berg, J.P.R. Christensen et P. Ressel, "Harmonic analysis on semi-groups", Springer, 1994.
Evaluation
- One homework is available here. It will count for 50% of the grade. It can be done by groups of 1 to 3 students, and should be sent by e-mail (a Pdf file in LateX) to julien.mairal@inria.fr. The homework is due on February 17th. A Latex template is available here.
- A data challenge will be organized on the platform Kaggle. It will consists of a simple prediction task on which you need to fight, in teams of 1 to 3 students, to obtain the best prediction with kernel methods. You may use any language of your choice for this task.
- One week after the end of the data challenge (March 22nd), you will need to submit a 2-page pdf report detailing your experiments, along with the source code to reproduce them. The same penalty as for the homework will apply: you lose one point every day you are late.
- Be careful of this rule for 2016: Team members for the homework cannot be in the same team for the data challenge.
- Data Challenge is online
- Please use your family names for the team names. Ex: Team mairal - vert - shervashidze.
- Results for the course are available here.
Course outline
Introduction
- Motivating example applications
- Context of nonlinear models in machine learning: kernel methods and neural networks
- Practical details of the class
Kernels and Reproducing Kernel Hilbert Spaces (RKHS)
- Kernels and positive definiteness
- Reproducing Kernel Hilbert Spaces (RKHS) and Aronszjan theorem
- Simple kernel examples
- Regularizing with RKHS norms
- The kernel trick
Supervised learning with kernels
- The representer theorem
- Kernel ridge regression
- Empirical risk minimization
- A tiny bit of learning theory
- Focus on support vector machines
Unsupervised learning with kernels
- Kernel K-means and spectral clustering
- Kernel PCA
- A quick note on kernel CCA
The kernel jungle
- Mercer kernels and shift-invariant kernels
- Kernels for generative models
- Kernels for biological sequences
- Kernels for graphs
- Kernels on graphs
Open problems and research topics
- Multiple kernel learning
- Large-scale learning with kernels
- "Deep" learning with kernels
Calendar
The courses will take place at ENS Cachan from 1pm to 4pm on the following dates.Date | Lecturer | Topic | Place |
---|---|---|---|
January 20th | JM | Positive definite kernel, RKHS, Aronszajn's theorem, Kernel trick (slides 1-70) |
Amphi Tocqueville |
January 27th | JM | Representer theorem, kernel ridge regression, large-margin classifiers, learning theory, constrained optimization (slides 71-124) |
Amphi Marie Curie |
February 3rd | JM | Support Vector Machines, clustering, kernel PCA (slides 124 - 166) Practical details about the homework |
Amphi Marie Curie |
February 17th | JM |
SMO (not in the slides), kernel CCA, kernels for probabilistic models. (slides 167-196) Additional details about the homework and data challenge |
Amphi Marie Curie |
February 24th | JM |
string kernels, shift-invariant kernels. (slides 197-306) Additional details about the data challenge |
Amphi Marie Curie |
March 9th | Nino Shervashidze | Lecture on graph kernels. (slides 346-413) Additional details about the data challenge |
Amphi Marie Curie |
March 16th | JM |
Mercer kernels, large-scale kernel learning, deep kernel learning. (slides 307-333 and 484-553) Details about the data challenge report |
Amphi Marie Curie |