Update: The website of the 2016-2017 course is available here.

Update 2: see the schedule of the MVA master for 2016-2017.

The information below is only relevant for the previous year 2015-2016.

Lecturers

Julien Mairal Inria
Nino Shervashidze Mines ParisTech, as a guest lecturer on graph kernels
Jean-Philippe Vert Mines ParisTech (on sabbatical at UC Berkeley in 2016)

Course information

Many problems in real-world applications of machine learning can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, with the caveat that the data are often not vectors of numbers. For example, protein sequences and structures in computational biology, text and XML documents in web mining, segmented pictures in image processing, or time series in speech recognition and finance, have particular structures which contain relevant information for the statistical problem but can hardly be encoded into finite-dimensional vector representations.

Kernel methods are a class of algorithms well suited for such problems. Indeed they extend the applicability of many statistical methods initially designed for vectors to virtually any type of data, without the need for explicit vectorization of the data. The price to pay for this extension to non-vectors is the need to define a so-called positive definite kernel function between the objects, formally equivalent to an implicit vectorization of the data. The "art" of kernel design for various objects have witnessed important advances in recent years, resulting in many state-of-the-art algorithms and successful applications in many domains.

The goal of this course is to present the mathematical foundations of kernel methods, as well as the main approaches that have emerged so far in kernel design. We will start with a presentation of the theory of positive definite kernels and reproducing kernel Hilbert spaces, which will allow us to introduce several kernel methods including kernel principal component analysis and support vector machines. Then we will come back to the problem of defining the kernel. We will present the main results about Mercer kernels and semigroup kernels, as well as a few examples of kernel for strings and graphs, taken from applications in computational biology, text processing and image analysis.

See the presentation of the course in these slides.

Course material

The slides of the course are available here. They will be regularly updated until the end of the course.

References

N. Aronszajn, "Theory of reproducing kernels", Transactions of the American Mathematical Society, 68:337-404, 1950.
N. Cristianini and J. Shawe-Taylor, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
B. Scholkopf et A. Smola, "Learning with kernels", MIT Press, 2002.
B. Scholkopf, K. Tsuda et J.-P. Vert, "Kernel methods in computational biology", MIT Press, 2004.
V. Vapnik, "Statistical Learning Theory", Wiley, 1998.
C. Berg, J.P.R. Christensen et P. Ressel, "Harmonic analysis on semi-groups", Springer, 1994.

Evaluation

One homework is available here. It will count for 50% of the grade. It can be done by groups of 1 to 3 students, and should be sent by e-mail (a Pdf file in LateX) to julien.mairal@inria.fr. The homework is due on February 17th. A Latex template is available here.
A data challenge will be organized on the platform Kaggle. It will consists of a simple prediction task on which you need to fight, in teams of 1 to 3 students, to obtain the best prediction with kernel methods. You may use any language of your choice for this task.
One week after the end of the data challenge (March 22nd), you will need to submit a 2-page pdf report detailing your experiments, along with the source code to reproduce them. The same penalty as for the homework will apply: you lose one point every day you are late.
Be careful of this rule for 2016: Team members for the homework cannot be in the same team for the data challenge.
Data Challenge is online
Please use your family names for the team names. Ex: Team mairal - vert - shervashidze.

Results for the course are available here.

Course outline

Introduction

Motivating example applications
Context of nonlinear models in machine learning: kernel methods and neural networks
Practical details of the class

Kernels and Reproducing Kernel Hilbert Spaces (RKHS)

Kernels and positive definiteness
Reproducing Kernel Hilbert Spaces (RKHS) and Aronszjan theorem
Simple kernel examples
Regularizing with RKHS norms
The kernel trick

Supervised learning with kernels

The representer theorem
Kernel ridge regression
Empirical risk minimization
A tiny bit of learning theory
Focus on support vector machines

Unsupervised learning with kernels

Kernel K-means and spectral clustering
Kernel PCA
A quick note on kernel CCA

The kernel jungle

Mercer kernels and shift-invariant kernels
Kernels for generative models
Kernels for biological sequences
Kernels for graphs
Kernels on graphs

Open problems and research topics

Multiple kernel learning
Large-scale learning with kernels
"Deep" learning with kernels

Calendar

The courses will take place at ENS Cachan from 1pm to 4pm on the following dates.

Date	Lecturer	Topic	Place
January 20th	JM	Positive definite kernel, RKHS, Aronszajn's theorem, Kernel trick (slides 1-70)	Amphi Tocqueville
January 27th	JM	Representer theorem, kernel ridge regression, large-margin classifiers, learning theory, constrained optimization (slides 71-124)	Amphi Marie Curie
February 3rd	JM	Support Vector Machines, clustering, kernel PCA (slides 124 - 166) Practical details about the homework	Amphi Marie Curie
February 17th	JM	SMO (not in the slides), kernel CCA, kernels for probabilistic models. (slides 167-196) Additional details about the homework and data challenge	Amphi Marie Curie
February 24th	JM	string kernels, shift-invariant kernels. (slides 197-306) Additional details about the data challenge	Amphi Marie Curie
March 9th	Nino Shervashidze	Lecture on graph kernels. (slides 346-413) Additional details about the data challenge	Amphi Marie Curie
March 16th	JM	Mercer kernels, large-scale kernel learning, deep kernel learning. (slides 307-333 and 484-553) Details about the data challenge report	Amphi Marie Curie

Jobs / Internships Opportunities

We have different intern/PhD opportunities in machine learning, image processing, bioinformatics and computer vision.

Kernel Methods for Machine Learning