
Bayesian Methods for Machine Learning (2021/2022)

Course given by Simon Leglaive at CentraleSupélec.

General information

Bayesian modeling, inference and prediction techniques have become commonplace in machine learning. Bayesian models are used in data analysis to describe, through latent factors, the generative process of complex data (e.g. medical images, audio signals, text documents). The discovery of these latent or hidden variables from observations is based on the notion of posterior probability distribution, the calculation of which corresponds to the Bayesian inference step.

The Bayesian machine learning approach has the advantage of being interpretable, and it makes it easy to include expert knowledge through the definition of priors on the latent variables of interest. In addition, it naturally provides uncertainty information along with the predictions, which can be particularly important in certain application contexts, such as medical diagnosis or autonomous driving.

At the end of the course, you are expected to:

  • know when it is useful or necessary to use a Bayesian machine learning approach;
  • have a view of the main approaches in Bayesian modeling and exact or approximate inference;
  • know how to identify and derive a Bayesian inference algorithm from the definition of a model;
  • be able to implement standard supervised or unsupervised Bayesian learning methods.

Prerequisites

You are expected to be familiar with basic concepts of probability, statistics and machine learning. The 1st-year course "statistics and learning" at CentraleSupélec covers all these prerequisites.

We will have a session dedicated to the basics of statistical learning, so the most important thing is to revise probability if you feel you need to. To do so, you can read Chapter 6, "Probability and Distributions", of Mathematics for Machine Learning, by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, published by Cambridge University Press.

Bibliography

Most of the concepts that we will see in this course are discussed in the machine learning reference book Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer, 2006, which is moreover freely available online.

Other useful references are:

  • Mathematics for Machine Learning, by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Cambridge University Press, 2020 (freely available online).
  • Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy, MIT Press, 2012 (available at the library).

Agenda

Session 1 (Lecture): Fundamentals of Bayesian modeling and inference
Session 2 (Lecture): Fundamentals of machine learning
Session 3 (Lecture): Bayesian networks and inference in latent variable models
Session 4 (Practical): Gaussian mixture model
Session 5 (Lecture): Factor Analysis
Session 6 (Lecture): Variational inference
Session 7 (Practical): Bayesian linear regression
Session 8 (Lecture): Markov Chain Monte Carlo
Session 9 (Practical): Sparse Bayesian linear regression
Session 10 (Lecture): Deep generative models
Session 11 (Lecture): Revision and other activities

Fundamentals of Bayesian modeling and inference

Key concepts you should be familiar with at the end of this lecture:

  • Latent and observed variables
  • Bayesian modeling and inference
  • Prior, likelihood, marginal likelihood and posterior
  • Decision, posterior expected loss
  • Prior predictive and posterior predictive distributions
  • Gaussian model with latent mean or variance
  • Conjugate, non-informative and hierarchical priors
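
To make the notion of conjugacy concrete, here is a minimal numerical sketch (my own toy illustration, not part of the course material): with a Gaussian likelihood of known variance and a Gaussian prior on the latent mean, the posterior over the mean is again Gaussian and available in closed form.

```python
import numpy as np

# Conjugate-prior sketch: Gaussian likelihood with known variance sigma2,
# Gaussian prior N(mu0, tau0_2) on the latent mean mu.
rng = np.random.default_rng(0)
mu_true, sigma2 = 1.5, 0.5**2
x = rng.normal(mu_true, np.sqrt(sigma2), size=20)   # observed data

mu0, tau0_2 = 0.0, 1.0                              # prior mean and variance
n = x.size
post_var = 1.0 / (1.0 / tau0_2 + n / sigma2)        # posterior variance
post_mean = post_var * (mu0 / tau0_2 + x.sum() / sigma2)
print(f"posterior over mu: N({post_mean:.3f}, {post_var:.4f})")
```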

Material:

Fundamentals of machine learning

Key concepts you should be familiar with at the end of this lecture:

  • Supervised learning
  • Empirical risk minimization
  • Underfitting and overfitting
  • Bias-variance trade-off
  • Maximum likelihood, maximum a posteriori
  • Multinomial logistic regression
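
To connect the last two items, the sketch below is a toy illustration of my own (not course material) showing that maximum a posteriori estimation with a zero-mean Gaussian prior on the weights of a linear model reduces to L2-regularized (ridge) least squares, while maximum likelihood corresponds to ordinary least squares.

```python
import numpy as np

# MAP with a zero-mean Gaussian prior on the weights is ridge regression:
# w_map = argmin_w ||y - X w||^2 + lam * ||w||^2, with lam = sigma2 / tau2.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

sigma2, tau2 = 0.1**2, 1.0           # noise variance and prior variance (assumed known)
lam = sigma2 / tau2                  # effective regularization strength
w_ml = np.linalg.lstsq(X, y, rcond=None)[0]                   # maximum likelihood
w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)   # maximum a posteriori
print("ML: ", w_ml)
print("MAP:", w_map)
```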

Material:

Bayesian networks and inference in latent variable models

Key concepts you should be familiar with at the end of this lecture:

  • Bayesian network (or directed probabilistic graphical model)
  • Conditional independence
  • D-separation
  • Markov blanket
  • Generative model with latent variables
  • Evidence lower-bound
  • Expectation-maximization algorithm
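
For reference, the evidence lower-bound (ELBO) comes from the following standard decomposition of the log-marginal likelihood, where q is any distribution over the latent variables z:

```latex
\log p(x;\theta)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z;\theta)}{q(z)}\right]}_{\mathrm{ELBO}(q,\theta)}
  + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x;\theta)\big)
```

Since the KL term is non-negative, the ELBO is indeed a lower bound on log p(x; θ); the EM algorithm alternates between tightening this bound with respect to q (E-step) and maximizing it with respect to θ (M-step).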

Material:

Gaussian mixture model

This practical session is about the Gaussian mixture model, a generative model used to perform clustering, in an unsupervised fashion.
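
If you want to get a feel for the model beforehand, here is a minimal sketch (my own toy example; the practical may ask you to implement the EM updates yourself rather than use scikit-learn) that fits a two-component Gaussian mixture and recovers the posterior responsibilities:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated clusters in 2D.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(4, 1, size=(100, 2))])

# EM runs internally; `predict` gives the most probable cluster assignment
# and `predict_proba` the posterior responsibilities p(z | x).
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
labels = gmm.predict(X)
responsibilities = gmm.predict_proba(X)
print(gmm.means_)
```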

Material:

Factor Analysis

Key concepts you should be familiar with at the end of this lecture:

  • Factor analysis generative model
  • Derivation of the posterior
  • Derivation of the marginal likelihood
  • Properties of the multivariate Gaussian distribution
  • Derivation of an EM algorithm (with continuous latent variables, in contrast to the previous session on GMMs) for parameter estimation
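
As a small reminder of the closed-form quantities involved, the sketch below evaluates the posterior over the latent factors and the marginal covariance of the observations, under the usual parameterization x = W z + mu + eps with z ~ N(0, I) and diagonal noise covariance Psi (the lecture's notation may differ):

```python
import numpy as np

# Factor analysis: x = W z + mu + eps, with z ~ N(0, I_k) and eps ~ N(0, Psi), Psi diagonal.
rng = np.random.default_rng(3)
d, k = 5, 2
W = rng.normal(size=(d, k))
mu = np.zeros(d)
Psi = np.diag(rng.uniform(0.1, 0.5, size=d))
x = rng.normal(size=d)                                # one (arbitrary) observation

# Posterior p(z | x) = N(m, G), with G = (I + W^T Psi^{-1} W)^{-1}
# and m = G W^T Psi^{-1} (x - mu).
Psi_inv = np.linalg.inv(Psi)
G = np.linalg.inv(np.eye(k) + W.T @ Psi_inv @ W)
m = G @ W.T @ Psi_inv @ (x - mu)

# Marginal likelihood p(x) = N(x | mu, W W^T + Psi), obtained by integrating out z.
cov_x = W @ W.T + Psi
print("posterior mean of z:", m)
```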

Material:

Variational inference

Key concepts you should be familiar with at the end of this lecture:

  • The problem of intractable posterior
  • Kullback-Leibler divergence
  • Variational inference
  • Mean-field approximation
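
The Kullback-Leibler divergence has a simple closed form between Gaussians, which comes up constantly in variational inference; the toy sketch below (not course material) computes it for univariate Gaussians and illustrates that it is not symmetric:

```python
import numpy as np

def kl_gaussian(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) between univariate Gaussians."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

# KL is asymmetric: the two directions generally differ.
print(kl_gaussian(0.0, 1.0, 1.0, 2.0), kl_gaussian(1.0, 2.0, 0.0, 1.0))
```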

Material:

Bayesian linear regression

We already discussed linear regression (polynomial regression) in the second lecture, and we saw that with a standard maximum likelihood approach, we have to carefully choose the degree of the polynomial model in order not to overfit the training data. In Bayesian linear regression, a prior distribution is placed on the weights, which acts as a regularizer and prevents overfitting. Moreover, this Bayesian approach to linear regression naturally provides a measure of uncertainty along with the prediction.
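
As a preview of the kind of computation involved (using the notation of Bishop's book, with prior precision alpha and noise precision beta; the practical may use a different notation or a different prior), both the posterior over the weights and the predictive distribution are available in closed form:

```python
import numpy as np

# Bayesian linear regression: prior w ~ N(0, alpha^{-1} I),
# likelihood y | w ~ N(Phi w, beta^{-1} I), with Phi the design matrix.
rng = np.random.default_rng(4)
alpha, beta = 2.0, 25.0                        # prior and noise precisions (assumed known)

x = rng.uniform(-1, 1, size=30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 1/np.sqrt(beta), size=30)
Phi = np.vander(x, N=6, increasing=True)       # polynomial features up to degree 5

S_N = np.linalg.inv(alpha * np.eye(6) + beta * Phi.T @ Phi)   # posterior covariance
m_N = beta * S_N @ Phi.T @ y                                  # posterior mean

# Predictive mean and variance at a new input x* = 0.3.
phi_star = np.vander(np.array([0.3]), N=6, increasing=True)[0]
pred_mean = phi_star @ m_N
pred_var = 1/beta + phi_star @ S_N @ phi_star
print(pred_mean, pred_var)
```

The predictive variance is the sum of the observation noise (1/beta) and a term reflecting the remaining uncertainty about the weights, which is exactly the uncertainty information mentioned above.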

Material:

Markov Chain Monte Carlo

Key concepts you should be familiar with at the end of this lecture:

  • The Monte Carlo method to approximate expectations
  • Sampling methods (inverse transform sampling, change of variable, rejection sampling, importance sampling)
  • Definition of Markov chains
  • Markov chain Monte Carlo methods
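
To give a flavour of MCMC, here is a minimal random-walk Metropolis sketch (a toy example of mine, not the course's reference implementation) that draws samples from an unnormalized one-dimensional target density:

```python
import numpy as np

def log_target(x):
    # Unnormalized target: a mixture of two Gaussian bumps centered at -2 and +2.
    return np.logaddexp(-0.5 * (x + 2)**2, -0.5 * (x - 2)**2)

rng = np.random.default_rng(5)
x, samples = 0.0, []
for _ in range(10_000):
    prop = x + rng.normal(0, 1.0)                           # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop                                            # accept
    samples.append(x)                                       # keep the current state either way
samples = np.array(samples)[1000:]                          # discard burn-in
print(samples.mean(), samples.std())
```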

Material:

Sparse Bayesian linear regression

This practical session is a follow-up to the previous one on Bayesian linear regression. We make the prior on the linear regression weights more complex, so that exact posterior inference becomes intractable and a variational approach has to be developed.
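
One standard way to do this (whether the practical uses exactly this hierarchical, ARD-style prior is an assumption on my part) is to give each weight its own precision with a Gamma hyperprior:

```latex
p(w_i \mid \alpha_i) = \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right), \qquad
p(\alpha_i) = \mathrm{Gamma}(\alpha_i \mid a_0, b_0), \qquad i = 1, \dots, D
```

The joint posterior over the weights and their precisions is then no longer available in closed form, which is what motivates a mean-field variational approximation.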

Material:

Deep generative models

Key concepts you should be familiar with at the end of this lecture:

  • The problem of (deep) generative modeling
  • Generative model of the variational autoencoder (VAE), a non-linear generalization of factor analysis
  • VAE inference model
  • VAE training procedure
  • Application of VAEs for MNIST image generation
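
Below is a deliberately minimal VAE sketch in PyTorch for binarized 28×28 images (layer sizes, latent dimension and the Bernoulli decoder are illustrative choices of mine, not the course's reference implementation); it shows the reparameterization trick and the negative ELBO used as the training loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Gaussian encoder q(z|x), Bernoulli decoder p(x|z), for flattened 28x28 images."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(784, 256)
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        logits = self.dec(z)
        # Negative ELBO = reconstruction term + KL(q(z|x) || N(0, I)), averaged over the batch.
        recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return (recon + kl) / x.shape[0]
```

Training amounts to minimizing this loss over mini-batches with a standard optimizer; generating new images then simply consists in decoding z drawn from the prior N(0, I).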

Material:

Revision and other activities

Activity 1: Q&A session

Activity 2: You will find exercises here. You will also find exercises (left as homework) in the slides of the different lectures.

Activity 3: Reading about sequential data processing with latent-variable models.

"State-space models (SSM) provide a general and flexible methodology for sequential data modelling. They were first introduced in the 1960s, with the seminal work of Kalman and were soon used in the Apollo Project to estimate the trajectory of the spaceships that were bringing men to the moon. Since then, they have become a standard tool for time series analysis in many areas well beyond aerospace engineering. In the machine learning community in particular, they are used as generative models for sequential data, for predictive modelling, state inference and representation learning". Quote from Marco Fraccaro's Ph.D Thesis entitled "Deep Latent Variable Models for Sequential Data" and defended at Technical University of Denmark in 2018.

The Kalman filter and smoother are used to compute the posterior distribution of a sequence of latent vectors (called the states) given an observed sequence of measurements. In this video, a Kalman filter is used to track the latent position of multiple people over time. The latent state variable in this case is continuous.
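
For concreteness, here is a sketch of one predict/update step of the Kalman filter for a linear-Gaussian state-space model (the matrices A, C, Q and R are placeholders to be set by the application, e.g. a constant-velocity model for tracking):

```python
import numpy as np

def kalman_step(m, P, x, A, C, Q, R):
    """One Kalman filter step for z_t = A z_{t-1} + w_t, x_t = C z_t + v_t,
    with w_t ~ N(0, Q) and v_t ~ N(0, R); (m, P) is the previous filtered posterior."""
    # Predict.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update with the new measurement x.
    S = C @ P_pred @ C.T + R                   # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)        # Kalman gain
    m_new = m_pred + K @ (x - C @ m_pred)
    P_new = (np.eye(len(m)) - K @ C) @ P_pred
    return m_new, P_new
```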

When the latent state variable is discrete, the state-space model is called a hidden Markov model (HMM). HMMs were very popular for automatic speech recognition, before the deep learning era. The latent state variable in this context is discrete and corresponds to a phoneme (an elementary unit of speech sound that allows us to distinguish one word from another in a particular language), while the observations are acoustic speech features computed from the audio signal.

Chapter 3 of Marco Fraccaro's Ph.D. thesis (available here) gives a very nice introduction to state-space models and Kalman filtering. To go a bit further, Chapter 4 introduces deep latent variable models for sequential data processing, using the framework of variational autoencoders.

Acknowledgements

The slides are created using Remark, "A simple, in-browser, Markdown-driven slideshow tool". The template is modified from Marc Lelarge's template used in his (very nice) deep learning course.

I did my best to clearly acknowledge the authors of the resources that I used to build this course. If you find any missing reference, please contact me.


If you want to reuse some of the materials in this repository, please also indicate where you took them from.

If you are not one of my students and you would like to have the solutions to the practical works, you can contact me.

Email address: firstname.lastname@centralesupelec.fr

License

GNU Affero General Public License (version 3), see LICENSE.txt.
