# Software

Here's a collection of Matlab scripts available for non-commercial use. Please email me with questions and suggestions.

## K-means

A compact and efficient k-means implementation. 12/7/2011
[Matlab implementation]

## Coordinated Factor Analysis

A MATLAB implementation of the Coordinated Factor Analysis (CFA) model described in my 2006 PAMI paper can be found in the Matlab Toolbox for Dimensionality Reduction by Laurens van der Maaten (thanks!).

## Latent Dirichlet Allocation / Probabilistic Latent Semantic Analysis

Implementation of (smoothed) LDA and PLSA. Includes the option to fix the word-topic distributions in order to evaluate the topic distributions of new documents. Further options allow you to estimate or fix the hyperparameters alpha and eta, to use point estimates or Dirichlet estimates of theta and beta (if both are point estimates, PLSA is recovered), and to use symmetric or non-symmetric priors on theta and beta. The code requires Tom Minka's Lightspeed and FastFit toolboxes. August 23, 2006.
[Matlab implementation] [Blei's paper on LDA]

## Hidden Markov models and mixtures for Binary PCA

This MATLAB code implements Binary PCA, as well as mixtures and HMMs with Binary PCA components. Like normal PCA, Binary PCA finds a low-rank approximation of a given data matrix. In normal PCA, the approximation error is given by the Frobenius norm of the residual matrix. In Binary PCA, the approximation error is given by the summed log-likelihood of the entries of the data matrix, where the likelihood of each entry is a Bernoulli distribution whose log-odds parameter is the corresponding entry of the low-rank matrix. Instead of only binary values, the data matrix may also contain values in [0,1], in which case a weighted log-likelihood is computed. May 14, 2007.
[Matlab implementation] [Binary PCA paper by Schein et al.]
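The objective described above fits in a few lines. The sketch below is in Python rather than Matlab and is not taken from the released code: the function name `binary_pca_loglik` and the factorization of the log-odds matrix as Θ = UVᵀ (with no bias term) are assumptions for illustration. For an entry x in [0,1] it uses x·log σ(θ) + (1−x)·log(1−σ(θ)), which is one natural reading of the "weighted log-likelihood" above; for binary x it reduces to the ordinary Bernoulli log-likelihood.

```python
import math


def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))


def binary_pca_loglik(X, U, V):
    """Summed Bernoulli log-likelihood of X under log-odds Theta = U V^T.

    X: n x d list of lists with entries in [0, 1]
    U: n x k list of lists (row factors), hypothetical parameterization
    V: d x k list of lists (column factors), hypothetical parameterization
    """
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            # Log-odds for entry (i, j) is the (i, j) entry of the low-rank matrix.
            theta = sum(u * v for u, v in zip(U[i], V[j]))
            # x weights log p(1); (1 - x) weights log p(0) = log sigmoid(-theta).
            total += x * log_sigmoid(theta) + (1.0 - x) * log_sigmoid(-theta)
    return total
```

Maximizing this quantity over U and V (for example by alternating gradient steps) gives a Binary PCA fit; the sketch only evaluates the objective.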

## Probabilistic latent semantic analysis

A compact Matlab script that performs the EM iterations for PLSA. Includes the option to fix the word-topic distributions in order to evaluate the topic distributions of new documents. May 14, 2007.
[Matlab implementation] [Hofmann's paper on PLSA]
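To illustrate the EM iterations, here is a minimal Python sketch (not the released Matlab script) of PLSA on a document-word count matrix, with P(w|d) = Σ_z P(w|z) P(z|d). The function name `plsa` and its arguments are hypothetical. Passing `p_w_z` keeps the word-topic distributions fixed, so only P(z|d) is re-estimated, which corresponds to the "evaluate topics for new documents" option described above. Each document is assumed to contain at least one word.

```python
import math
import random


def plsa(counts, n_topics, n_iter=100, p_w_z=None, seed=0):
    """EM for PLSA.

    counts: D x W list of lists of nonnegative counts n(d, w)
    p_w_z:  optional K x W word-topic distributions P(w|z); if given, they are
            kept fixed and only the topic mixtures P(z|d) are estimated
    Returns (p_z_d, p_w_z, loglik_history), where loglik is sum n(d,w) log P(w|d).
    """
    rng = random.Random(seed)
    D, W, K = len(counts), len(counts[0]), n_topics

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    fixed = p_w_z is not None
    if not fixed:
        p_w_z = [normalize([rng.random() + 0.1 for _ in range(W)]) for _ in range(K)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(K)]) for _ in range(D)]

    history = []
    for _ in range(n_iter):
        exp_w_z = [[0.0] * W for _ in range(K)]  # expected counts for P(w|z)
        exp_z_d = [[0.0] * K for _ in range(D)]  # expected counts for P(z|d)
        loglik = 0.0
        for d in range(D):
            for w in range(W):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(K)]
                p_dw = sum(joint)
                loglik += n * math.log(p_dw)
                for z in range(K):
                    r = n * joint[z] / p_dw
                    exp_w_z[z][w] += r
                    exp_z_d[d][z] += r
        history.append(loglik)
        # M-step: renormalize the expected counts.
        if not fixed:
            p_w_z = [normalize(row) for row in exp_w_z]
        p_z_d = [normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z, history
```

Because each iteration is an EM step, the recorded log-likelihood should never decrease. A fold-in for an unseen document might look like `plsa([[2, 1, 0, 0]], 2, p_w_z=trained_p_w_z)`, where `trained_p_w_z` comes from an earlier call.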

## Mixture of Factor Analyzers

Implementation of the Mixture of Factor Analyzers model. The noise model can be shared by all components and/or constrained to be isotropic; in the latter case, the Mixture of Probabilistic Principal Component Analyzers is obtained. October 3, 2005.
[Matlab implementation] [Paper on mixtures of probabilistic PCA] [Paper on mixtures of Factor Analyzers]

## Accelerated Gaussian mixture learning

Standard mixture learning algorithms like EM and k-means are slow for large datasets. For k-means there exists an accelerated version that uses a kd-tree and is exact (Pelleg and Moore, 1999). A similar approximate technique exists for EM (Moore, 1999), but without convergence guarantees. In our 2006 DMKD paper we present a variational approximation to the EM algorithm for Gaussian mixtures, which yields a provably convergent scheme with speedups that are at least linear in the sample size. This code also implements the greedy Gaussian mixture learning algorithm from our 2003 Neural Computation paper.
[Matlab implementation]
[Data Mining and Knowledge Discovery 2006 paper]
[Neural Computation 2003 paper]

## Self-organizing mixture models

By optimizing a free-energy objective with a constrained EM algorithm, we obtain an algorithm very similar to Kohonen's SOM, but one that provably converges and optimizes an objective function.
[Matlab implementation] [Neurocomputing paper]

## Probabilistic PCA with missing values

A Matlab script that performs EM to find principal components. Missing data is handled by a variational EM algorithm, whose runtime is linear in the number of data points, the number of data dimensions, and the number of principal components. The objective function being optimized is a lower bound on the data log-likelihood. Based on "sensible principal component analysis" by Sam Roweis.
If you use this code, please cite this paper, in the context of which the code was written.
[Matlab implementation + PDF Note]

## Global k-means

An algorithm for vector quantization that builds the solution by iteratively inserting quantizers.
[Matlab implementation] [Pattern Recognition paper]
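The insertion scheme can be sketched as follows: start from the 1-means solution (the data mean); then, to go from k−1 to k centers, try every data point as the new center, run k-means from the current centers plus that point, and keep the run with the lowest sum of squared errors. This is a plain Python sketch of the basic exhaustive variant only, not the released Matlab code; the function names `kmeans` and `global_kmeans` are hypothetical, and the inner k-means is standard Lloyd iteration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Lloyd iterations from the given initial centers; returns (centers, sse)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[j].append(p)
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    [sum(m[i] for m in members) / len(members) for i in range(len(c))]
                )
            else:
                new_centers.append(c)  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def global_kmeans(points, k):
    """Incrementally insert centers, trying every data point as the new one."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    centers, sse = [mean], sum(sq_dist(p, mean) for p in points)
    for _ in range(1, k):
        best = None
        for p in points:
            candidate = kmeans(points, centers + [list(p)])
            if best is None or candidate[1] < best[1]:
                best = candidate
        centers, sse = best
    return centers, sse
```

Trying every data point costs one k-means run per point and per inserted center, which is why faster approximate insertion rules are attractive for large datasets.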

## Principal Curves

An algorithm that finds principal curves by fitting a set of local linear models which are combined to form curves.
[Matlab implementation] [Pattern Recognition Letters paper]