Python API

The BinaryClassifier Class

class cyanure.BinaryClassifier(loss='square', penalty='l2', fit_intercept=False)[source]

Bases: cyanure.ERM

The binary classification class, which derives from ERM. The goal is to minimize the following objective

\[\min_{w,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, w^\top x_i + b\right) + \psi(w),\]

where \(L\) is a classification loss, \(\psi\) is a regularization function (or constraint), \(w\) is a p-dimensional vector representing model parameters, and \(b\) is an optional unregularized intercept. We expect binary labels in \(\{-1,+1\}\).

Parameters
loss: string, default=’square’

Loss function to be used. Possible choices are

  • ‘square’ => \(L(y,z) = \frac{1}{2} ( y-z)^2\)

  • ‘logistic’ => \(L(y,z) = \log(1 + e^{-y z} )\)

  • ‘sqhinge’ or ‘squared_hinge’ => \(L(y,z) = \frac{1}{2} \max( 0, 1- y z)^2\)

  • ‘safe-logistic’ => \(L(y,z) = e^{yz - 1} - yz~~\text{if}~~yz \leq 1~~\text{and}~~0\) otherwise

penalty: string, default=’l2’

Regularization function \(\psi\). Possible choices are

  • ‘none’ => \(\psi(w) = 0\)

  • ‘l2’ => \(\psi(w) = \frac{\lambda}{2} \|w\|_2^2\)

  • ‘l1’ => \(\psi(w) = \lambda \|w\|_1\)

  • ‘elastic-net’ => \(\psi(w) = \lambda \|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)

  • ‘fused-lasso’ => \(\psi(w) = \lambda \sum_{i=2}^p |w[i]-w[i-1]| + \lambda_2\|w\|_1 + \frac{\lambda_3}{2}\|w\|_2^2\)

  • ‘l1-ball’ => encodes the constraint \(\|w\|_1 \leq \lambda\)

  • ‘l2-ball’ => encodes the constraint \(\|w\|_2 \leq \lambda\)

fit_intercept: boolean, default=False

learns an unregularized intercept b
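
A minimal usage sketch (the synthetic data and hyperparameter values below are illustrative; see the fit method below for the full list of parameters):

    import numpy as np
    from cyanure import BinaryClassifier

    # Illustrative synthetic data: n=1000 samples, p=50 features.
    rng = np.random.RandomState(0)
    X = rng.randn(1000, 50)
    y = np.sign(X[:, 0] + 0.1 * rng.randn(1000))  # labels in {-1,+1}

    # Squared hinge loss with l1 regularization; lambd is passed to fit.
    classifier = BinaryClassifier(loss='sqhinge', penalty='l1', fit_intercept=True)
    classifier.fit(X, y, lambd=0.01, tol=1e-3)
    print(classifier.score(X, y))  # training accuracy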

Methods

eval(self, X, y[, lambd, lambd2, lambd3])

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

fit(self, X, y[, lambd, lambd2, lambd3, …])

The fitting function (the one that does the job)

get_weights(self)

get the model parameters (either w or the tuple (w,b))

predict(self, X)

predict the labels given an input matrix X (same format as fit)

score(self, X, y)

Compute classification accuracy of the model for new test data (X,y)

fit(self, X, y, lambd=0, lambd2=0, lambd3=0, solver='auto', tol=0.001, it0=10, max_epochs=500, l_qning=20, f_restart=50, verbose=True, restart=False, nthreads=-1, seed=0)[source]

The fitting function (the one that does the job)

Parameters
X: numpy array, or scipy sparse CSR matrix

input n x p numpy matrix; the samples are on the rows

y: labels, numpy array.
  • vector of size n with {-1,+1} labels for binary classification, which will be automatically converted if labels in {0,1} are provided

lambd: float, default=0

first regularization parameter \(\lambda\)

lambd2: float, default=0

second regularization parameter \(\lambda_2\), if needed

lambd3: float, default=0

third regularization parameter \(\lambda_3\), if needed

solver: string, default=’auto’

Optimization solver. Possible choices are

  • ‘ista’ (proximal gradient descent)

  • ‘fista’ (accelerated proximal gradient)

  • ‘catalyst-ista’ (ISTA accelerated with Catalyst)

  • ‘qning-ista’ (proximal quasi-Newton method)

  • ‘svrg’ (stochastic variance-reduced gradient)

  • ‘catalyst-svrg’ (accelerated SVRG with Catalyst)

  • ‘qning-svrg’ (quasi-Newton SVRG)

  • ‘acc-svrg’ (SVRG with direct acceleration)

  • ‘miso’ (incremental surrogate optimization)

  • ‘catalyst-miso’ (accelerated MISO with Catalyst)

  • ‘qning-miso’ (quasi-Newton MISO)

  • ‘auto’ (let the library choose)

see the LaTeX documentation for more details. If you are unsure, use ‘auto’.

tol: float, default=1e-3

Tolerance parameter. For almost all combinations of loss and penalty functions, this parameter is based on a duality gap. Assuming the (non-negative) objective function is \(f\) and its optimal value is \(f^*\), the algorithm stops with the guarantee

\[f(x_t) - f^* \leq \text{tol} \cdot f(x_t)\]

max_epochs: int, default=500

Maximum number of iterations of the algorithm, in terms of passes over the data

it0: int, default=10

Frequency of duality-gap computation

verbose: boolean, default=True

Display information or not

nthreads: int, default=-1

maximum number of cores the method may use (-1 = all cores). Note that using more cores is not always faster.

seed: int, default=0

random seed

restart: boolean, default=False

use a restart strategy (useful for computing a regularization path)

univariate: boolean, default=True

whether the problem is univariate or multivariate

l_qning: int, default=20

memory parameter for the qning method

f_restart: int, default=50

restart parameter for the FISTA solver

Returns
numpy array

information about the optimization process (number of iterations, objective function values, duality gaps); the exact format will be documented in a future release.
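
As an illustration, the following sketch uses the restart option to warm-start successive fits along a decreasing sequence of regularization parameters (synthetic data; the values of lambd are illustrative):

    import numpy as np
    from cyanure import BinaryClassifier

    rng = np.random.RandomState(0)
    X = rng.randn(500, 20)
    y = np.sign(X @ rng.randn(20))  # labels in {-1,+1}

    clf = BinaryClassifier(loss='logistic', penalty='l1')
    # restart=True reuses the previous solution as a starting point,
    # which speeds up the computation of a regularization path.
    for lambd in [0.1, 0.01, 0.001]:
        clf.fit(X, y, lambd=lambd, solver='auto', tol=1e-3, it0=10,
                max_epochs=500, restart=True, verbose=False)
        print(lambd, clf.score(X, y))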

predict(self, X)[source]

predict the labels given an input matrix X (same format as fit)

score(self, X, y)[source]

Compute classification accuracy of the model for new test data (X,y)

eval(self, X, y, lambd=0, lambd2=0, lambd3=0)

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

get_weights(self)

get the model parameters (either w or the tuple (w,b))

The Regression Class

class cyanure.Regression(loss='square', penalty='l2', fit_intercept=False)[source]

Bases: cyanure.ERM

The regression class. The objective is the same as for the BinaryClassifier class, but we use a regression loss only (see below), and the targets will be real values.

Parameters
loss: string, default=’square’

Only the square loss is implemented at this point

  • ‘square’ => \(L(y,z) = \frac{1}{2} ( y-z)^2\)

penalty: string, default=’l2’

same as for the class BinaryClassifier

fit_intercept: boolean, default=False

learns an unregularized intercept b
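
A minimal sketch for least-squares regression with the elastic-net penalty (synthetic data; the values of lambd and lambd2 are illustrative):

    import numpy as np
    from cyanure import Regression

    rng = np.random.RandomState(0)
    X = rng.randn(200, 10)
    y = X @ rng.randn(10) + 0.1 * rng.randn(200)  # real-valued targets

    # elastic-net: psi(w) = lambd * ||w||_1 + (lambd2 / 2) * ||w||_2^2
    reg = Regression(loss='square', penalty='elastic-net', fit_intercept=True)
    reg.fit(X, y, lambd=0.01, lambd2=0.001)
    y_pred = reg.predict(X)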

Methods

eval(self, X, y[, lambd, lambd2, lambd3])

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

fit(self, X, y[, lambd, lambd2, lambd3, …])

The fitting function is the same as for the class BinaryClassifier, except that we do not necessarily expect binary labels in y.

get_weights(self)

get the model parameters (either w or the tuple (w,b))

predict(self, X)

predict the labels given an input matrix X (same format as fit)

fit(self, X, y, lambd=0, lambd2=0, lambd3=0, solver='auto', tol=0.001, it0=10, max_epochs=500, l_qning=20, f_restart=50, verbose=True, restart=False, nthreads=-1, seed=0)[source]

The fitting function is the same as for the class BinaryClassifier, except that we do not necessarily expect binary labels in y.

predict(self, X)[source]

predict the labels given an input matrix X (same format as fit)

eval(self, X, y, lambd=0, lambd2=0, lambd3=0)

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

get_weights(self)

get the model parameters (either w or the tuple (w,b))

The MultiClassifier Class

class cyanure.MultiClassifier(loss='square', penalty='l2', fit_intercept=False)[source]

Bases: cyanure.ERM

The multi-class classification class. The goal is to minimize the following objective

\[\min_{W,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, W^\top x_i + b\right) + \psi(W),\]

where \(L\) is a classification loss, \(\psi\) is a regularization function (or constraint), \(W=[w_1,\ldots,w_k]\) is a (p x k) matrix that carries the k predictors, where k is the number of classes, and \(y_i\) is a label in \(\{1,\ldots,k\}\). \(b\) is a k-dimensional vector representing an unregularized intercept (which is optional).

Parameters
loss: string, default=’square’

Loss function to be used. Possible choices are

  • any loss function compatible with the class BinaryClassifier (‘square’, ‘logistic’, ‘sqhinge’, ‘safe-logistic’). In such a case, the loss function encodes a one-vs-all strategy based on the chosen binary-classification loss.

  • ‘multiclass-logistic’, which is also called multinomial or softmax logistic:

\[L(y, W^\top x + b) = \log\left(\sum_{j=1}^k e^{w_j^\top x + b_j - w_y^\top x - b_y} \right)\]

penalty: string, default=’l2’

Regularization function \(\psi\). Possible choices are

  • any penalty function compatible with the class BinaryClassifier (‘none’, ‘l2’, ‘l1’, ‘elastic-net’, ‘fused-lasso’, ‘l1-ball’, ‘l2-ball’). In such a case, the penalty is applied to each predictor \(w_j\) individually:

\[\psi(W) = \sum_{j=1}^k \psi(w_j).\]

  • ‘l1l2’, which is the multi-task group Lasso regularization

\[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_2~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]

  • ‘l1linf’

\[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_\infty.\]

  • ‘l1l2+l1’, which is the multi-task group Lasso regularization + l1

\[\psi(W) = \sum_{j=1}^p \lambda \|W^j\|_2 + \lambda_2 \|W^j\|_1 ~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]

fit_intercept: boolean, default=False

learns an unregularized intercept b, which is a k-dimensional vector
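
A minimal sketch combining the multinomial logistic loss with the multi-task group Lasso penalty (synthetic data; 0-based integer labels are assumed to be accepted, since fit only requires a vector of integers):

    import numpy as np
    from cyanure import MultiClassifier

    rng = np.random.RandomState(0)
    n, p, k = 300, 20, 4
    X = rng.randn(n, p)
    y = rng.randint(k, size=n)  # integer class labels

    clf = MultiClassifier(loss='multiclass-logistic', penalty='l1l2')
    clf.fit(X, y, lambd=0.01)
    W = clf.get_weights()  # p x k matrix of predictors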

Methods

eval(self, X, y[, lambd, lambd2, lambd3])

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

fit(self, X, y[, lambd, lambd2, lambd3, …])

Same as BinaryClassifier, but y should be an n-dimensional vector of integers

get_weights(self)

get the model parameters (either w or the tuple (w,b))

predict(self, X)

Predicts the class label

score(self, X, y)

Gives a classification score on new test data

fit(self, X, y, lambd=0, lambd2=0, lambd3=0, solver='auto', tol=0.001, it0=10, max_epochs=500, l_qning=20, f_restart=50, verbose=True, restart=False, nthreads=-1, seed=0)[source]

Same as BinaryClassifier, but y should be an n-dimensional vector of integers

predict(self, X)[source]

Predicts the class label

score(self, X, y)[source]

Gives a classification score on new test data

eval(self, X, y, lambd=0, lambd2=0, lambd3=0)

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

get_weights(self)

get the model parameters (either w or the tuple (w,b))

The MultiVariateRegression Class

class cyanure.MultiVariateRegression(loss='square', penalty='l2', fit_intercept=False)[source]

Bases: cyanure.ERM

The multivariate regression class. The objective is the same as for the MultiClassifier class, but we use a regression loss only (see below), and the targets \(y_i\) are k-dimensional vectors.

Parameters
loss: string, default=’square’

Only the square loss is implemented at this point. Given two k-dimensional vectors y,z:

  • ‘square’ => \(L(y,z) = \frac{1}{2} \|y-z\|^2\)

penalty: string, default=’l2’

same as for the class MultiClassifier

fit_intercept: boolean, default=False

learns an unregularized intercept b
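
A minimal sketch with k-dimensional targets (synthetic data; values are illustrative):

    import numpy as np
    from cyanure import MultiVariateRegression

    rng = np.random.RandomState(0)
    n, p, k = 200, 15, 3
    X = rng.randn(n, p)
    Y = X @ rng.randn(p, k) + 0.1 * rng.randn(n, k)  # n x k target matrix

    reg = MultiVariateRegression(loss='square', penalty='l1l2')
    reg.fit(X, Y, lambd=0.01)
    Y_pred = reg.predict(X)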

Methods

eval(self, X, y[, lambd, lambd2, lambd3])

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

fit(self, X, y[, lambd, lambd2, lambd3, …])

Same as ERM.fit, but y should be n x k, where k is the size of the target for each data point

get_weights(self)

get the model parameters (either w or the tuple (w,b))

predict(self, X)

Predicts the targets

fit(self, X, y, lambd=0, lambd2=0, lambd3=0, solver='auto', tol=0.001, it0=10, max_epochs=500, l_qning=20, f_restart=50, verbose=True, restart=False, nthreads=-1, seed=0)[source]

Same as ERM.fit, but y should be n x k, where k is the size of the target for each data point

predict(self, X)[source]

Predicts the targets

eval(self, X, y, lambd=0, lambd2=0, lambd3=0)

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

get_weights(self)

get the model parameters (either w or the tuple (w,b))

Scikit-learn compatible classes

class cyanure.LinearSVC(loss='sqhinge', penalty='l2', fit_intercept=False, C=1, max_iter=500)[source]

Bases: cyanure.BinaryClassifier

A compatibility class for scikit-learn users, restricted to the squared hinge loss. It is equivalent to the BinaryClassifier class, except that the regularization parameter (here “C”) is provided at class initialization. Note that \(C= \frac{1}{2n \lambda}\).

Parameters
loss: should be ‘sqhinge’ or ‘squared_hinge’
penalty: same as BinaryClassifier
fit_intercept: same as BinaryClassifier
C: regularization parameter
max_iter: maximum number of iterations for the optimization solver
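
The following sketch illustrates the correspondence \(C = \frac{1}{2n\lambda}\) between the two interfaces (synthetic data; the value of C is illustrative):

    import numpy as np
    from cyanure import LinearSVC, BinaryClassifier

    rng = np.random.RandomState(0)
    n, p = 400, 30
    X = rng.randn(n, p)
    y = np.sign(X @ rng.randn(p))  # labels in {-1,+1}

    C = 1.0
    svc = LinearSVC(loss='sqhinge', penalty='l2', C=C, max_iter=500)
    svc.fit(X, y)

    # Equivalent BinaryClassifier formulation: lambd = 1 / (2 * n * C).
    clf = BinaryClassifier(loss='sqhinge', penalty='l2')
    clf.fit(X, y, lambd=1.0 / (2 * n * C))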

Methods

eval(self, X, y[, lambd, lambd2, lambd3])

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

fit(self, X, y[, C, verbose, lambd2, …])

Same as BinaryClassifier.fit, but the parameter C replaces lambd, and max_iter replaces max_epochs.

get_weights(self)

get the model parameters (either w or the tuple (w,b))

predict(self, X)

predict the labels given an input matrix X (same format as fit)

score(self, X, y)

Compute classification accuracy of the model for new test data (X,y)

fit(self, X, y, C=None, verbose=None, lambd2=0, lambd3=0, solver='auto', tol=0.001, it0=10, max_iter=None, l_qning=20, f_restart=50, restart=False, nthreads=-1, seed=0)[source]

Same as BinaryClassifier.fit, but the parameter C replaces lambd, and max_iter replaces max_epochs.

eval(self, X, y, lambd=0, lambd2=0, lambd3=0)

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

get_weights(self)

get the model parameters (either w or the tuple (w,b))

predict(self, X)

predict the labels given an input matrix X (same format as fit)

score(self, X, y)

Compute classification accuracy of the model for new test data (X,y)

class cyanure.LogisticRegression(penalty='l2', fit_intercept=False, C=1, max_iter=500)[source]

Bases: cyanure.BinaryClassifier

A compatibility class for scikit-learn users, restricted to the logistic loss (this class takes no loss parameter). It is equivalent to the BinaryClassifier class, except that the regularization parameter (here “C”) is provided at class initialization. Note that \(C= \frac{1}{n \lambda}\).

Parameters
penalty: same as BinaryClassifier
fit_intercept: same as BinaryClassifier
C: regularization parameter
max_iter: maximum number of iterations for the optimization solver
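
A minimal sketch (synthetic data; since \(C = \frac{1}{n\lambda}\), larger values of C mean weaker regularization):

    import numpy as np
    from cyanure import LogisticRegression

    rng = np.random.RandomState(0)
    n, p = 400, 30
    X = rng.randn(n, p)
    y = np.sign(X @ rng.randn(p))  # labels in {-1,+1}

    clf = LogisticRegression(penalty='l2', fit_intercept=True, C=10.0)
    clf.fit(X, y)
    print(clf.score(X, y))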

Methods

eval(self, X, y[, lambd, lambd2, lambd3])

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

fit(self, X, y[, C, lambd2, lambd3, solver, …])

Same as BinaryClassifier.fit, but the parameter C replaces lambd, and max_iter replaces max_epochs.

get_weights(self)

get the model parameters (either w or the tuple (w,b))

predict(self, X)

predict the labels given an input matrix X (same format as fit)

score(self, X, y)

Compute classification accuracy of the model for new test data (X,y)

eval(self, X, y, lambd=0, lambd2=0, lambd3=0)

get the value of the objective function and compute a relative duality gap; see the fit method for the format of the parameters.

fit(self, X, y, C=None, lambd2=0, lambd3=0, solver='auto', tol=0.001, it0=10, max_iter=None, l_qning=20, f_restart=50, verbose=None, restart=False, nthreads=-1, seed=0)[source]

Same as BinaryClassifier.fit, but the parameter C replaces lambd, and max_iter replaces max_epochs.

get_weights(self)

get the model parameters (either w or the tuple (w,b))

predict(self, X)

predict the labels given an input matrix X (same format as fit)

score(self, X, y)

Compute classification accuracy of the model for new test data (X,y)