Mixed Integer Quadratic Programming
- class PyPruning.MIQPPruningClassifier.MIQPPruningClassifier(n_estimators=5, single_metric=None, pairwise_metric=<function combined_error>, alpha=1, verbose=False, n_jobs=8, single_metric_options=None, pairwise_metric_options=None)
Bases: PyPruning.PruningClassifier.PruningClassifier
Mixed Integer Quadratic Programming (MIQP) Pruning.
This pruning method constructs a MIQP so that its solution is the pruned ensemble. Formally, it uses the problem
\[\arg\min_w (1 - \alpha ) q^T w + \alpha w^T P w\]where \(\alpha \in [0,1]\) trades off the first and the second term. The vector \(q\) contains the individual metrics for each classifier, similar to what a RankPruningClassifier would compute, whereas \(P\) contains pairwise metrics for each pair of classifiers in the ensemble. To compute \(q\) and \(P\), two metrics are required:
Single_metric
This metric assigns a value to each individual classifier in the ensemble without considering pairs of classifiers. A single_metric function should accept the following parameters:
- i (int): The classifier which should be rated
- ensemble_proba (a (M, N, C) matrix): All N predictions of all M classifiers in the entire ensemble for all C classes
- target (list / array): A list / array of class targets.
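For illustration, a minimal single_metric following this signature could simply return the error rate of classifier i on the pruning data (error_metric is a hypothetical name, not part of PyPruning; smaller values indicate better classifiers, as required):

```python
import numpy as np

def error_metric(i, ensemble_proba, target):
    """Hypothetical single_metric: the error rate of classifier i
    on the pruning data (smaller is better)."""
    # ensemble_proba has shape (M, N, C); take classifier i's predictions
    preds = ensemble_proba[i].argmax(axis=1)       # (N,) predicted classes
    return (preds != np.asarray(target)).mean()    # fraction of mistakes
```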
The single_metric is compatible with the metrics for a RankPruningClassifier: you can use any metric from the RankPruningClassifier here and vice versa.
Pairwise_metric
This metric assigns a value to each pair of classifiers in the ensemble. A pairwise_metric function should accept the following parameters:
- i (int): The first classifier in the pair
- j (int): The second classifier in the pair
- ensemble_proba (a (M, N, C) matrix): All N predictions of all M classifiers in the entire ensemble for all C classes
- target (list / array): A list / array of class targets.
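A minimal pairwise_metric following this signature might, for example, measure how often two classifiers fail together (joint_error is a hypothetical name; the library's own combined_error may be implemented differently):

```python
import numpy as np

def joint_error(i, j, ensemble_proba, target):
    """Hypothetical pairwise_metric: fraction of examples that
    classifiers i and j BOTH misclassify (smaller is better)."""
    target = np.asarray(target)
    wrong_i = ensemble_proba[i].argmax(axis=1) != target
    wrong_j = ensemble_proba[j].argmax(axis=1) != target
    return np.mean(wrong_i & wrong_j)
```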
If you set alpha = 0 or choose a pairwise metric that simply returns 0, a MIQPPruningClassifier should produce the same solution as a RankPruningClassifier.
Important: All metrics are minimized. If you implement your own metric, make sure that it assigns smaller values to better classifiers.
This code uses cvxpy to access a wide variety of MIQP solvers. For more information on how to configure your solver and interpret its output in case of failures, please have a look at the cvxpy documentation: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options.
- n_estimators
The number of estimators which should be selected.
- Type
int, default is 5
- single_metric
A function that assigns a value to each classifier which forms the q vector
- Type
function, default is None
- pairwise_metric
A function that assigns a value to each pair of classifiers which forms the P matrix
- Type
function, default is combined_error
- alpha
The trade-off between the single and pairwise metric. alpha = 0 only considers the single_metric, whereas alpha = 1 only considers the pairwise metric
- Type
float, must be in [0,1]
- verbose
If true, more information from the MIQP solver is printed.
- Type
boolean, default is False
- n_jobs
The number of threads used for computing the metrics. This does not have any effect on the number of threads used by the MIQP solver.
- Type
int, default is 8
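The attributes above feed the optimization problem stated at the top of this page. How q and P are assembled from the two metrics can be sketched in plain numpy (build_qp and objective are hypothetical helper names used only for illustration; the actual implementation hands the problem to a cvxpy solver):

```python
import numpy as np

def build_qp(ensemble_proba, target, single_metric, pairwise_metric):
    """Assemble the q vector and P matrix from user-supplied metrics,
    mirroring the objective (1 - alpha) * q^T w + alpha * w^T P w."""
    M = ensemble_proba.shape[0]
    q = np.array([single_metric(i, ensemble_proba, target)
                  for i in range(M)])
    P = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            P[i, j] = pairwise_metric(i, j, ensemble_proba, target)
    return q, P

def objective(w, q, P, alpha):
    """Value of the MIQP objective for a candidate selection vector w."""
    return (1 - alpha) * q @ w + alpha * w @ P @ w
```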
- prune_(proba, target, data=None)
Prunes the ensemble using the ensemble predictions proba and the pruning data targets / data. If the pruning method requires access to the original ensemble members you can access these via self.estimators_. Note that self.estimators_ is already a deep-copy of the estimators so you are also free to change the estimators in this list if you want to.
- Parameters
proba (numpy matrix) – A (N, M, C) matrix which contains the individual predictions of each ensemble member on the pruning data. Each ensemble prediction is generated via predict_proba. N is the size of the pruning data, M is the size of the base ensemble and C is the number of classes
target (numpy array of ints) – A numpy array or list of N integers where each integer represents the class of each example. Classes should start with 0, so that for C classes the integers 0, 1, …, C-1 are used
data (numpy matrix, optional) – The data points in a (N, M) matrix on which proba has been computed, where N is the pruning set size and M is the number of classifiers in the original ensemble. This can be used by a pruning method if required, but most methods do not require the actual data points, only the individual predictions.
- Returns
A tuple of indices and weights (idx, weights) with the following properties
idx (numpy array / list of ints) – A list of integers indicating which classifiers should be selected from self.estimators_. Any changes made to self.estimators_ are also reflected here, so make sure that the order of classifiers in proba and self.estimators_ remains the same (or return idx accordingly)
weights (numpy array / list of floats) – The individual weights for each selected classifier. The size of this array should match the size of idx (and not the size of the original base ensemble).
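To illustrate this contract, here is a toy (idx, weights) pair and how the selected members' predictions would be combined into a pruned-ensemble prediction (a sketch with made-up numbers, not the library's internal code):

```python
import numpy as np

# Hypothetical pruning result: keep classifiers 0 and 2, unit weights
idx = np.array([0, 2])
weights = np.array([1.0, 1.0])

# proba: (N, M, C) predictions of the full M=3 member ensemble
proba = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]],   # example 0
    [[0.4, 0.6], [0.8, 0.2], [0.1, 0.9]],   # example 1
])

# Weighted average over the selected members only
pruned = np.einsum('k,nkc->nc', weights, proba[:, idx, :]) / weights.sum()
prediction = pruned.argmax(axis=1)   # final class per example
```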
- PyPruning.MIQPPruningClassifier.combined(i, j, ensemble_proba, target, weights=[0.2, 0.2, 0.2, 0.2, 0.2])
Computes a (weighted) combination of 5 different measures for a pair of classifiers. The original paper also optimizes the weights of this combination using an evolutionary approach and cross-validation. Per default, we use equal weights here. You can supply different weights via the pairwise_metric_options parameter of MIQPPruningClassifier.
- Reference:
Cavalcanti, G. D. C., Oliveira, L. S., Moura, T. J. M., & Carvalho, G. V. (2016). Combining diversity measures for ensemble pruning. Pattern Recognition Letters, 74, 38–45. https://doi.org/10.1016/j.patrec.2016.01.029
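The weighting pattern of such a combined metric can be sketched as follows; the five concrete diversity measures from Cavalcanti et al. are represented here by placeholder callables with the pairwise_metric signature (weighted_combination is a hypothetical name, not the library function):

```python
import numpy as np

def weighted_combination(i, j, ensemble_proba, target,
                         measures, weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Combine several pairwise measures into one score.
    `measures` is a list of callables (i, j, ensemble_proba, target) -> float;
    per default all measures contribute equally."""
    scores = [m(i, j, ensemble_proba, target) for m in measures]
    return float(np.dot(weights, scores))
```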
- PyPruning.MIQPPruningClassifier.combined_error(i, j, ensemble_proba, target)
Computes the pairwise errors of the two classifiers i and j.
- Reference:
Zhang, Y., Burer, S., & Street, W. N. (2006). Ensemble Pruning Via Semi-definite Programming. Journal of Machine Learning Research, 7, 1315–1338.