Mixed Integer Quadratic Programming
- class PyPruning.MIQPPruningClassifier.MIQPPruningClassifier(n_estimators=5, single_metric=None, pairwise_metric=<function combined_error>, alpha=1, verbose=False, n_jobs=8, single_metric_options=None, pairwise_metric_options=None)
Bases: PyPruning.PruningClassifier.PruningClassifier
Mixed Integer Quadratic Programming (MIQP) Pruning.
This pruning method constructs a MIQP so that its solution is the pruned ensemble. Formally, it uses the problem
\[\arg\min_w (1 - \alpha ) q^T w + \alpha w^T P w\]where \(\alpha \in [0,1]\) trades off the first and the second term. The vector \(q\) contains the individual metrics for each classifier, similar to what a RankPruningClassifier would compute, whereas \(P\) contains pairwise metrics for each pair of classifiers in the ensemble. To compute \(q\) and \(P\), two metrics are required:
Single_metric
This metric assigns a value to each individual classifier in the ensemble without considering pairs of classifiers. A single_metric function should accept the following parameters:
- i (int): The classifier which should be rated
- ensemble_proba (a (M, N, C) matrix): All N predictions of all M classifiers in the entire ensemble for all C classes
- target (list / array): A list / array of class targets.
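For illustration, a minimal single_metric following this signature could simply return the error rate of classifier i on the pruning data (error_metric is a hypothetical name, not part of PyPruning; smaller values indicate better classifiers, as required):

```python
import numpy as np

def error_metric(i, ensemble_proba, target):
    """Hypothetical single_metric: the error rate of classifier i
    on the pruning data (smaller is better)."""
    # ensemble_proba has shape (M, N, C); take classifier i's predictions
    preds = ensemble_proba[i].argmax(axis=1)       # (N,) predicted classes
    return (preds != np.asarray(target)).mean()    # fraction of mistakes
```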
The single_metric is compatible with the metrics for a RankPruningClassifier: you can use any metric from the RankPruningClassifier here and vice versa.
Pairwise_metric
This metric assigns a value to each pair of classifiers in the ensemble. A pairwise_metric function should accept the following parameters:
- i (int): The first classifier in the pair
- j (int): The second classifier in the pair
- ensemble_proba (a (M, N, C) matrix): All N predictions of all M classifiers in the entire ensemble for all C classes
- target (list / array): A list / array of class targets.
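A minimal pairwise_metric following this signature might, for example, measure how often two classifiers fail together (joint_error is a hypothetical name; the library's own combined_error may be implemented differently):

```python
import numpy as np

def joint_error(i, j, ensemble_proba, target):
    """Hypothetical pairwise_metric: fraction of examples that
    classifiers i and j BOTH misclassify (smaller is better)."""
    target = np.asarray(target)
    wrong_i = ensemble_proba[i].argmax(axis=1) != target
    wrong_j = ensemble_proba[j].argmax(axis=1) != target
    return np.mean(wrong_i & wrong_j)
```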
If you set alpha = 0 or choose a pairwise metric that simply returns 0, a MIQPPruningClassifier should produce the same solution as a RankPruningClassifier.
Important: All metrics are minimized. If you implement your own metric, make sure that it assigns smaller values to better classifiers.
This code uses cvxpy to access a wide variety of MIQP solvers. For more information on how to configure your solver and interpret its output in case of failures, please have a look at the cvxpy documentation: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options.
- n_estimators
The number of estimators which should be selected.
- Type
int, default is 5
- single_metric
A function that assigns a value to each classifier which forms the q vector
- Type
function, default is None
- pairwise_metric
A function that assigns a value to each pair of classifiers which forms the P matrix
- Type
function, default is combined_error
- alpha
The trade-off between the single and pairwise metric. alpha = 0 only considers the single_metric, whereas alpha = 1 only considers the pairwise metric
- Type
float, must be in [0,1]
- verbose
If true, more information from the MIQP solver is printed.
- Type
boolean, default is False
- n_jobs
The number of threads used for computing the metrics. This does not have any effect on the number of threads used by the MIQP solver.
- Type
int, default is 8
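The attributes above feed the optimization problem stated at the top of this page. How q and P are assembled from the two metrics can be sketched in plain numpy (build_qp and objective are hypothetical helper names used only for illustration; the actual implementation hands the problem to a cvxpy solver):

```python
import numpy as np

def build_qp(ensemble_proba, target, single_metric, pairwise_metric):
    """Assemble the q vector and P matrix from user-supplied metrics,
    mirroring the objective (1 - alpha) * q^T w + alpha * w^T P w."""
    M = ensemble_proba.shape[0]
    q = np.array([single_metric(i, ensemble_proba, target)
                  for i in range(M)])
    P = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            P[i, j] = pairwise_metric(i, j, ensemble_proba, target)
    return q, P

def objective(w, q, P, alpha):
    """Value of the MIQP objective for a candidate selection vector w."""
    return (1 - alpha) * q @ w + alpha * w @ P @ w
```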
- prune_(proba, target, data=None)
Prunes the ensemble using the ensemble predictions proba and the pruning data targets / data. If the pruning method requires access to the original ensemble members you can access these via self.estimators_. Note that self.estimators_ is already a deep-copy of the estimators so you are also free to change the estimators in this list if you want to.
- Parameters
proba (numpy matrix) – A (N, M, C) matrix which contains the individual predictions of each ensemble member on the pruning data. Each ensemble prediction is generated via predict_proba. N is the size of the pruning data, M is the size of the base ensemble and C is the number of classes
target (numpy array of ints) – A numpy array or list of N integers where each integer represents the class of each example. Classes should start with 0, so that for C classes the integers 0, 1, …, C-1 are used
data (numpy matrix, optional) – The data points in a (N, M) matrix on which proba has been computed, where N is the pruning set size and M is the number of classifiers in the original ensemble. This can be used by a pruning method if required, but most methods do not require the actual data points, only the individual predictions.
- Returns
A tuple of indices and weights (idx, weights) with the following properties
idx (numpy array / list of ints) – A list of integers indicating which classifiers should be selected from self.estimators_. Any changes made to self.estimators_ are also reflected here, so make sure that the order of classifiers in proba and self.estimators_ remains the same (or return idx accordingly)
weights (numpy array / list of floats) – The individual weights for each selected classifier. The size of this array should match the size of idx (and not the size of the original base ensemble).
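To illustrate this contract, here is a toy (idx, weights) pair and how the selected members' predictions would be combined into a pruned-ensemble prediction (a sketch with made-up numbers, not the library's internal code):

```python
import numpy as np

# Hypothetical pruning result: keep classifiers 0 and 2, unit weights
idx = np.array([0, 2])
weights = np.array([1.0, 1.0])

# proba: (N, M, C) predictions of the full M=3 member ensemble
proba = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]],   # example 0
    [[0.4, 0.6], [0.8, 0.2], [0.1, 0.9]],   # example 1
])

# Weighted average over the selected members only
pruned = np.einsum('k,nkc->nc', weights, proba[:, idx, :]) / weights.sum()
prediction = pruned.argmax(axis=1)   # final class per example
```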
- PyPruning.MIQPPruningClassifier.combined(i, j, ensemble_proba, target, weights=[0.2, 0.2, 0.2, 0.2, 0.2])
Computes a (weighted) combination of 5 different measures for a pair of classifiers. The original paper also optimizes the weights of this combination using an evolutionary approach and cross-validation. Per default, we use equal weights here. You can supply different weights via the pairwise_metric_options parameter of MIQPPruningClassifier.
- Reference:
Cavalcanti, G. D. C., Oliveira, L. S., Moura, T. J. M., & Carvalho, G. V. (2016). Combining diversity measures for ensemble pruning. Pattern Recognition Letters, 74, 38–45. https://doi.org/10.1016/j.patrec.2016.01.029
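The weighting pattern of such a combined metric can be sketched as follows; the five concrete diversity measures from Cavalcanti et al. are represented here by placeholder callables with the pairwise_metric signature (weighted_combination is a hypothetical name, not the library function):

```python
import numpy as np

def weighted_combination(i, j, ensemble_proba, target,
                         measures, weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Combine several pairwise measures into one score.
    `measures` is a list of callables (i, j, ensemble_proba, target) -> float;
    per default all measures contribute equally."""
    scores = [m(i, j, ensemble_proba, target) for m in measures]
    return float(np.dot(weights, scores))
```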
- PyPruning.MIQPPruningClassifier.combined_error(i, j, ensemble_proba, target)
Computes the pairwise errors of the two classifiers i and j.
- Reference:
Zhang, Y., Burer, S., & Street, W. N. (2006). Ensemble Pruning Via Semi-definite Programming. Journal of Machine Learning Research, 7, 1315–1338.