Ranking

class PyPruning.RankPruningClassifier.RankPruningClassifier(n_estimators=5, metric=<function individual_error>, n_jobs=8, metric_options=None)

Bases: PyPruning.PruningClassifier.PruningClassifier

Rank pruning.

Ranking methods assign a rank to each classifier in the ensemble and then select the best n_estimators according to this ranking. To rate each classifier, a metric must be given. A metric is a function which receives three parameters:

  • i (int): The index of the classifier which should be rated

  • ensemble_proba (a (M, N, C) matrix): All N predictions of all M classifiers in the entire ensemble for all C classes

  • target (list / array): A list / array of class targets.

A simple example of such a function is the individual error of each classifier:

def individual_error(i, ensemble_proba, target):
    # (N, C) predictions of classifier i
    iproba = ensemble_proba[i,:,:]
    # fraction of examples where the top prediction differs from the target
    return (iproba.argmax(axis=1) != target).mean()

Important: The classifiers are sorted in ascending order of their metric values and the first n_estimators are selected. Put differently, the metric is always minimized.
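
As an illustration, the sketch below defines a hypothetical class-weighted error metric (the name weighted_error and its class_weights argument are made up for this example) and passes it to the constructor, with the extra argument supplied via metric_options:

import numpy as np
from PyPruning.RankPruningClassifier import RankPruningClassifier

def weighted_error(i, ensemble_proba, target, class_weights=None):
    # Hypothetical metric: class-weighted error of classifier i.
    iproba = ensemble_proba[i,:,:]                            # (N, C)
    mistakes = (iproba.argmax(axis=1) != target).astype(float)
    if class_weights is None:
        return mistakes.mean()
    weights = np.asarray(class_weights)[np.asarray(target)]   # one weight per example
    return (weights * mistakes).sum() / weights.sum()

pruner = RankPruningClassifier(
    n_estimators=5,
    metric=weighted_error,
    n_jobs=4,
    metric_options={"class_weights": [1.0, 2.0]}  # e.g. two classes
)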

n_estimators

The number of estimators which should be selected.

Type

int, default is 5

metric

A function that assigns a score to each classifier, which is then used for sorting.

Type

function, default is individual_error

n_jobs

The number of threads used for computing the individual metrics for each classifier.

Type

int, default is 8

metric_options

Any additional metric_options are passed directly to the metric function as keyword arguments via the ** operator.

Type

dict or None, default is None

prune_(proba, target, data=None)

Prunes the ensemble using the ensemble predictions proba and the pruning data targets / data. If the pruning method requires access to the original ensemble members, you can access them via self.estimators_. Note that self.estimators_ is already a deep copy of the estimators, so you are free to change the estimators in this list if you want to.

Parameters
  • proba (numpy matrix) – A (M, N, C) matrix which contains the individual predictions of each ensemble member on the pruning data. Each ensemble prediction is generated via predict_proba. M is the size of the base ensemble, N the size of the pruning data and C the number of classes

  • target (numpy array of ints) – A numpy array or list of N integers where each integer represents the class of the corresponding example. Classes should start with 0, so that for C classes the integers 0, 1, …, C-1 are used

  • data (numpy matrix, optional) – The data points in a (N, d) matrix on which proba has been computed, where N is the pruning set size and d is the number of features. This can be used by a pruning method if required, but most methods do not need the actual data points, only the individual predictions.

Returns

  • A tuple of indices and weights (idx, weights) with the following properties

  • idx (numpy array / list of ints) – A list of integers indicating which classifiers should be selected from self.estimators_. Any changes made to self.estimators_ are also reflected here, so make sure that the order of classifiers in proba and self.estimators_ remains the same (or return idx accordingly)

  • weights (numpy array / list of floats) – The individual weights for each selected classifier. The size of this array should match the size of idx (and not the size of the original base ensemble).
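
To make the layout of proba and the return contract concrete, the following toy function sketches what the body of such a prune_ implementation could compute. It assumes the classifier-first (M, N, C) layout used by the metric functions above; it is an illustration of the interface, not code from the library:

import numpy as np

def toy_prune_(proba, target, data=None, k=5):
    # proba[i] is the (N, C) prediction matrix of ensemble member i.
    errors = np.array([(p.argmax(axis=1) != target).mean() for p in proba])
    idx = np.argsort(errors)[:k]     # indices into self.estimators_ / proba
    weights = np.ones(len(idx))      # one weight per *selected* classifier
    return idx, weights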

PyPruning.RankPruningClassifier.error_ambiguity(i, ensemble_proba, target)

Compute the error for the individual classifier according to the ambiguity decomposition. I am fairly sure that this implementation is correct; however, the paper is not super clear on what the authors do from an algorithmic point of view. From what I can tell, the authors compute the ambiguity scores for each classifier only once and then “greedily” pick the best K models.

Notes

The Jiang et al. paper only considers binary classification problems and specifically focuses on the logistic loss function. Luckily, Hastie et al. proposed a multi-class boosting algorithm which uses a multi-class variation of the (binary) logistic loss. Both loss functions are equal for 2 classes and thus we implement the multi-class version here. For more details see the references.

Reference:
  • Jiang, Z., Liu, H., Fu, B., & Wu, Z. (2017). Generalized ambiguity decompositions for classification with applications in active learning and unsupervised ensemble pruning. 31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2073–2079.

  • Hastie, T., Rosset, S., Zhu, J., & Zou, H. (2009). Multi-class AdaBoost. Statistics and Its Interface, 2(3), 349–360. https://doi.org/10.4310/sii.2009.v2.n3.a8

PyPruning.RankPruningClassifier.individual_contribution(i, ensemble_proba, target)

Compute the individual contribution of the classifier with respect to the entire ensemble. The negative contribution is returned because the ranking minimizes the metric.

Reference:

Lu, Z., Wu, X., Zhu, X., & Bongard, J. (2010). Ensemble pruning via individual contribution ordering. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 871–880. https://doi.org/10.1145/1835804.1835914

PyPruning.RankPruningClassifier.individual_error(i, ensemble_proba, target)

Compute the error for the individual classifier. If I read it correctly, the following paper proposed this method, although the paper is not super clear on this.

Reference:

Jiang, Z., Liu, H., Fu, B., & Wu, Z. (2017). Generalized ambiguity decompositions for classification with applications in active learning and unsupervised ensemble pruning. 31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2073–2079.

PyPruning.RankPruningClassifier.individual_kappa_statistic(i, ensemble_proba, target)

Compute Cohen's kappa statistic for the individual classifier with respect to the entire ensemble.

Reference:

Margineantu, D., & Dietterich, T. G. (1997). Pruning Adaptive Boosting. Proceedings of the Fourteenth International Conference on Machine Learning, 211–218. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7017&rep=rep1&type=pdf
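
The idea could, for instance, be sketched as follows: compute the kappa agreement between the predictions of classifier i and the (averaged) ensemble prediction. This is only an illustration of the idea, not necessarily the exact computation used by the library:

from sklearn.metrics import cohen_kappa_score

def kappa_sketch(i, ensemble_proba, target):
    # target is not needed for this variant of the score.
    ipred = ensemble_proba[i,:,:].argmax(axis=1)         # classifier i
    epred = ensemble_proba.mean(axis=0).argmax(axis=1)   # averaged ensemble
    # Lower agreement with the ensemble (= more diverse) is ranked earlier,
    # since the metric is minimized.
    return cohen_kappa_score(ipred, epred)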

PyPruning.RankPruningClassifier.individual_margin_diversity(i, ensemble_proba, target, alpha=0.2)

Computes the individual diversity of the classifier with respect to the ensemble and its contribution to the margin. alpha controls the trade-off between both values. The original paper uses \(\alpha = 0.2\) in all experiments and reports that it worked well, so it is also the default value here. You can supply a different alpha via the metric_options parameter of RankPruningClassifier (see the example below).

Reference:

Guo, H., Liu, H., Li, R., Wu, C., Guo, Y., & Xu, M. (2018). Margin & diversity based ordering ensemble pruning. Neurocomputing, 275, 237–246. https://doi.org/10.1016/j.neucom.2017.06.052
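
For example, a different trade-off could be supplied like this (the value 0.5 is purely illustrative):

from PyPruning.RankPruningClassifier import RankPruningClassifier, individual_margin_diversity

pruner = RankPruningClassifier(
    n_estimators=10,
    metric=individual_margin_diversity,
    metric_options={"alpha": 0.5}   # overrides the default alpha = 0.2
)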

PyPruning.RankPruningClassifier.individual_neg_auc(i, ensemble_proba, target)

Compute the ROC AUC score for the individual classifier, but return its negative value for minimization.
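
A sketch of such a metric using scikit-learn could look as follows; the one-vs-rest handling of the multi-class case is an assumption made for this example, not necessarily what the library does:

from sklearn.metrics import roc_auc_score

def neg_auc_sketch(i, ensemble_proba, target):
    iproba = ensemble_proba[i,:,:]                          # (N, C)
    if iproba.shape[1] == 2:
        auc = roc_auc_score(target, iproba[:, 1])           # binary case
    else:
        auc = roc_auc_score(target, iproba, multi_class="ovr")
    return -auc  # negated, because the ranking minimizes the metric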

PyPruning.RankPruningClassifier.reference_vector(i, ensemble_proba, target)

Compares how close the individual classifier's predictions are to the entire ensemble's prediction by using the cosine similarity.

Notes

The original paper describes a slightly different distance metric compared to what is implemented here. The paper uses a projection onto a reference vector but, unfortunately, does not explain the specific implementation in detail. However, the authors also note two things:

    1. They use all classifiers with an angle <= pi/2, which can lead to more than n_estimators classifiers. This implementation selects at most n_estimators classifiers, and thus we order the classifiers by their angles and pick the first n_estimators.

    2. “The classifiers are ordered by increasing values of the angle between the signature vectors of the individual classifiers and the reference vector”.

The variables ref and ipred in this implementation follow the exact definitions presented in the paper (eq. 3), and cosine is the most direct implementation of “the angle between signature and reference vector”.

Reference:

Hernández-Lobato, D., Martínez-Muñoz, G., & Suárez, A. (2006). Pruning in Ordered Bagging Ensembles. International Conference on Machine Learning, 1266–1273. https://doi.org/10.1109/ijcnn.2006.246837
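
The following sketch illustrates the angle-based ordering in a deliberately simplified form: it uses ±1 correctness signatures for all classifiers, takes the summed ensemble signature as the reference vector and returns the angle between the two, so that sorting in ascending order reproduces the “increasing angle” ordering. The exact ref / ipred definitions of eq. 3 are not reproduced here:

import numpy as np

def angle_sketch(i, ensemble_proba, target):
    # +1 where a prediction is correct, -1 where it is wrong.
    all_pred = ensemble_proba.argmax(axis=2)              # (M, N) hard predictions
    sig = 2.0 * (all_pred == np.asarray(target)) - 1.0    # (M, N) in {-1, +1}

    ipred = sig[i]           # signature vector of classifier i
    ref = sig.sum(axis=0)    # simplified reference: summed ensemble signature

    cosine = ipred.dot(ref) / (np.linalg.norm(ipred) * np.linalg.norm(ref) + 1e-12)
    return np.arccos(np.clip(cosine, -1.0, 1.0))  # smaller angle = ranked earlier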