Random

class PyPruning.RandomPruningClassifier.RandomPruningClassifier(n_estimators=5, seed=None)

Bases: PyPruning.PruningClassifier.PruningClassifier

Random pruning. This pruning method implements a random pruning which randomly selects n_estimators from the original ensemble and assigns equal weights to each of the classifiers.

n_estimators

The number of estimators which should be selected.

Type

int, default is 5

seed

The random seed for the random selection

Type

int, optional, default is None

prune_(proba, target, data=None)

Prunes the ensemble using the ensemble predictions proba and the pruning data targets / data. If the pruning method requires access to the original ensemble members you can access these via self.estimators_. Note that self.estimators_ is already a deep-copy of the estimators so you are also free to change the estimators in this list if you want to.

Parameters
  • proba (numpy matrix) – A (N,M,C) matrix which contains the individual predictions of each ensemble member on the pruning data. Each ensemble prediction is generated via predict_proba. N is size of the pruning data, M the size of the base ensemble and C is the number of classes

  • target (numpy array of ints) – A numpy array or list of N integers where each integer represents the class for each example. Classes should start with 0, so that for C classes the integer 0,1,…,C-1 are used

  • data (numpy matrix, optional) – The data points in a (N, M) matrix on which the proba has been computed, where N is the pruning set size and M is the number of classifier in the original ensemble. This can be used by a pruning method if required, but most methods do not require the actual data points but only the individual predictions.

Returns

  • A tuple of indices and weights (idx, weights) with the following properties

  • idx (numpy array / list of ints) – A list of integers which classifier should be selected from self.estimators_. Any changes made to self.estimators_ are also reflected here, so make sure that the order of classifier in proba and self.estimators_ remains the same (or you return idx accordingly)

  • weights (numpy array / list of floats) – The individual weights for each selected classifier. The size of this array should match the size of idx (and not the size of the original base ensemble).