Proximal Gradient Descent

class PyPruning.ProxPruningClassifier.ProxPruningClassifier(loss='cross-entropy', step_size=0.1, ensemble_regularizer='hard-L0', l_ensemble_reg=0, regularizer=<function node_regularizer>, l_reg=0, normalize_weights=True, batch_size=256, epochs=1, verbose=False, optimizer='adam')

Bases: PyPruning.PruningClassifier.PruningClassifier

(Heterogeneous) Pruning via Proximal Gradient Descent

This pruning method directly minimizes a regularized loss function \(L\), including a tree regularizer \(R_1\) and an ensemble regularizer \(R_2\), via (stochastic) proximal gradient descent. Two types of constraints are available. When soft constraints are used, the following objective is minimized

\[\arg\min_w L \left(\sum_{i=1}^M w_i h_i(x), y\right) + \lambda_1 \sum_{i=1}^M w_i R_1(h_i) + \lambda_2 R_2(w)\]

When hard constraints are used, the following objective is minimized

\[\arg\min_w L \left(\sum_{i=1}^M w_i h_i(x), y\right) + \lambda_1 \sum_{i=1}^M w_i R_1(h_i) \text{ s.t. } R_2(w) \le \lambda_2\]

The regularizer \(R_1\) is used to select smaller trees, whereas the regularizer \(R_2\) is used to select fewer trees from the ensemble.
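
A minimal usage sketch (assuming a scikit-learn RandomForestClassifier as the base ensemble and the prune(X, y, estimators) entry point of the PruningClassifier base class; the parameter values are illustrative only):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from PyPruning.ProxPruningClassifier import ProxPruningClassifier

    # Train a base ensemble and keep a separate pruning set
    X, y = make_classification(n_samples=2000, n_classes=3, n_informative=8, random_state=0)
    X_train, X_prune, y_train, y_prune = train_test_split(X, y, test_size=0.25, random_state=0)
    forest = RandomForestClassifier(n_estimators=64, random_state=0).fit(X_train, y_train)

    # Prune the forest down to at most 8 members via (stochastic) proximal gradient descent
    pruner = ProxPruningClassifier(
        loss="cross-entropy",
        ensemble_regularizer="hard-L0",
        l_ensemble_reg=8,      # with "hard-L0" this is the maximum number of selected members
        step_size=0.1,
        batch_size=256,
        epochs=5,
    )
    pruner.prune(X_prune, y_prune, forest.estimators_)
    print(pruner.num_estimators())  # number of members with nonzero weight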

loss : str, default is "cross-entropy"

The loss function for training. Should be one of {"mse", "cross-entropy", "hinge2"}, illustrated in the sketch after this list.

  • "mse": \(L(f(x),y) = \sum_{i=1}^C (f(x)_i - y_i)^2\)

  • "cross-entropy": \(L(f(x),y) = \sum_{i=1}^C y_i \log(s(f(x))_i)\), where \(s\) is the softmax function.

  • "hinge2": \(L(f(x),y) = \sum_{i=1}^C \max(0, 1 - y_i \cdot f(x)_i )^2\)

step_size : float, default is 0.1

The step size used by the (stochastic proximal) gradient descent optimizer.

normalize_weights : boolean, default is True

True if nonzero weights should be projected onto the probability simplex, that is they should sum to 1.

ensemble_regularizer : str or None, default is "hard-L0"

The ensemble_regularizer \(R_2\). This regularizer is used to select fewer members from the ensemble. It should be one of {None, "L0", "L1", "hard-L0"}; see the proximal-step sketch after this list.

  • None: No constraints are applied during ensemble selection.

  • "L0": Apply \(R_2(w) = || w ||_0\) regularization (implemented via numpy.linalg.norm ). The regularization strength \(\lambda_2\) scales the regularizer in this case.

  • "L1": Apply \(R_2(w) = || w ||_1\) regularization (implemented via numpy.linalg.norm ). The regularization strength \(\lambda_2\) scales the regularizer in this case.

  • "hard-L0": Apply \(R_2(w) = || w ||_0 \le \lambda_2\) regularization. This is the “hard” version of the L0 regularization. The regularization strength \(\lambda_2\) is used a an upper bound in this case.

l_ensemble_reg : float, default is 0

The ensemble_regularizer regularization strength \(\lambda_2\). If "L0" or "L1" is selected, then l_ensemble_reg is the regularization strength which scales the regularizer. If "hard-L0" is selected, then l_ensemble_reg is the maximum number of members in the pruned ensemble.

tree_regularizer : function or None, default is node_regularizer

The tree_regularizer \(R_1\). This regularizer is used to select smaller trees. It should be None or a function that returns the regularization value for a single tree; a custom example is sketched below.

l_tree_reg : float, default is 0

The tree_regularizer regularization strength \(\lambda_1\). The tree_regularizer is scaled by this value.
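
A custom tree regularizer only needs to map a single ensemble member to a scalar. For example, a (hypothetical) depth-based regularizer for scikit-learn decision trees could look like this:

    def depth_regularizer(est):
        # est is assumed to be a fitted scikit-learn decision tree;
        # penalize deep trees instead of large node counts
        return est.get_depth()

It is then used in place of node_regularizer, together with a suitable regularization strength \(\lambda_1\).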

batch_size : int, default is 256

The batch size used for PSGD. Use 0 to process the entire dataset per batch, which corresponds to (non-stochastic) proximal gradient descent.

epochs : int, default is 1

The number of epochs PSGD is run.

verbose : boolean, default is False

If True, shows a progress bar via tqdm and prints some statistics.

out_path : str or None, default is None

If not None, then statistics are stored in a file called $out_path/epoch_$i.npy for epoch $i.

num_estimators()

Returns the number of nonzero weights

prune_(proba, target, data)

Prunes the ensemble using the ensemble predictions proba and the pruning data targets / data. If the pruning method requires access to the original ensemble members you can access these via self.estimators_. Note that self.estimators_ is already a deep-copy of the estimators so you are also free to change the estimators in this list if you want to.

Parameters
  • proba (numpy matrix) – A (N,M,C) matrix which contains the individual predictions of each ensemble member on the pruning data. Each ensemble prediction is generated via predict_proba. N is the size of the pruning data, M the size of the base ensemble, and C the number of classes (a sketch of how such a tensor can be assembled follows after this entry).

  • target (numpy array of ints) – A numpy array or list of N integers where each integer represents the class for each example. Classes should start with 0, so that for C classes the integers 0, 1, …, C-1 are used.

  • data (numpy matrix, optional) – The data points in a (N, d) matrix on which proba has been computed, where N is the pruning set size and d is the dimensionality. This can be used by a pruning method if required, but most methods do not require the actual data points but only the individual predictions.

Returns

  • A tuple of indices and weights (idx, weights) with the following properties

  • idx (numpy array / list of ints) – A list of integers indicating which classifiers should be selected from self.estimators_. Any changes made to self.estimators_ are also reflected here, so make sure that the order of classifiers in proba and self.estimators_ remains the same (or return idx accordingly).

  • weights (numpy array / list of floats) – The individual weights for each selected classifier. The size of this array should match the size of idx (and not the size of the original base ensemble).
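
For reference, the (N, M, C) tensor can be assembled from a fitted scikit-learn ensemble roughly as follows (illustrative sketch; this is typically handled by the public prune(...) entry point):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X_prune, y_prune = make_classification(n_samples=128, n_classes=3, n_informative=5, random_state=0)
    ensemble = RandomForestClassifier(n_estimators=16, random_state=0).fit(X_prune, y_prune)

    # Stack the members' predict_proba outputs into the (N, M, C) tensor described above
    proba = np.stack([est.predict_proba(X_prune) for est in ensemble.estimators_], axis=1)
    print(proba.shape)  # (128, 16, 3)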

PyPruning.ProxPruningClassifier.avg_path_len_regularizer(est)

Computes the average path length of the given tree.

Parameters
  • est (object) – Estimator for which the regularizer is computed.

Notes

Thanks to Mojtaba Masoudinejad (mojtaba.masoudinejad@tu-dortmund.de) for the implementation

Returns

u – The computed regularizer

Return type

float / int scalar
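
One possible way to compute such an average path length for a scikit-learn tree is sketched below, weighting each leaf's depth by the fraction of training samples reaching it (illustrative only; the actual implementation may differ):

    import numpy as np

    def avg_path_len(est):
        # est is assumed to be a fitted scikit-learn decision tree
        tree = est.tree_
        n_total = tree.n_node_samples[0]
        avg = 0.0
        stack = [(0, 0)]  # (node id, depth), starting at the root
        while stack:
            node, depth = stack.pop()
            left, right = tree.children_left[node], tree.children_right[node]
            if left == -1 and right == -1:
                # leaf: add its depth, weighted by the fraction of samples that reach it
                avg += depth * tree.n_node_samples[node] / n_total
            else:
                stack.append((left, depth + 1))
                stack.append((right, depth + 1))
        return avg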

PyPruning.ProxPruningClassifier.create_mini_batches(inputs, targets, batch_size, shuffle=False)

Create a mini-batch iterator over the given inputs / targets. Shamelessly copied from https://stackoverflow.com/questions/38157972/how-to-implement-mini-batch-gradient-descent-in-python

Parameters
  • inputs (array-like vector or matrix) – The inputs to be iterated in mini batches

  • targets (array-like vector or matrix) – The targets to be iterated in mini batches

  • batch_size (int) – The mini batch size

  • shuffle (bool, default False) – If True, shuffle the batches.
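
A generator of this kind typically looks as follows (illustrative sketch, not necessarily the library's exact code):

    import numpy as np

    def mini_batches(inputs, targets, batch_size, shuffle=False):
        indices = np.arange(len(inputs))
        if shuffle:
            np.random.shuffle(indices)
        for start in range(0, len(inputs), batch_size):
            batch = indices[start:start + batch_size]
            yield inputs[batch], targets[batch]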

PyPruning.ProxPruningClassifier.loss_and_deriv(loss_type, output, target)

Computes the loss and its first derivative for the given loss_type ("mse", "cross-entropy" or "hinge2") on the given output / target.
PyPruning.ProxPruningClassifier.node_regularizer(est)

Extract the number of nodes in the given tree

Parameters
  • est (object) – Estimator for which the regularizer is computed.

Returns

u – The computed regularizer

Return type

float / int scalar
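
For scikit-learn decision trees, the node count is directly available on the fitted tree structure, so node_regularizer presumably amounts to something like:

    def node_count(est):
        # number of nodes of a fitted scikit-learn decision tree
        return est.tree_.node_count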

PyPruning.ProxPruningClassifier.prox(w, prox_type, normalize, l_reg, step_size)

Applies the proximal step corresponding to prox_type to the weight vector w, using the regularization strength l_reg and the step size step_size; normalize controls whether the weights are projected onto the probability simplex (cf. normalize_weights).
PyPruning.ProxPruningClassifier.to_prob_simplex(x)

Projects the given vector to the probability simplex so that \(\sum_{i=1}^k x_i = 1, x_i \in [0,1]\).

Reference

Weiran Wang and Miguel A. Carreira-Perpinan (2013) Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. https://eng.ucmerced.edu/people/wwang5/papers/SimplexProj.pdf

Parameters

x (array-like vector with k entries) – The vector to be projected.

Returns

u – The projected vector.

Return type

array-like vector with k entries
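
An illustrative implementation of this projection, following Wang and Carreira-Perpinan (2013); the library's to_prob_simplex should behave equivalently:

    import numpy as np

    def project_simplex(x):
        x = np.asarray(x, dtype=float)
        u = np.sort(x)[::-1]                                   # sort entries in descending order
        css = np.cumsum(u)
        rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(x) + 1) > 0)[0][-1]
        theta = (1.0 - css[rho]) / (rho + 1.0)
        return np.maximum(x + theta, 0.0)

    print(project_simplex([0.3, 1.2, -0.4]))                   # [0.05, 0.95, 0.], sums to 1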