Proximal Gradient Descent
- class PyPruning.ProxPruningClassifier.ProxPruningClassifier(loss='cross-entropy', step_size=0.1, ensemble_regularizer='hard-L0', l_ensemble_reg=0, regularizer=<function node_regularizer>, l_reg=0, normalize_weights=True, batch_size=256, epochs=1, verbose=False, optimizer='adam')
Bases:
PyPruning.PruningClassifier.PruningClassifier
(Heterogeneous) Pruning via Proximal Gradient Descent
This pruning method directly minimizes a constrained loss function \(L\) including a regularizer \(R_1\) via (stochastic) proximal gradient descent. There are two sets of constraints available. When soft constraints are used, the following objective is minimized

\[\arg\min_w L \left(\sum_{i=1}^M w_i h_i(x), y\right) + \lambda_1 \sum_{i=1}^M w_i R_1(h_i) + \lambda_2 R_2(w)\]

When hard constraints are used, the following objective is minimized

\[\arg\min_w L \left(\sum_{i=1}^M w_i h_i(x), y\right) + \lambda_1 \sum_{i=1}^M w_i R_1(h_i) \text{ s.t. } R_2(w) \le \lambda_2\]

The regularizer \(R_1\) is used to select smaller trees, whereas the regularizer \(R_2\) is used to select fewer trees from the ensemble. A usage sketch is given after the parameter list below.
- loss : str, default is "cross-entropy"
The loss function for training. Should be one of {"mse", "cross-entropy", "hinge2"}.
"mse" : \(L(f(x),y) = \sum_{i=1}^C (f(x)_i - y_i)^2\)
"cross-entropy" : \(L(f(x),y) = - \sum_{i=1}^C y_i \log(s(f(x))_i)\), where \(s\) is the softmax function.
"hinge2" : \(L(f(x),y) = \sum_{i=1}^C \max(0, 1 - y_i \cdot f(x)_i )^2\)
- step_size : float, default is 0.1
The step size used for stochastic proximal gradient descent.
- normalize_weights : boolean, default is True
If True, the nonzero weights are projected onto the probability simplex, that is, they sum to 1.
- ensemble_regularizer : str or None, default is "hard-L0"
The ensemble regularizer \(R_2\). This regularizer is used to select fewer members from the ensemble. It should be one of {None, "L0", "L1", "hard-L0"}.
None : No constraints are applied during ensemble selection.
"L0" : Apply \(R_2(w) = || w ||_0\) regularization (implemented via numpy.linalg.norm). The regularization strength \(\lambda_2\) scales the regularizer in this case.
"L1" : Apply \(R_2(w) = || w ||_1\) regularization (implemented via numpy.linalg.norm). The regularization strength \(\lambda_2\) scales the regularizer in this case.
"hard-L0" : Apply the constraint \(|| w ||_0 \le \lambda_2\). This is the "hard" version of the L0 regularization. The regularization strength \(\lambda_2\) is used as an upper bound in this case.
- l_ensemble_reg : float, default is 0
The ensemble_regularizer regularization strength \(\lambda_2\). If "L0" or "L1" is selected, then l_ensemble_reg is the regularization strength which scales the regularizer. If "hard-L0" is selected, then l_ensemble_reg is the maximum number of members in the pruned ensemble.
- regularizer : function or None, default is node_regularizer
The regularizer \(R_1\). This regularizer is used to select smaller trees. It should be None or a function which returns the regularizer value for a single tree.
- l_reg : float, default is 0
The regularization strength \(\lambda_1\) by which the regularizer \(R_1\) is scaled.
- batch_size : int, default is 256
The batch size used for proximal stochastic gradient descent (PSGD). Use 0 to use the entire dataset per batch, which corresponds to (non-stochastic) proximal gradient descent.
- epochs : int, default is 1
The number of epochs PSGD is run.
- verbose : boolean, default is False
If True, shows a progress bar via tqdm and some statistics.
- out_path : str or None, default is None
If not None, then statistics are stored in a file called $out_path/epoch_$i.npy for epoch $i.
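For orientation, here is a minimal usage sketch. It assumes a scikit-learn random forest as the base ensemble and the prune(X, y, estimators) interface of PyPruning.PruningClassifier; parameter values and the train/prune split are illustrative only.

    # Hypothetical end-to-end sketch, not a prescribed configuration.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    from PyPruning.ProxPruningClassifier import ProxPruningClassifier

    X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6, random_state=0)
    X_train, X_prune, y_train, y_prune = train_test_split(X, y, test_size=0.25, random_state=0)

    forest = RandomForestClassifier(n_estimators=128, random_state=0).fit(X_train, y_train)

    # Select at most 16 trees ("hard-L0" uses l_ensemble_reg as an upper bound on ||w||_0).
    pruner = ProxPruningClassifier(
        loss="cross-entropy",
        ensemble_regularizer="hard-L0",
        l_ensemble_reg=16,
        step_size=0.1,
        batch_size=256,
        epochs=5,
    )
    pruner.prune(X_prune, y_prune, forest.estimators_)  # assumes the base-class prune(X, y, estimators) API
    print("Selected members:", pruner.num_estimators())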
- num_estimators()
Returns the number of nonzero weights
- prune_(proba, target, data)
Prunes the ensemble using the ensemble predictions proba and the pruning data targets / data. If the pruning method requires access to the original ensemble members you can access these via self.estimators_. Note that self.estimators_ is already a deep-copy of the estimators so you are also free to change the estimators in this list if you want to.
- Parameters
proba (numpy matrix) – A (N, M, C) matrix which contains the individual predictions of each ensemble member on the pruning data. Each ensemble prediction is generated via predict_proba. N is the size of the pruning data, M is the size of the base ensemble, and C is the number of classes.
target (numpy array of ints) – A numpy array or list of N integers where each integer represents the class of each example. Classes should start with 0, so that for C classes the integers 0, 1, …, C-1 are used.
data (numpy matrix, optional) – The data points in a (N, d) matrix on which proba has been computed, where N is the pruning set size and d is the dimensionality of the data. This can be used by a pruning method if required, but most methods do not require the actual data points, only the individual predictions.
- Returns
A tuple of indices and weights (idx, weights) with the following properties:
idx (numpy array / list of ints) – A list of integers indicating which classifiers should be selected from self.estimators_. Any changes made to self.estimators_ are also reflected here, so make sure that the order of classifiers in proba and self.estimators_ remains the same (or return idx accordingly).
weights (numpy array / list of floats) – The individual weights for each selected classifier. The size of this array should match the size of idx (and not the size of the original base ensemble).
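To make this return contract concrete, here is a small standalone sketch (a hypothetical helper, not part of PyPruning and not the proximal method itself) that produces an (idx, weights) pair in the documented format by keeping the k members with the highest average true-class probability:

    import numpy as np

    def select_top_k(proba, target, k=8):
        """Illustrative only: returns (idx, weights) in the format documented for prune_.

        proba  : (N, M, C) member predictions on the pruning data
        target : (N,) integer class labels in {0, ..., C-1}
        """
        target = np.asarray(target)
        N, M, C = proba.shape
        # Average probability each member assigns to the true class.
        scores = np.array([proba[:, m, :][np.arange(N), target].mean() for m in range(M)])
        idx = np.argsort(-scores)[:k]
        weights = np.ones(len(idx)) / len(idx)   # uniform weights over the selected members
        return idx.tolist(), weights.tolist()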
- PyPruning.ProxPruningClassifier.avg_path_len_regularizer(est)
Computes the average decision path length of the given tree.
- Parameters
est (object) – Estimator for which the regularizer is computed.
Notes
Thanks to Mojtaba Masoudinejad (mojtaba.masoudinejad@tu-dortmund.de) for the implementation
- Returns
u – The computed regularizer
- Return type
float / int scalar
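For illustration, one way such an average-path-length regularizer can be computed for a fitted scikit-learn decision tree is sketched below (the function name avg_path_len is hypothetical, and this is not necessarily the library's exact implementation):

    import numpy as np

    def avg_path_len(est):
        """Average leaf depth of a fitted sklearn tree, weighted by training samples per leaf."""
        tree = est.tree_
        left, right = tree.children_left, tree.children_right
        stack = [(0, 0)]                       # (node_id, depth), starting at the root
        leaf_depth, leaf_weight = [], []
        while stack:
            node, depth = stack.pop()
            if left[node] == -1:               # leaf node
                leaf_depth.append(depth)
                leaf_weight.append(tree.n_node_samples[node])
            else:
                stack.append((left[node], depth + 1))
                stack.append((right[node], depth + 1))
        return np.average(leaf_depth, weights=leaf_weight)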
- PyPruning.ProxPruningClassifier.create_mini_batches(inputs, targets, batch_size, shuffle=False)
Creates a mini-batch iterator over the given inputs / targets. Shamelessly copied from https://stackoverflow.com/questions/38157972/how-to-implement-mini-batch-gradient-descent-in-python
- Parameters
inputs (array-like vector or matrix) – The inputs to be iterated in mini batches
targets (array-like vector or matrix) – The targets to be iterated in mini batches
batch_size (int) – The mini batch size
shuffle (bool, default False) – If True, shuffle the batches.
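The typical pattern behind such a helper looks roughly like the following sketch (the name make_mini_batches is hypothetical; it assumes numpy inputs with a matching first dimension and keeps the final, smaller batch):

    import numpy as np

    def make_mini_batches(inputs, targets, batch_size, shuffle=False):
        """Illustrative mini-batch generator, mirroring the role of create_mini_batches."""
        indices = np.arange(len(inputs))
        if shuffle:
            np.random.shuffle(indices)
        for start in range(0, len(inputs), batch_size):
            batch = indices[start:start + batch_size]
            yield inputs[batch], targets[batch]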
- PyPruning.ProxPruningClassifier.loss_and_deriv(loss_type, output, target)
Computes the loss and its derivative with respect to the ensemble output for the given loss_type ("mse", "cross-entropy", or "hinge2").
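As an illustration of what these losses and their derivatives look like, the sketch below follows the formulas given for the loss parameter above, assuming one-hot targets. The function names are hypothetical and this is not necessarily the library's exact implementation.

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)           # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def example_loss_and_deriv(loss_type, output, target_one_hot):
        """output, target_one_hot: (N, C) arrays. Returns per-example loss and d loss / d output."""
        if loss_type == "mse":
            loss = ((output - target_one_hot) ** 2).sum(axis=1)
            deriv = 2.0 * (output - target_one_hot)
        elif loss_type == "cross-entropy":
            p = softmax(output)
            loss = -(target_one_hot * np.log(p + 1e-12)).sum(axis=1)
            deriv = p - target_one_hot                  # softmax + cross-entropy gradient
        elif loss_type == "hinge2":
            y = 2.0 * target_one_hot - 1.0              # {0,1} one-hot -> {-1,+1} coding
            margin = np.maximum(0.0, 1.0 - y * output)
            loss = (margin ** 2).sum(axis=1)
            deriv = -2.0 * y * margin
        else:
            raise ValueError(f"Unknown loss {loss_type}")
        return loss, deriv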
- PyPruning.ProxPruningClassifier.node_regularizer(est)
Extract the number of nodes in the given tree
- Parameters
est (object) – Estimator for which the regularizer is computed.
- Returns
u – The computed regularizer
- Return type
float / int scalar
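For a scikit-learn decision tree, such a node-count regularizer can be as simple as the following sketch (assuming a fitted estimator with a tree_ attribute; shown for illustration, not as the library's exact code):

    def node_count_regularizer(est):
        """Number of nodes in a fitted sklearn decision tree (illustrative re-implementation)."""
        return est.tree_.node_count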
- PyPruning.ProxPruningClassifier.prox(w, prox_type, normalize, l_reg, step_size)
Applies the proximal operator / projection associated with prox_type to the weight vector w.
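The proximal operators behind the supported regularizers have standard closed forms; the sketch below shows them for a plain weight vector (illustration only, and the exact handling of normalization and step sizes may differ in the library):

    import numpy as np

    def prox_sketch(w, prox_type, l_reg, step_size):
        """Standard proximal operators for the supported regularizers (illustrative)."""
        if prox_type == "L1":
            # Soft thresholding: shrink every weight towards zero by step_size * l_reg.
            return np.sign(w) * np.maximum(np.abs(w) - step_size * l_reg, 0.0)
        if prox_type == "L0":
            # Hard thresholding: zero out weights whose magnitude is below sqrt(2 * step_size * l_reg).
            threshold = np.sqrt(2.0 * step_size * l_reg)
            return np.where(np.abs(w) >= threshold, w, 0.0)
        if prox_type == "hard-L0":
            # Projection onto {w : ||w||_0 <= l_reg}: keep only the l_reg largest weights.
            k = int(l_reg)
            keep = np.argsort(-np.abs(w))[:k]
            out = np.zeros_like(w)
            out[keep] = w[keep]
            return out
        return w  # prox_type None: no regularization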
- PyPruning.ProxPruningClassifier.to_prob_simplex(x)
Projects the given vector to the probability simplex so that \(\sum_{i=1}^k x_i = 1, x_i \in [0,1]\).
- Reference
Weiran Wang and Miguel A. Carreira-Perpinan (2013) Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. https://eng.ucmerced.edu/people/wwang5/papers/SimplexProj.pdf
- Parameters
x (array-like vector with k entries) – The vector to be projected.
- Returns
u – The projected vector.
- Return type
array-like vector with k entries
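A compact sketch of the Wang & Carreira-Perpinan projection referenced above (the function name project_to_simplex is hypothetical, and the library's implementation may differ in details):

    import numpy as np

    def project_to_simplex(x):
        """Euclidean projection of x onto the probability simplex {u : u >= 0, sum(u) = 1}."""
        x = np.asarray(x, dtype=float)
        u = np.sort(x)[::-1]                        # sort in descending order
        cssv = np.cumsum(u) - 1.0
        rho = np.nonzero(u - cssv / (np.arange(len(x)) + 1) > 0)[0][-1]
        theta = cssv[rho] / (rho + 1.0)
        return np.maximum(x - theta, 0.0)

    # Example: project_to_simplex([0.4, 0.3, 1.5]) -> entries are nonnegative and sum to 1.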