Neural Networks
Fastinference offers limited support for Deep Learning and Neural Network architectures. The current focus is on feed-forward MLPs and ConvNets in the context of small, embedded systems and FPGAs, but we are always open to enhancing our support for new Deep Learning architectures.
Important: ONNX is the open standard for machine learning interoperability and is supported by all major Deep Learning frameworks. However, the ONNX format is still under development, and a given deep architecture can often be represented by several different computational graphs. Hence, this standard is sometimes ambiguous. This implementation has been tested with PyTorch and visualized with Netron. For exporting a Neural Net we usually use:
    import os
    import torch

    # model, x_train, out_path and name are assumed to be defined by the training script.
    dummy_x = torch.randn(1, x_train.shape[1], requires_grad=False)
    torch.onnx.export(
        model, dummy_x, os.path.join(out_path, name),
        training=torch.onnx.TrainingMode.PRESERVE,
        export_params=True, opset_version=11, do_constant_folding=True,
        input_names=['input'], output_names=['output'],
        dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}},
    )
Some notes on Binarized Neural Networks
Binarized Neural Networks (BNNs) are Neural Networks with weights constrained to {-1,+1} so that the forward pass of the entire network can be executed via boolean operations (usually XNOR + popcount). A typical structure of these networks is as follows:
Input -> Linear / Conv -> BatchNorm -> Step -> … -> Linear / Conv -> BatchNorm -> Step -> Output
where the Linear / Conv layers only have “binary” weights and biases in {-1,+1} and the step function is the Heaviside function. BNNs are usually not supported by the major frameworks out of the box, but require additional libraries as well as some tweaks to the ONNX export. For example, larq offers binarization for Keras / TensorFlow and Brevitas enables binarization for PyTorch. Alternatively, we can implement binarization directly, as shown in the example below. Unfortunately, ONNX does not support the custom operators from these libraries, so we have to sanitize them before exporting. In fastinference we simply replace each binary layer, e.g. BinaryLinear, with its regular counterpart torch.nn.Linear. Moreover, PyTorch cannot yet export the Heaviside function to an ONNX file. Hence we mimic this function with a series of “Constant -> Greater -> Constant -> Constant -> Where” layers, which fastinference then parses and merges back into a Step layer. For a complete example check out train_mlp.py or train_cnn.py.
    import os

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.autograd import Function


    class BinarizeF(Function):
        # Forward: map values to {-1, +1}. Backward: pass the gradient through unchanged.
        @staticmethod
        def forward(ctx, input):
            output = input.new(input.size())
            output[input > 0] = 1
            output[input <= 0] = -1
            return output

        @staticmethod
        def backward(ctx, grad_output):
            grad_input = grad_output.clone()
            return grad_input


    binarize = BinarizeF.apply


    class BinaryLinear(nn.Linear):
        def forward(self, input):
            binary_weight = binarize(self.weight)
            if self.bias is None:
                return F.linear(input, binary_weight)
            binary_bias = binarize(self.bias)
            return F.linear(input, binary_weight, binary_bias)


    class BinaryTanh(nn.Module):
        def __init__(self, *args, **kwargs):
            super(BinaryTanh, self).__init__()
            self.hardtanh = nn.Hardtanh(*args, **kwargs)

        def forward(self, input):
            output = self.hardtanh(input)
            output = binarize(output)
            return output


    class SimpleMLP(nn.Module):
        def __init__(self, input_dim, n_classes):
            super().__init__()
            self.layer_1 = BinaryLinear(input_dim, 128)
            self.bn_1 = nn.BatchNorm1d(128)
            self.activation_1 = BinaryTanh()
            self.layer_2 = BinaryLinear(128, 256)
            self.bn_2 = nn.BatchNorm1d(256)
            self.activation_2 = BinaryTanh()
            self.layer_3 = BinaryLinear(256, n_classes)

        def forward(self, x):
            x = self.layer_1(x)
            x = self.bn_1(x)
            x = self.activation_1(x)
            x = self.layer_2(x)
            x = self.bn_2(x)
            x = self.activation_2(x)
            x = self.layer_3(x)
            x = torch.log_softmax(x, dim=1)
            return x


    def sanatize_onnx(model):
        # Usually I would use https://pytorch.org/docs/stable/generated/torch.heaviside.html
        # for exporting here, but this is not yet supported in ONNX files.
        class Sign(nn.Module):
            def forward(self, input):
                return torch.where(input > 0, torch.tensor([1.0]), torch.tensor([-1.0]))

        # Iterate over a copy because entries of model._modules are replaced in the loop.
        for name, m in reversed(list(model._modules.items())):
            print("Checking {}".format(name))
            if isinstance(m, BinaryLinear):
                print("Replacing {}".format(name))
                # nn.Linear always has a bias attribute (None if disabled),
                # so compare against None instead of using hasattr.
                has_bias = m.bias is not None
                layer_new = nn.Linear(m.in_features, m.out_features, has_bias)
                if has_bias:
                    layer_new.bias.data = binarize(m.bias.data)
                layer_new.weight.data = binarize(m.weight.data)
                model._modules[name] = layer_new
            if isinstance(m, BinaryTanh):
                model._modules[name] = Sign()
        return model


    # input_dim, n_classes, dummy_x, out_path and name are assumed to be defined elsewhere.
    model = SimpleMLP(input_dim, n_classes)
    # Train the model
    model = sanatize_onnx(model)
    torch.onnx.export(
        model, dummy_x, os.path.join(out_path, name),
        export_params=True, opset_version=11, do_constant_folding=True,
        input_names=['input'], output_names=['output'],
        dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}},
    )
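To illustrate why {-1,+1} weights allow a forward pass via XNOR + popcount, here is a minimal pure-Python sketch. It is not code from fastinference; the helper names pack_bits and binary_dot are hypothetical, and the bit packing (bit i set when the i-th value is +1) is an assumption made for the illustration.

```python
def pack_bits(values):
    """Pack a list of {-1, +1} values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    # XNOR marks the positions where both vectors agree (their product is +1).
    agree = ~(a_bits ^ b_bits) & mask
    matches = bin(agree).count("1")  # popcount
    # Agreeing positions contribute +1 each, the other n - matches contribute -1 each.
    return 2 * matches - n


weights = [1, -1, -1, 1, 1, -1, 1, 1]
inputs = [1, 1, -1, -1, 1, -1, -1, 1]

reference = sum(w * x for w, x in zip(weights, inputs))
fast = binary_dot(pack_bits(weights), pack_bits(inputs), len(weights))
print(reference, fast)
```

Both values agree because a dot product over {-1,+1} only needs the number of positions where the two vectors match.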
Available optimizations
- fastinference.optimizers.neuralnet.merge_nodes.optimize(model, **kwargs)
Merges subsequent BatchNorm and Step layers into a new Step layer with adapted thresholds in a single pass. Currently there is no recursive merging applied.
TODO: Perform merging recursively.
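The idea behind merging BatchNorm into a Step layer can be sketched for a single neuron as follows. This is an illustration under assumptions, not the fastinference implementation: the step is assumed to output +1 for positive inputs and -1 otherwise (as the Sign module in the example above does), and the names merge_bn_step and thresholded_step are hypothetical.

```python
import math


def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization of a scalar."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def step(x):
    """Assumed step function: +1 for positive inputs, -1 otherwise."""
    return 1.0 if x > 0 else -1.0


def merge_bn_step(gamma, beta, mean, var, eps=1e-5):
    """Return (threshold, flip) so that step(batchnorm(x)) == thresholded_step(x, ...)."""
    if gamma == 0:
        raise ValueError("gamma == 0 makes the output constant; handle separately")
    # batchnorm(x) > 0  <=>  gamma * (x - mean) / sigma > -beta
    threshold = mean - beta * math.sqrt(var + eps) / gamma
    # Dividing by a negative gamma flips the direction of the comparison.
    return threshold, gamma < 0


def thresholded_step(x, threshold, flip):
    fired = (x < threshold) if flip else (x > threshold)
    return 1.0 if fired else -1.0


params = dict(gamma=-2.0, beta=0.5, mean=-1.0, var=0.5)
threshold, flip = merge_bn_step(**params)
xs = [-3.0, -1.5, 0.0, 0.5, 2.0]
unmerged = [step(batchnorm(x, **params)) for x in xs]
merged = [thresholded_step(x, threshold, flip) for x in xs]
print(unmerged == merged)
```

After merging, the BatchNorm parameters are no longer needed at inference time; only one threshold and one comparison direction per neuron remain.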
- fastinference.optimizers.neuralnet.remove_nodes.optimize(model, **kwargs)
Removes LogSoftmax and positive scaling (Mul) layers from the network because they do not change the prediction.
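The reason these layers can be dropped is that both preserve the ordering of the class scores: LogSoftmax subtracts the same constant from every score, and multiplying by a positive constant is monotone. A small pure-Python check, with hypothetical helper names and made-up scores, illustrates this:

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


logits = [0.3, -1.2, 2.5, 0.9]
scaled = [4.0 * s for s in logits]  # positive scaling (Mul)

predictions = (argmax(logits), argmax(log_softmax(logits)), argmax(scaled))
print(predictions)
```

All three entries are the same class index, so the predicted label is unchanged. Note that the raw outputs are no longer log-probabilities once LogSoftmax is removed.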
The NeuralNet object
- class fastinference.models.nn.NeuralNet.NeuralNet(path_to_onnx, accuracy=None, name='model')
A (simplified) neural network model. This class currently supports feed-forward multi-layer perceptrons as well as feed-forward convnets. In detail, the following operations are supported:
Linear Layer
Convolutional Layer
Sigmoid Activation
ReLU Activation
LeakyRelu Activation
MaxPool
AveragePool
LogSoftmax
Multiplication with a constant (Mul)
Reshape
BatchNormalization
All layers are stored in self.layer, which is already ordered for execution. Additionally, the original ONNX model is stored in self.onnx_model.
This class loads ONNX files to build the internal computation graph. This can be tricky because the ONNX exporters behave differently for each framework and version. In PyTorch we usually use:
    import os
    import torch

    # model, x_train, out_path and name are assumed to be defined by the training script.
    dummy_x = torch.randn(1, x_train.shape[1], requires_grad=False)
    torch.onnx.export(
        model, dummy_x, os.path.join(out_path, name),
        training=torch.onnx.TrainingMode.PRESERVE,
        export_params=True, opset_version=11, do_constant_folding=True,
        input_names=['input'], output_names=['output'],
        dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}},
    )
Important: This class automatically merges “Constant -> Greater -> Constant -> Constant -> Where” operations into a single step layer. This is specifically designed to parse Binarized Neural Networks, but might be wrong for some types of networks.
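The merging described above can be pictured as a pattern match over the sequence of ONNX operator types. The following sketch is a simplified illustration and not the actual parser: it works on a plain list of operator names instead of graph nodes, and merge_step_pattern is a hypothetical helper.

```python
STEP_PATTERN = ["Constant", "Greater", "Constant", "Constant", "Where"]


def merge_step_pattern(op_types, pattern=STEP_PATTERN):
    """Replace every occurrence of pattern in op_types with a single "Step" entry."""
    merged = []
    i = 0
    while i < len(op_types):
        if op_types[i:i + len(pattern)] == pattern:
            merged.append("Step")
            i += len(pattern)  # skip the whole matched sequence
        else:
            merged.append(op_types[i])
            i += 1
    return merged


ops = ["Gemm", "BatchNormalization",
       "Constant", "Greater", "Constant", "Constant", "Where",
       "Gemm", "LogSoftmax"]
merged_ops = merge_step_pattern(ops)
print(merged_ops)
```

Because the match looks only at operator types, any network that happens to contain this sequence for a different purpose would also be rewritten, which is the caveat stated above.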
- __init__(path_to_onnx, accuracy=None, name='model')
Constructor of NeuralNet.
- Parameters
path_to_onnx (str) – Path to the ONNX file.
accuracy (float, optional) – The accuracy of this model on some test data. Can be used to verify the correctness of the implementation. Defaults to None.
name (str, optional) – The name of this model. Defaults to 'model'.
- predict_proba(X)
Applies this NeuralNet to the given data and provides the predicted probabilities for each example in X. This function internally calls onnxruntime.InferenceSession for inference.
- Parameters
X (numpy.array) – A (N,d) matrix where N is the number of data points and d is the feature dimension. If X has only one dimension then a single example is assumed and X is reshaped via X = X.reshape(1,X.shape[0])
- Returns
A (N, c) prediction matrix where N is the number of data points and c is the number of classes
- Return type
numpy.array
- fastinference.models.nn.NeuralNet.layer_from_node(graph, node, input_shape)
Constructs the appropriate layer from the given graph and node.
- Parameters
graph – The onnx graph.
node – The current node.
input_shape (tuple) – The input shape of the current node
- Raises
NotImplementedError – Throws an error if there is no implementation for the current node available.
- Returns
The newly constructed layer.
- Return type
Layer