Fastinference is a machine learning model optimizer and model compiler that generates the optimal implementation for your model and hardware architecture:

In Fastinference the user comes first. We believe that the user knows best which implementation and which optimizations should be used. Hence, we generate readable code so that the user can adapt and change the implementation if necessary.

In Fastinference optimizations and implementations can be freely combined. Fastinference distinguishes between optimizations of a specific model, which are independent of the implementation, and specific types of implementations. Consider, for example, a simple decision tree: pruning the model does not affect how it is implemented, and vice versa.

Fastinference can be easily extended. You can add your own implementation while benefiting from all optimizations performed on the model, and vice versa.
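The separation between optimizations and implementations can be illustrated with the decision tree example from above. The following is a toy sketch only: the tree format and the functions prune, implement_ifelse and implement_table are hypothetical and are not part of the fastinference API. It shows one optimization (pruning redundant splits) being combined with two unrelated ways of turning the tree into code or data.

```python
# Toy illustration only: none of these names belong to fastinference.

def make_tree():
    """A tiny decision tree stored as nested dicts."""
    return {
        "feature": 0, "threshold": 0.5,
        "left": {"label": 0},
        "right": {
            "feature": 1, "threshold": 2.0,
            # Both children predict the same class, so this split is redundant.
            "left": {"label": 1},
            "right": {"label": 1},
        },
    }

def prune(node):
    """Optimization: collapse splits whose two children are identical leaves."""
    if "label" in node:
        return node
    left, right = prune(node["left"]), prune(node["right"])
    if "label" in left and "label" in right and left["label"] == right["label"]:
        return {"label": left["label"]}
    return {"feature": node["feature"], "threshold": node["threshold"],
            "left": left, "right": right}

def implement_ifelse(node, indent="    "):
    """Implementation 1: emit nested if-else statements as C++ source text."""
    if "label" in node:
        return f"{indent}return {node['label']};\n"
    return (
        f"{indent}if (x[{node['feature']}] <= {node['threshold']}) {{\n"
        + implement_ifelse(node["left"], indent + "    ")
        + f"{indent}}} else {{\n"
        + implement_ifelse(node["right"], indent + "    ")
        + f"{indent}}}\n"
    )

def implement_table(node):
    """Implementation 2: flatten the tree into rows of
    (feature, threshold, left_index, right_index, label)."""
    rows = []

    def visit(n):
        idx = len(rows)
        rows.append(None)  # reserve the slot so the parent precedes its children
        if "label" in n:
            rows[idx] = (-1, 0.0, -1, -1, n["label"])
        else:
            left = visit(n["left"])
            right = visit(n["right"])
            rows[idx] = (n["feature"], n["threshold"], left, right, -1)
        return idx

    visit(node)
    return rows

tree = make_tree()
pruned = prune(tree)

# Every (optimization, implementation) pair works without either knowing the other.
for model in (tree, pruned):
    print("unsigned int predict(const float *x) {\n" + implement_ifelse(model) + "}")
    print(implement_table(model))
```

Because prune only rewrites the model and the two implement functions only read it, adding a third implementation would automatically work with pruned trees as well.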

How to install

You can install this package via pip from git:

pip install git+

If you have trouble with dependencies, you can try setting up the conda environment which I use for development:

git clone
cd fastinference
conda env create -f environment.yml
conda activate fi

Please note that this environment also contains some larger packages such as PyTorch so the installation may take some time.

How to use fastinference

Using fastinference from the command line

If you have stored your model on disk (e.g. as a JSON file), you can generate the code directly from the CLI via:

python3 fastinference/ --model /my/nice/model.json --feature_type float --out_path /my/nice/model --out_name "model" --implementation my.newest.implementation --optimize my.newest.optimization

This call loads the model stored in /my/nice/model.json, performs the optimizations implemented in my.newest.optimization and finally generates the implementation according to my.newest.implementation, where the data type of the features is float. Any additional arguments given on the command line are passed on to my.newest.optimization and my.newest.implementation respectively, so you can pass anything you require. Note that for ensembles you can additionally pass --baseimplementation and --baseoptimize to specify the implementations of the base learners and the optimizations performed on them.
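How a CLI can forward arguments it does not know about is sketched below with Python's standard argparse module. This is a minimal illustration of the general technique and not fastinference's actual parsing code; the function parse_cli and the extra option --some_option are made up for the example.

```python
import argparse

def parse_cli(argv):
    """Parse the known options and collect all remaining ones as a dict."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--model", required=True)
    parser.add_argument("--feature_type", default="float")
    parser.add_argument("--implementation", required=True)
    parser.add_argument("--optimize", default=None)

    # parse_known_args returns unrecognized tokens instead of raising an error
    known, unknown = parser.parse_known_args(argv)

    # Turn leftover "--key value" pairs into keyword arguments
    extra = {}
    key = None
    for token in unknown:
        if token.startswith("--"):
            key = token[2:]
            extra[key] = True      # treated as a bare flag until a value follows
        elif key is not None:
            extra[key] = token
            key = None
    return known, extra

known, extra = parse_cli([
    "--model", "/my/nice/model.json",
    "--implementation", "my.newest.implementation",
    "--optimize", "my.newest.optimization",
    "--some_option", "42",         # hypothetical extra argument
])
print(known.implementation, extra)
```

The resulting extra dictionary could then be handed to an optimization or implementation as keyword arguments. Values arrive as strings here, so any type conversion would be up to the receiving code.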

For Linear, Discriminant, Tree and Ensemble models we currently support .json files which have previously been written via Loader.model_to_json. For Neural Networks we use ONNX files which have been written, for example, via torch.onnx.export or tf2onnx. Reading ONNX files can sometimes be tricky, so please check out Neural Network for caveats.

Using fastinference in your python program

Simply import fastinference.Loader, load your model and you are ready to go:

import fastinference.Loader

loaded_model = fastinference.Loader.model_from_file("/my/nice/model.json")
loaded_model.optimize("my.newest.optimization", None)
loaded_model.implement("/my/nice/model", "model", "my.newest.implementation")

Again, for ensembles you can pass the additional arguments base_optimizers and base_args to the call of optimize in order to optimize the base learners in the ensemble. For scikit-learn models you can also use Loader.model_from_sklearn to load the model. For Deep Learning approaches you will always have to store the model as an ONNX file first.

A complete example

A complete example which trains a Random Forest on artificial data, performs some optimizations on the trees and finally generates some C++ code would look like the following:

# Define some constants

# Generate some artificial data with 5 classes, 20 features and 10000 data points.
python3 tests/data/ --out $OUTPATH --nclasses 5 --nfeatures 20 --difficulty 0.5 --nexamples 10000

# Train a RF with 25 trees on the generated data
python3 tests/train_$ --training $OUTPATH/training.csv --testing $OUTPATH/testing.csv --out $OUTPATH --name $MODELNAME  --nestimators 25

# Perform the actual optimization + code generation
python3 fastinference/ --model $OUTPATH/$MODELNAME.json --feature_type $FEATURE_TYPE --out_path $OUTPATH --out_name "model" --implementation cpp --baseimplementation cpp.ifelse --baseoptimize swap

# Prepare the C++ files for compilation
python3 ./tests/data/ --file $OUTPATH/testing.csv --out $OUTPATH/testing.h --dtype $FEATURE_TYPE --ltype "unsigned int"
cp ./tests/main.cpp $OUTPATH
cp ./tests/CMakeLists.txt $OUTPATH

# Compile the code

# Run the code

A CI/CD pipeline tests the current code and uses tests/ to orchestrate the various scripts. When in doubt, please have a look at these files.


The software is written and maintained by Sebastian Buschjäger as part of his work at the Chair for Artificial Intelligence (LS8) at the TU Dortmund University and the Collaborative Research Center 876 (SFB 876). If you have any questions, feel free to contact me under

Special thanks go to Maik Schmidt and Andreas Buehner who provided parts of this implementation during their time at the TU Dortmund University.
