Extending fastinference

One of the main design goals of Fastinference is to make it easy to add new types of optimizations and implementations while benefiting from the existing optimizations and implementations out of the box. The central object in Fastinference is the model.

Adding a new type of implementation

Adding a new implementation for a given object is easy. Simply provide an implement.py file which contains a function to_implementation that receives:

  • model: The model to be implemented. This is a deepcopy of the original model object, so you can modify it if required.

  • out_path: The folder in which the source code for this model should be stored.

  • out_name: The filename under which the model’s implementation should be stored.

  • weight: The weight of this model in case it is part of an ensemble. The prediction should be scaled by this weight.

def to_implementation(model, out_path, out_name, weight = 1.0, **kwargs):
    # Generate the new implementation here

Fastinference will search for existing implementations under implementations/my/new/implementation/implement.py which can then be loaded via --implementation my.new.implementation. Per convention we currently store implementations under implementations/{model}/{language}/{implementation}. You can pass any additional arguments using kwargs and fastinference will try to lazily pass any command-line arguments to your function. Don’t forget to document your implementation: just adapt docs/implementations.rst to include your new implementation and the docstring of your to_implementation will be included in the docs.

A note for ensembles: For the cpp implementations we currently assume the following signature. Here, the predictions should be added to the pred array rather than overwriting it, because the ensemble implementation will call each base learner’s implementation on the same array.

void predict_{{model.name}}({{ feature_type }} const * const x, {{ label_type }} * pred) {
    // the actual code
}
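
As an illustration, a hand-written implement.py for a hypothetical linear model might look like the following sketch. The coef and intercept attributes as well as the feature_type and label_type keyword arguments are assumptions made for this example; note how each prediction is scaled by weight and accumulated into pred:

import os

def to_implementation(model, out_path, out_name, weight = 1.0, feature_type = "double", label_type = "double", **kwargs):
    # Emit one C++ function following the signature shown above. The model is
    # assumed to provide coef (one list per class) and intercept attributes.
    lines = ["void predict_{}({} const * const x, {} * pred) {{".format(model.name, feature_type, label_type)]
    for i, (coef, intercept) in enumerate(zip(model.coef, model.intercept)):
        terms = " + ".join("{} * x[{}]".format(c, j) for j, c in enumerate(coef))
        # Accumulate into pred and scale by the ensemble weight.
        lines.append("    pred[{}] += {} * ({} + {});".format(i, weight, terms, intercept))
    lines.append("}")

    with open(os.path.join(out_path, "{}.h".format(out_name)), "w") as f:
        f.write("\n".join(lines))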

Important: Currently all implementations utilize the template engine jinja (https://jinja.palletsprojects.com/en/3.0.x/), but there is no requirement to use jinja for new types of implementations. We originally intended to provide all implementations via jinja (e.g. also for other languages), but although jinja is very powerful it would sometimes be very difficult to provide certain types of implementations. Hence, we decided to simply use Python code to generate the necessary implementations without any formal dependence on jinja. Nevertheless, we recommend using jinja whenever possible. For any C-type language (e.g. C, Java etc.) we recommend simply copying the entire implementation folder of a model and then adapting the jinja templates wherever necessary.
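
If you do use jinja, the body of to_implementation typically boils down to rendering a template stored next to implement.py. The template name model.j2 and the variables passed to render are assumptions in this sketch:

import os
from jinja2 import Environment, FileSystemLoader

def to_implementation(model, out_path, out_name, weight = 1.0, **kwargs):
    # Load the templates that are stored next to this implement.py file.
    env = Environment(loader = FileSystemLoader(os.path.dirname(__file__)))
    template = env.get_template("model.j2")

    # Render the template with the model and any additional arguments.
    code = template.render(model = model, weight = weight, **kwargs)

    with open(os.path.join(out_path, "{}.h".format(out_name)), "w") as f:
        f.write(code)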

Adding a new type of optimization

Adding a new optimization for a given object is easy. Simply provide a function optimize which receives the model to be optimized and returns the optimized model:

def optimize(model, **kwargs):
    # Perform some optimizations on the model here

    return model

Fastinference will search for existing optimizations under optimizations/my/new/optimization.py which can then be loaded via --optimize my.new.optimization. Per convention we currently store optimizations under {optimizers}/{model}/. You can pass any additional arguments using kwargs and fastinference will try to lazily pass any command-line arguments to your function. Don’t forget to document your optimization: just adapt docs/{model}.rst to include your new optimization and the docstring of your optimize will be included in the docs.
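
As an example, a hypothetical pruning optimization for a linear model could look like the sketch below. The coef attribute and the threshold argument are assumptions for this example; since threshold is a named parameter, it can also be set from the command line thanks to the lazy argument passing described above:

def optimize(model, threshold = 1e-4, **kwargs):
    # Zero out all coefficients whose magnitude falls below the threshold.
    # model.coef is assumed to be a list of lists of floats, one list per class.
    model.coef = [
        [0.0 if abs(c) < threshold else c for c in coef]
        for coef in model.coef
    ]

    return model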

Adding a new type of model

Adding a new model to fastinference is slightly more work. First, you need to implement fastinference.models.Model. To do so, you will have to implement the predict_proba method, which executes the given model on a batch of data, and the to_dict method, which returns a dictionary representation of the model. Last, you might also need to supply a new model category beyond the existing ones {linear, tree, ensemble, discriminant, neuralnet}:

class MyModel(Model):
    def __init__(self, classes, n_features, category, accuracy = None, name = "Model"):
        super().__init__(classes, n_features, "A-new-category", accuracy, name)
        pass

    def predict_proba(self, X):
        pass

    def to_dict(self):
        model_dict = super().to_dict()
        # Add some stuff to model_dict
        return model_dict
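
To make the skeleton more concrete, here is a sketch of a small linear-style model. The coef and intercept attributes, the "linear" category and the softmax normalization are choices made for this example rather than requirements of fastinference, and the exact import path of Model may differ:

import numpy as np
from fastinference.models import Model  # adjust to the actual location of Model

class MyModel(Model):
    def __init__(self, classes, n_features, coef, intercept, accuracy = None, name = "MyModel"):
        super().__init__(classes, n_features, "linear", accuracy, name)
        self.coef = np.array(coef)            # shape (n_classes, n_features)
        self.intercept = np.array(intercept)  # shape (n_classes,)

    def predict_proba(self, X):
        # Return one probability distribution per row of the batch X.
        scores = X @ self.coef.T + self.intercept
        exp = np.exp(scores - scores.max(axis = 1, keepdims = True))
        return exp / exp.sum(axis = 1, keepdims = True)

    def to_dict(self):
        model_dict = super().to_dict()
        model_dict["coef"] = self.coef.tolist()
        model_dict["intercept"] = self.intercept.tolist()
        return model_dict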

Once the model is implemented you need to provide methods for loading and storing. The main entry points for loading and storing in fastinference are:

  • Loader.model_from_file for loading a new model from a file

  • Loader.model_to_json for storing a new model into a JSON file

In order to load the model you will have to adapt Loader.model_from_file. If your model does not really fit into a JSON format or comes with its own format (e.g. neural networks stored in the ONNX format), then you can ignore Loader.model_to_json. However, we try to keep these loading / storing functions as consistent as possible, so please provide both if you can.
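
As a rough sketch, and assuming that Loader.model_from_file dispatches on a "model" entry in the loaded JSON (the actual dispatch logic and key names in fastinference may differ), the hook for the MyModel class from above could look like this:

import json

def model_from_file(file_path):
    with open(file_path) as f:
        data = json.load(f)

    # Hypothetical branch for the new model. The key names used here are
    # assumptions and must match whatever your to_dict implementation stores.
    if data.get("model") == "MyModel":
        return MyModel(
            data["classes"], data["n_features"],
            data["coef"], data["intercept"],
            accuracy = data.get("accuracy"), name = data.get("name", "MyModel")
        )

    # ... existing model types are handled here ...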

Testing your implementation / optimization

Training a model, generating the code and finally compiling it can be a cumbersome endeavor if you want to debug / test your implementation. We offer some scripts which help during development:

  • environment.yml: An Anaconda environment file which we use during development.

  • tests/generate_data.py: A script to generate some random test and training data.

  • tests/train_{linear,discriminant,tree,mlp,cnn}.py: A script to train the respective classifier or an ensemble of those.

  • tests/convert_data.py: A script to convert the test data into a static header file for the C++ implementations.

  • tests/main.cpp: The main.cpp file used when testing C++ implementations.

  • tests/CMakeLists.txt: The CMakeLists.txt used when testing C++ implementations.

A complete example of the entire workflow can be found in run_tests.sh and we try to maintain a CI/CD pipeline under tests.yml. Please check this file for the latest test configurations.