Implementations

Linear implementations

Discriminant analysis implementations

Tree implementations

fastinference.implementations.tree.cpp.ifelse.implement.to_implementation(model, out_path, out_name, weight=1, namespace='FAST_INFERENCE', feature_type='double', label_type='double', kernel_budget=None, kernel_type=None, output_debug=False, target_compiler='g++', target_objdump='objdump', **kwargs)

Generates an unrolled C++ implementation of the given Tree model. Unrolled means that the tree is represented as an if-then-else structure without any arrays. You can use this implementation by simply passing "cpp.ifelse" to implement, e.g.:

loaded_model = fastinference.Loader.model_from_file("/my/nice/model.json")
loaded_model.implement("/some/nice/place/", "mymodel", "cpp.ifelse")
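
For intuition, the code generated for a hypothetical depth-2 tree with two classes might look roughly like the following sketch (the function name, feature indices, and thresholds are made up; the real code lives in the configured namespace and uses the configured feature_type and label_type):

// Hand-written sketch of an unrolled (if-then-else) tree, not the exact generated code.
void predict_mymodel(double const * const x, double * pred) {
    if (x[2] <= 0.5) {
        if (x[0] <= 1.25) {
            pred[0] += 1.0; // leaf: class 0
        } else {
            pred[1] += 1.0; // leaf: class 1
        }
    } else {
        pred[1] += 1.0; // leaf: class 1
    }
}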
Parameters
  • model (Tree) – The Tree model to be implemented

  • out_path (str) – The folder in which the *.cpp and *.h files are stored.

  • out_name (str) – The base filename for the generated files.

  • weight (float, optional) – The weight of this model inside an ensemble. The weight is ignored if it is 1.0, otherwise the prediction is scaled by the respective weight. Defaults to 1.0.

  • namespace (str, optional) – The namespace under which this model will be generated. Defaults to “FAST_INFERENCE”.

  • feature_type (str, optional) – The data types of the input features. Defaults to “double”.

  • label_type (str, optional) – The data types of the label. Defaults to “double”.

  • quantize_splits (str, optional) – Can be [“rounding”, “fixed”] or None.

  • kernel_budget (int, optional) – The budget in bytes which is allowed in a single kernel. Kernel optimizations are ignored if the budget is None. Defaults to None.

  • kernel_type (str, optional) – The type of kernel optimization. Can be {path, node, None}. Kernel optimizations are ignored if the kernel type is None. Defaults to None.

  • output_debug (bool, optional) – If True, outputs the given tree in the given folder in a JSON file called {model_name}_debug.json. Useful when debugging optimizations or loading the tree with another tool. Defaults to False.

  • target_compiler (str, optional) – The compiler used for compiling the dummy code to determine node sizes. If you want to use a cross-compiler (e.g. arm-linux-gnueabihf-gcc) you can set the path here accordingly. Defaults to “g++”.

  • target_objdump (str, optional) – The disassembler used for inspecting the compiled dummy code to determine node sizes. If you use a cross-toolchain you can set the path to the corresponding objdump (e.g. arm-linux-gnueabihf-objdump) here. Defaults to “objdump”.

fastinference.implementations.tree.cpp.native.implement.to_implementation(model, out_path, out_name, weight=1, namespace='FAST_INFERENCE', feature_type='double', label_type='double', int_type='unsigned int', output_debug=False, infer_types=True, reorder_nodes=False, set_size=8, force_cacheline=False, **kwargs)

Generates a native C++ implementation of the given Tree model. Native means that the tree is represented in an array structure which is iterated via a while-loop. You can use this implementation by simply passing "cpp.native" to implement, e.g.:

loaded_model = fastinference.Loader.model_from_file("/my/nice/model.json")
loaded_model.implement("/some/nice/place/", "mymodel", "cpp.native")
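
For intuition, the native layout stores the nodes in a flat array and walks it in a while-loop, roughly along the lines of this sketch (the node layout, field names, and function name are assumptions, not the exact generated code, which also applies the data types and options listed below):

// Illustrative sketch of a native (array-based) tree traversal, not the exact generated code.
struct Node {
    bool is_leaf;            // true if this node is a leaf
    unsigned int feature;    // feature index tested at this node
    double split;            // split threshold
    unsigned int left;       // array index of the left child
    unsigned int right;      // array index of the right child
    unsigned int prediction; // class predicted at a leaf
};

void predict_mymodel(Node const * nodes, double const * const x, double * pred) {
    unsigned int i = 0;
    while (!nodes[i].is_leaf) {
        i = (x[nodes[i].feature] <= nodes[i].split) ? nodes[i].left : nodes[i].right;
    }
    pred[nodes[i].prediction] += 1.0;
}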
Parameters
  • model (Tree) – The Tree model to be implemented

  • out_path (str) – The folder in which the *.cpp and *.h files are stored.

  • out_name (str) – The base filename for the generated files.

  • weight (float, optional) – The weight of this model inside an ensemble. The weight is ignored if it is 1.0, otherwise the prediction is scaled by the respective weight. Defaults to 1.0.

  • namespace (str, optional) – The namespace under which this model will be generated. Defaults to “FAST_INFERENCE”.

  • feature_type (str, optional) – The data types of the input features. Defaults to “double”.

  • label_type (str, optional) – The data types of the label. Defaults to “double”.

  • output_debug (bool, optional) – If True, outputs the given tree in the given folder in a JSON file called {model_name}_debug.json. Useful when debugging optimizations or loading the tree with another tool. Defaults to False.

  • infer_types (bool, optional) – If True then the smallest data type for index variables is inferred from the overall tree size. Otherwise “unsigned int” is used. Defaults to True.

  • reorder_nodes (bool, optional) – If True then the nodes in the tree are reordered so that the cache set size is respected. You can set the size of the cache set via the set_size parameter. Defaults to False.

  • set_size (int, optional) – The size of the cache set used if reorder_nodes is set to True. Defaults to 8.

  • force_cacheline (bool, optional) – If True then “padding” nodes are introduced to fill the entire cache line. Defaults to False.

Ensemble implementations

fastinference.implementations.ensemble.cpp.implement.to_implementation(model, out_path, out_name, weight=1.0, namespace='FAST_INFERENCE', feature_type='double', label_type='double', **kwargs)

Generates a C++ implementation of the given Ensemble model. This implementation simply calls the respective implementations of the base learners. You can use this implementation by simply passing "cpp" to implement. To choose the implementation of the base learners, pass an additional option to implement:

loaded_model = fastinference.Loader.model_from_file("/my/nice/model.json")
loaded_model.implement("/some/nice/place/", "mymodel", "cpp", "implementation.of.base.learners")
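
Conceptually, the generated ensemble code just calls the generated base-learner functions one after another, each of which adds its (weighted) contribution to the prediction. A rough sketch (the base-learner names are placeholders for whatever the chosen implementation generates):

// Rough sketch of an ensemble wrapper, not the exact generated code.
void predict_tree0(double const * const x, double * pred); // generated base learner 0
void predict_tree1(double const * const x, double * pred); // generated base learner 1

void predict_mymodel(double const * const x, double * pred) {
    // Each base learner accumulates its (already weighted) prediction into pred.
    predict_tree0(x, pred);
    predict_tree1(x, pred);
}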
Parameters
  • model (Ensemble) – The Ensemble model to be implemented

  • out_path (str) – The folder in which the *.cpp and *.h files are stored.

  • out_name (str) – The base filename for the generated files.

  • weight (float, optional) – The weight of this model inside an ensemble. The weight is ignored if it is 1.0, otherwise the prediction is scaled by the respective weight. Defaults to 1.0.

  • namespace (str, optional) – The namespace under which this model will be generated. Defaults to “FAST_INFERENCE”.

  • feature_type (str, optional) – The data types of the input features. Defaults to “double”.

  • label_type (str, optional) – The data types of the label. Defaults to “double”.

Neural network implementations

fastinference.implementations.neuralnet.cpp.NHWC.implement.to_implementation(model, out_path, out_name, weight=1.0, align=0, namespace='FAST_INFERENCE', feature_type='double', label_type='double', internal_type='double', **kwargs)

Generates a C++ implementation of the given NeuralNet model. This implementation uses an NHWC layout for the convolution layers, basically resulting in a loop structure like this:

for n = 1..N:
    for h = 1..H:
        for w = 1..W:
            for c = 1..C:
                //...

You can use this implementation by simply passing "cpp.NHWC" to implement:

loaded_model = fastinference.Loader.model_from_file("/my/nice/model.onnx")
loaded_model.implement("/some/nice/place/", "mymodel", "cpp.NHWC")
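
For intuition, a plain NHWC convolution kernel roughly follows the sketch below (made-up dimensions, contiguous NHWC buffers, stride 1 and no padding; the generated code is considerably more involved and uses the configured data types):

// Minimal NHWC convolution sketch, not the generated code: input [N][H][W][C_in],
// kernel [K][K][C_in][C_out], output [N][H-K+1][W-K+1][C_out].
void conv2d_nhwc(double const *in, double const *kernel, double *out,
                 int N, int H, int W, int C_in, int C_out, int K) {
    int OH = H - K + 1, OW = W - K + 1;
    for (int n = 0; n < N; ++n)
        for (int h = 0; h < OH; ++h)
            for (int w = 0; w < OW; ++w)
                for (int c = 0; c < C_out; ++c) {
                    double acc = 0.0;
                    for (int kh = 0; kh < K; ++kh)
                        for (int kw = 0; kw < K; ++kw)
                            for (int ci = 0; ci < C_in; ++ci)
                                acc += in[((n * H + (h + kh)) * W + (w + kw)) * C_in + ci]
                                     * kernel[((kh * K + kw) * C_in + ci) * C_out + c];
                    out[((n * OH + h) * OW + w) * C_out + c] = acc;
                }
}

The channel loop is innermost, mirroring the NHWC loop structure shown above.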
Parameters
  • model (NeuralNet) – The NeuralNet model to be implemented

  • out_path (str) – The folder in which the *.cpp and *.h files are stored.

  • out_name (str) – The base filename for the generated files.

  • weight (float, optional) – The weight of this model inside an ensemble. The weight is ignored if it is 1.0, otherwise the prediction is scaled by the respective weight. Defaults to 1.0.

  • align (int, optional) – If align > 0 then allocated memory will be aligned using __attribute__((aligned({{align}}))) where {{align}} is replaced by the given align value. If align = 0 then no memory alignment is performed. Defaults to 0.

  • namespace (str, optional) – The namespace under which this model will be generated. Defaults to “FAST_INFERENCE”.

  • feature_type (str, optional) – The data types of the input features. Defaults to “double”.

  • label_type (str, optional) – The data types of the label. Defaults to “double”.

  • internal_type (str, optional) – The data type used for internal buffers and memory allocation. Defaults to “double”.

fastinference.implementations.neuralnet.cpp.binary.implement.to_implementation(model, out_path, out_name, weight=1.0, namespace='FAST_INFERENCE', align=0, feature_type='double', label_type='double', float_type='double', int_type='signed int', uint_type='unsigned int', infer_types=True, popcount=None, **kwargs)

Generates a C++ implementation of the given binarized NeuralNet model by using the XNOR and popcount operations whenever possible. When infer_types is true, this implementation tries to infer the smallest possible data type which still guarantees a correct execution. Otherwise the supplied data types are used. Note that the first layer is never binarized because we do not assume the input data to be in a packed integer format, but to be a regular array.

Important: This implementation performs basic optimizations on the model.

Important: This implementation generates gcc compliant code by using the __builtin_popcount or __builtin_popcountll (depending on the uint_type) intrinsic. This intrinsic can vary from compiler to compiler. If you want to compile with another compiler (e.g. MSVC or clang) then you can supply the corresponding popcount operation via the popcount argument. However, please keep in mind that you might have to adapt the included headers manually after the code generation and that the popcount operation should fit the corresponding uint_type.

You can use this implementation by simply passing "cpp.binary" to implement:

loaded_model = fastinference.Loader.model_from_file("/my/nice/model.onnx")
loaded_model.implement("/some/nice/place/", "mymodel", "cpp.binary")
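
The core of such a binarized layer is an XNOR between packed weight and activation words followed by a popcount. A minimal sketch, assuming a +1/-1 encoding packed into 64-bit words and gcc's __builtin_popcountll (the generated code additionally handles packing, thresholds and type inference):

#include <cstdint>

// Sketch of a binarized dot product over packed 64-bit words, not the generated code.
// Bits encode +1/-1: matching bits contribute +1, differing bits -1, so the dot
// product equals 2 * popcount(~(x XOR w)) - total_bits.
int binary_dot(uint64_t const *x, uint64_t const *w, int n_words) {
    int matches = 0;
    for (int i = 0; i < n_words; ++i) {
        matches += __builtin_popcountll(~(x[i] ^ w[i])); // XNOR + popcount
    }
    return 2 * matches - 64 * n_words;
}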

Parameters
  • model (NeuralNet) – The NeuralNet model to be implemented

  • out_path (str) – The folder in which the *.cpp and *.h files are stored.

  • out_name (str) – The base filename for the generated files.

  • weight (float, optional) – The weight of this model inside an ensemble. The weight is ignored if it is 1.0, otherwise the prediction is scaled by the respective weight. Defaults to 1.0.

  • align (int, optional) – If align > 0 then allocated memory will be aligned using __attribute__((aligned({{align}}))) where {{align}} is replaced by the given align value. If align = 0 then no memory alignment is performed. Defaults to 0.

  • namespace (str, optional) – The namespace under which this model will be generated. Defaults to “FAST_INFERENCE”.

  • feature_type (str, optional) – The data types of the input features. Defaults to “double”.

  • label_type (str, optional) – The data types of the label. Defaults to “double”.

  • float_type (str, optional) – The floating point type used when required. Defaults to “double”.

  • int_type (str, optional) – The signed integer type used when required. Defaults to “signed int”.

  • uint_type (str, optional) – The unsigned integer type used when required. Defaults to “unsigned int”.

  • infer_types (bool, optional) – If True tries to infer the smallest possible data type which still guarantees a correct implementation. Defaults to True.

  • popcount (str, optional) – The popcount operation which should be used to compute popcount. If this is None, then __builtin_popcount or __builtin_popcountll is used depending on the binary_word_size required by the uint_type. Defaults to None.

fastinference.implementations.neuralnet.fpga.binary.implement.to_implementation(model, out_path, out_name, weight=1.0, namespace='FAST_INFERENCE', feature_type='double', label_type='double', float_type='double', int_type='signed int', uint_type='unsigned int', infer_types=True, lut_size=0, **kwargs)

Generates a C++ implementation of the given binarized NeuralNet model by using the XNOR and popcount operations whenever possible. This implementation is targeted towards High-Level Synthesis (HLS) for FPGAs, more specifically Xilinx HLS. The popcount operation is implemented via lookup tables (LUTs). When infer_types is true, this implementation tries to infer the smallest possible data type which still guarantees a correct execution. Otherwise the supplied data types are used. Note that the first layer is never binarized because we do not assume the input data to be in a packed integer format, but to be a regular array.

Important: This implementation performs basic optimizations on the model.

Important: This implementation does not perform any optimizations with respect to the HLS tool. We highly recommend performing a manual design space exploration after the code generation for a given neural net to get the best performance. Additionally, the input data will most likely have to be adapted manually to the given FPGA.

You can use this implementation by simply passing "fpga.binary" to implement:

loaded_model = fastinference.Loader.model_from_file("/my/nice/model.onnx")
loaded_model.implement("/some/nice/place/", "mymodel", "fpga.binary")

Parameters
  • model (NeuralNet) – The NeuralNet model to be implemented

  • out_path (str) – The folder in which the *.cpp and *.h files are stored.

  • out_name (str) – The base filename for the generated files.

  • weight (float, optional) – The weight of this model inside an ensemble. The weight is ignored if it is 1.0, otherwise the prediction is scaled by the respective weight. Defaults to 1.0.

  • namespace (str, optional) – The namespace under which this model will be generated. Defaults to “FAST_INFERENCE”.

  • feature_type (str, optional) – The data types of the input features. Defaults to “double”.

  • label_type (str, optional) – The data types of the label. Defaults to “double”.

  • float_type (str, optional) – The floating point type used when required. Defaults to “double”.

  • int_type (str, optional) – The signed integer type used when required. Defaults to “signed int”.

  • uint_type (str, optional) – The unsigned integer type used when required. Defaults to “unsigned int”.

  • infer_types (bool, optional) – If True tries to infer the smallest possible data type which still guarantees a correct implementation. Defaults to True.

  • popcount (str, optional) – The popcount operation which should be used to compute popcount. If this is None, then __builtin_popcount or __builtin_popcountll is used depending on the binary_word_size required by the uint_type. Defaults to None.

  • lut_size (int, optional) – If lut_size > 1 then a lookup table is generated to compute the popcount over lut_size bits at a time. For example, if binary_word_size is 32 and lut_size is 4, then a pack of 4 bits is evaluated at once, resulting in 32 / 4 = 8 lookups (see the sketch below). If lut_size <= 1, then a for-loop is used to compute the popcount. Defaults to 0.
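
To illustrate the lookup-table idea, a 4-bit LUT popcount over a 32-bit word could look like the following sketch (names and types are assumptions; the generated HLS code uses the configured types and is structured for synthesis):

#include <cstdint>

// Sketch of a LUT-based popcount, not the generated code: with lut_size = 4 a
// 32-bit word is split into 32 / 4 = 8 nibbles, each resolved by one table lookup.
static const uint8_t POPCOUNT_LUT4[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

int popcount_lut4(uint32_t word) {
    int count = 0;
    for (int i = 0; i < 8; ++i) {          // 8 lookups of 4 bits each
        count += POPCOUNT_LUT4[word & 0xFu];
        word >>= 4;
    }
    return count;
}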