Inference Engine Python API

The Inference Engine Python API reference.

This is a preview of the Modular Inference Engine. It is not publicly available yet and APIs are subject to change.

If you’re interested, please sign up for early access.

Executing a model with the Modular Inference Engine is easy:

  1. Create an InferenceSession and load a TensorFlow or PyTorch model with InferenceSession.load().

  2. Then run it by calling Model.execute() and passing your input data. This function returns the model output.

That’s it.

For a complete working example, see the Python get started guide.
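The two steps above can be sketched roughly as follows. This is a minimal illustration, not official sample code: the helper name run_once, the model path, and the input shape are all placeholders, and the engine import is deferred into main() because the preview package may not be installed.

```python
def run_once(session, model_path, *inputs):
    """Load a model through `session` and execute it once.

    `session` is expected to behave like modular.engine.InferenceSession.
    run_once is a hypothetical helper, not part of the API.
    """
    model = session.load(model_path)   # Step 1: load and compile the model.
    return model.execute(*inputs)      # Step 2: run it on the input data.


def main():
    # Imported lazily: both packages are only needed when actually running.
    import numpy as np
    from modular import engine

    session = engine.InferenceSession()
    # Placeholder path and input shape; substitute your own model's values.
    batch = np.zeros((1, 224, 224, 3), dtype=np.float32)
    print(run_once(session, "path/to/saved_model", batch))
```

Calling main() on a machine with the engine installed runs the full flow; run_once only relies on the load() and execute() methods documented below.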

class modular.engine.InferenceSession(num_threads: Optional[int] = None)

Manages an inference session in which you can load and run models.

Parameters:

num_threads (Optional[int]) – Number of threads to use for the inference session. This parameter defaults to the number of physical cores on your machine.
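If you want to cap the thread count yourself instead of relying on the default, one possible approach is sketched below. The helpers choose_num_threads and make_session are hypothetical names for illustration. Note that os.cpu_count() reports logical CPUs, which can differ from the physical core count the default is based on.

```python
import os


def choose_num_threads(reserved: int = 1) -> int:
    """Leave `reserved` logical CPUs free, but always use at least one thread."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, available - reserved)


def make_session(session_cls, reserved: int = 1):
    """Create a session with an explicit thread count.

    `session_cls` is expected to be modular.engine.InferenceSession;
    it is passed in so this sketch does not require the package at import time.
    """
    return session_cls(num_threads=choose_num_threads(reserved))
```

With the engine installed, make_session(engine.InferenceSession) would create a session that leaves one logical CPU free for other work.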

load(model_path: Union[str, Path]) → Model

Loads a trained model and compiles it for inference.

Parameters:

model_path (Union[str, pathlib.Path]) – Path to a model. The model may be a TensorFlow model in the SavedModel format or a traceable PyTorch model.

Returns:

The loaded model, compiled and ready to execute.

Return type:

Model

Raises:

RuntimeError – If the path provided is invalid.
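Because load() raises RuntimeError for an invalid path, callers may want to handle that case explicitly. The sketch below shows one way to do so. load_or_none is a hypothetical helper, and the existence pre-check is an optional convenience added here, not something the API requires.

```python
from pathlib import Path


def load_or_none(session, model_path):
    """Return the loaded model, or None if the path cannot be loaded.

    `session` is expected to behave like modular.engine.InferenceSession.
    """
    path = Path(model_path)
    if not path.exists():
        # Cheap pre-check so an obviously missing path never reaches the engine.
        return None
    try:
        return session.load(path)
    except RuntimeError as err:
        # Documented failure mode of load(): the path provided is invalid.
        print(f"Could not load {path}: {err}")
        return None
```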

class modular.engine.Model

A loaded model that you can execute.

You should not instantiate this class directly. Instead, create a Model by passing your model file to InferenceSession.load(). Then you can run the model by passing your input data to Model.execute().

execute(*args) → ndarray

Executes the model with the provided input and returns outputs.

Parameters:

*args – Input tensors as ndarray data.

Returns:

Output tensors.

Return type:

ndarray

Raises:

RuntimeError – If the input tensors don’t match what the model expects.
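When execute() rejects its inputs, the shapes that were passed are often the first thing worth inspecting. As a sketch, the hypothetical wrapper below re-raises the documented RuntimeError with the input shapes attached. It assumes only that each input exposes a shape attribute, as ndarray data typically does.

```python
def execute_with_context(model, *inputs):
    """Run model.execute() and add the input shapes to any RuntimeError.

    `model` is expected to behave like modular.engine.Model.
    """
    try:
        return model.execute(*inputs)
    except RuntimeError as err:
        # Inputs without a `shape` attribute are reported as None.
        shapes = [getattr(x, "shape", None) for x in inputs]
        raise RuntimeError(
            f"Model rejected inputs with shapes {shapes}"
        ) from err
```

The original exception stays available as the __cause__ of the new one, so no engine diagnostics are lost.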

property input_metadata: List[TensorSpec]

Metadata about the input tensors that the model accepts.

You can use this to query the tensor shapes and data types like this:

for tensor in model.input_metadata:
    print(f'shape: {tensor.shape}, dtype: {tensor.dtype}')

class modular.engine.TensorSpec

Defines the properties of a tensor, namely its shape and data type.

You can get a list of TensorSpec objects that specify the input tensors of a Model from the Model.input_metadata property.

property dtype: DType

The data type of the tensor.

property shape: List[Optional[int]]

The shape of the tensor as a list of integers.

If a dimension is indeterminate for a certain axis, such as the first axis of a batched tensor, that axis is denoted by None.
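Since None marks an axis of any size, a shape check against a TensorSpec needs to treat None as a wildcard. The sketch below shows one way to validate inputs before calling execute(). shape_matches and check_inputs are hypothetical helpers, and the inputs are assumed to expose a shape attribute.

```python
def shape_matches(spec_shape, actual_shape) -> bool:
    """Return True if `actual_shape` fits `spec_shape`, where None matches any size."""
    if len(spec_shape) != len(actual_shape):
        return False  # rank mismatch
    return all(
        want is None or want == got
        for want, got in zip(spec_shape, actual_shape)
    )


def check_inputs(model, *inputs) -> None:
    """Raise ValueError if the inputs do not fit model.input_metadata."""
    specs = model.input_metadata
    if len(specs) != len(inputs):
        raise ValueError(f"expected {len(specs)} inputs, got {len(inputs)}")
    for index, (spec, tensor) in enumerate(zip(specs, inputs)):
        if not shape_matches(spec.shape, tensor.shape):
            raise ValueError(
                f"input {index}: expected shape {spec.shape}, got {tensor.shape}"
            )
```

For example, a spec shape of [None, 3] accepts an input of shape (8, 3) but rejects (8, 4) and (3,).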

class modular.engine.DType(value)

The tensor data type.

bool = 0
si8 = 1
si16 = 2
si32 = 3
si64 = 4
ui8 = 5
ui16 = 6
ui32 = 7
ui64 = 8
f16 = 9
f32 = 10
f64 = 11
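To prepare input arrays with a matching data type, it can help to translate a DType into a NumPy dtype name. The mapping below is a sketch under two assumptions that this reference does not state: that the si, ui, and f prefixes denote signed integer, unsigned integer, and floating point types of the given bit width, and that DType members expose a name attribute like a standard Python Enum. The table and the numpy_dtype_name helper are hypothetical.

```python
# Assumed correspondence between DType member names and NumPy dtype names.
NUMPY_DTYPE_NAMES = {
    "bool": "bool",
    "si8": "int8",
    "si16": "int16",
    "si32": "int32",
    "si64": "int64",
    "ui8": "uint8",
    "ui16": "uint16",
    "ui32": "uint32",
    "ui64": "uint64",
    "f16": "float16",
    "f32": "float32",
    "f64": "float64",
}


def numpy_dtype_name(dtype) -> str:
    """Return the NumPy dtype name for a DType member or its name as a string."""
    # Enum members carry their name in `.name`; plain strings are used as-is.
    name = getattr(dtype, "name", dtype)
    return NUMPY_DTYPE_NAMES[name]
```

The returned string can be passed to NumPy directly, for example as the dtype argument of numpy.zeros().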