Get started in Python

A walkthrough of the Python API, showing how to load and run a trained model.

This is a preview of the Modular Inference Engine. It is not publicly available yet, and APIs are subject to change.

If you’re interested, please sign up for early access.

The Python API for the Modular Inference Engine makes it easy to instantly upgrade your model’s inference performance. With just a few lines of code, you can run any TensorFlow or PyTorch model with reduced latency and compute cost.

This page shows you how to load a trained TensorFlow model and execute it with the Modular Inference Engine. (It’s just as easy with a PyTorch model.)

We also offer a C API, and our C++ API is coming soon.

Import Python modules

Nothing surprising here.

import numpy as np
from modular import engine
from pathlib import Path

Load the model

First we need to create an InferenceSession and load the model:

session = engine.InferenceSession()
model_path = Path('resnet50_v1_savedmodel')
model = session.load(model_path)

This compiles the model into the Modular format for inference.
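
If you don't already have this SavedModel on disk, one way to create it (our assumption; any trained TensorFlow SavedModel works the same way) is to export the pretrained ResNet50 that ships with Keras:

import tensorflow as tf

# Download the pretrained ResNet50 (v1) and export it as a SavedModel
keras_model = tf.keras.applications.ResNet50(weights='imagenet')
tf.saved_model.save(keras_model, 'resnet50_v1_savedmodel')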

Run an inference

Before running the model, let’s check the input tensor shape and data type:

for tensor in model.input_metadata:
    print(f'shape: {tensor.shape}, dtype: {tensor.dtype}')
shape: [None, 224, 224, 3], dtype: DType.f32

The first dimension is None, meaning the batch size is dynamic: the model accepts any number of 224x224 RGB images per call.
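
If you want a quick sanity check before calling the model, here's a minimal sketch (matches_metadata is a hypothetical helper, not part of the API; it assumes tensor.shape is a sequence that uses None for dynamic dimensions, as printed above):

def matches_metadata(array, meta_shape):
    # None in the metadata acts as a wildcard for a dynamic dimension
    if len(array.shape) != len(meta_shape):
        return False
    return all(m is None or m == s
               for s, m in zip(array.shape, meta_shape))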

Just to demonstrate our API, let’s run an inference with random data that matches the input shape:

input_tensor = np.random.rand(1, 224, 224, 3).astype(np.float32)

model.execute(input_tensor)
[array([[0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
         0.00092829]], dtype=float32)]

That’s it! execute() returns the model’s outputs as a list of NumPy arrays, one per output tensor (ResNet50 has just one).
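
For a classifier like ResNet50, a typical next step is to turn that probability vector into a predicted class with NumPy (a sketch; mapping the index back to an ImageNet label is left out):

outputs = model.execute(input_tensor)
probabilities = outputs[0]  # one row of class probabilities per batch element
predicted_class = int(np.argmax(probabilities, axis=1)[0])
print(predicted_class)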

We can also run 5 inferences at once by batching them together:

input_tensor_batch = np.repeat(input_tensor, 5, axis=0)

model.execute(input_tensor_batch)
[array([[0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
         0.00092829],
        [0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
         0.00092829],
        [0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
         0.00092829],
        [0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
         0.00092829],
        [0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
         0.00092829]], dtype=float32)]

Naturally, it’s the same result five times, since the batch repeats a single input, but it shows how easy it is to use the Python API with a TensorFlow or PyTorch model.
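
With real data, you'd stack distinct preprocessed images into one batch instead of repeating a single tensor. A minimal sketch with NumPy (the random images here stand in for real inputs):

images = [np.random.rand(224, 224, 3).astype(np.float32) for _ in range(5)]
batch = np.stack(images)  # shape: (5, 224, 224, 3)
outputs = model.execute(batch)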

For more details, check out the Python API reference.

The Inference Engine is not publicly available yet, but if you’d like to get early access, please sign up.