Get started in Python
This is a preview of the Modular Inference Engine. It is not publicly available yet and APIs are subject to change.
If you’re interested, please sign up for early access.
The Python API for the Modular Inference Engine makes it easy to instantly upgrade your model’s inference performance. With just a few lines of code, you can run any TensorFlow or PyTorch model with reduced latency and compute cost.
This page shows you how to load a trained TensorFlow model and execute it with the Modular Inference Engine. (It’s just as easy with a PyTorch model.)
We also offer a C API, and our C++ API is coming soon.
Import Python modules

Nothing surprising here.

from pathlib import Path
import numpy as np
from modular import engine
Load the model

First we need to create an InferenceSession and load the model:

session = engine.InferenceSession()
model_path = Path('resnet50_v1_savedmodel')
model = session.load(model_path)
This compiles the model into the Modular format for inference.
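As mentioned above, this is just as easy with a PyTorch model. As a rough sketch (the file name is hypothetical, and we’re assuming load() accepts a TorchScript path the same way it accepts a SavedModel; see the API reference for the exact signature), that would look like:

# Hypothetical PyTorch equivalent (a sketch, not confirmed API:
# assumes session.load() also accepts a TorchScript file path).
torch_model = session.load(Path('resnet50_v1_torchscript.pt'))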
Run an inference
Before running the model, let’s check the input tensor shape and data type:
for tensor in model.input_metadata:
    print(f'shape: {tensor.shape}, dtype: {tensor.dtype}')

shape: [None, 224, 224, 3], dtype: DType.f32
The first dimension is None, meaning the batch size is dynamic.
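A None dimension simply means any size is accepted in that position. If you want to validate an input yourself before calling the model, a check like this works (a minimal sketch; the shape_matches helper is our own, not part of the API):

# A minimal input-shape check (our own helper, not part of the API):
# a None dimension in the metadata matches any size.
def shape_matches(actual_shape, spec):
    return len(actual_shape) == len(spec) and all(
        dim is None or dim == actual
        for actual, dim in zip(actual_shape, spec)
    )

spec = next(iter(model.input_metadata)).shape  # [None, 224, 224, 3]
print(shape_matches((1, 224, 224, 3), spec))   # True: batch is dynamic
print(shape_matches((1, 224, 224, 1), spec))   # False: channels must be 3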
Just to demonstrate our API, let’s run an inference with random data that matches the input shape:
input_tensor = np.random.rand(1, 224, 224, 3).astype(np.float32)
model.execute(input_tensor)
[array([[0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
0.00092829]], dtype=float32)]
That’s it! execute() returns the output as an ndarray.
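Because the result is ordinary NumPy data, post-processing needs nothing special. For example, assuming this ResNet-50 output is a [batch, num_classes] array of class probabilities (mapping indices to labels is beyond this page), picking the top prediction is a single argmax:

# execute() returns a list of output arrays; this model has one output.
probabilities = model.execute(input_tensor)[0]  # shape: (1, num_classes)
top_class = int(np.argmax(probabilities[0]))
print(f'class: {top_class}, probability: {probabilities[0, top_class]:.5f}')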
We can also run 5 inferences at once by batching them together:
input_tensor_batch = np.repeat(input_tensor, 5, axis=0)
model.execute(input_tensor_batch)
[array([[0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
0.00092829],
[0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
0.00092829],
[0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
0.00092829],
[0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
0.00092829],
[0.00010795, 0.00010784, 0.00049631, ..., 0.00016846, 0.00015613,
0.00092829]], dtype=float32)]
It’s the same result five times, but this is just to show how easy it is to use the Python API with a TensorFlow or PyTorch model.
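You can confirm that directly: because the batch is the same input repeated, every output row should match the first (a quick sanity check of our own, not part of the tutorial):

# All five rows come from identical inputs, so they should be equal.
outputs = model.execute(input_tensor_batch)[0]  # shape: (5, num_classes)
print(np.allclose(outputs, outputs[0]))         # True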
For more details, check out the Python API reference.
The Inference Engine is not publicly available yet, but if you’d like to get early access, please sign up here.