Get started in C
This is a preview of the Modular Inference Engine. It is not publicly available yet, and APIs are subject to change.
If you’re interested, please sign up for early access.
Our C API allows you to integrate the Modular Inference Engine into your high-performance application code, and run inference on any model from TensorFlow or PyTorch.
This page shows how you can use the C API to load a trained model and execute it with the Modular Inference Engine.
We also offer a Python API, and a C++ API is coming soon.
Create a runtime context
The first thing you need is an M_RuntimeContext, which is an application-level object that sets up resources such as thread pools and allocators used during inference. We recommend you create one context and use it throughout your application.
To create an M_RuntimeContext, you need two other objects:

M_RuntimeConfig: This configures details about the runtime context, such as the number of threads to use and the logging level.
M_Status: This is the object through which the Inference Engine passes all error messages.
Here’s how you can use both of these objects and then create the M_RuntimeContext:
M_Status *status = M_newStatus();
M_RuntimeConfig *runtimeConfig = M_newRuntimeConfig();
M_RuntimeContext *context = M_newRuntimeContext(runtimeConfig, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}
Notice how this checks the M_Status object with M_isError().
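This check-and-report pattern repeats after every call that can fail, so you may want to wrap it in a small helper of your own. The sketch below is just a local convenience, not part of the Inference Engine API; it only uses the M_isError() and M_getError() calls shown above:

#include <stdbool.h>

// Local convenience helper (not part of the Inference Engine API):
// logs the error message and reports whether the status holds an error.
static bool hasError(M_Status *status) {
  if (M_isError(status)) {
    logError(M_getError(status));
    return true;
  }
  return false;
}

// Usage at each call site:
//   if (hasError(status)) return EXIT_FAILURE;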
Compile the model
Now you can compile your trained TensorFlow or PyTorch model. You just need to specify the path to your model with an M_CompileConfig object and call M_compileModel() as shown here:
M_CompileConfig *compileConfig = M_newCompileConfig();
const char *resnetPath = argv[1];
M_setModelPath(compileConfig, /*path=*/resnetPath);

M_AsyncCompiledModel *compiledModel =
    M_compileModel(context, compileConfig, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}
Compilation happens asynchronously and M_compileModel() returns immediately.
Initialize the model
The M_AsyncCompiledModel returned above is not ready for inference yet. You now need to initialize the model by calling M_initModel(), which returns an instance of M_AsyncModel:
M_AsyncModel *model = M_initModel(context, compiledModel, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}
You don’t need to await the M_AsyncCompiledModel yourself: M_initModel() internally waits for compilation to complete before initializing the model. If you do need to wait for compilation beforehand, we provide the M_waitForCompilation() API for this. This is the general pattern followed by all Modular Inference Engine APIs that accept an asynchronous value as an argument.
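For example, to block until compilation finishes before moving on, you could call it right after M_compileModel(). This is only a minimal sketch; we're assuming M_waitForCompilation() takes the compiled model and a status object, so check the API reference for its exact signature:

// Optional: wait for the asynchronous compilation to finish.
// Assumed signature: the compiled model plus a status object.
M_waitForCompilation(compiledModel, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}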
M_initModel() is also asynchronous and returns immediately. This step prepares the compiled model for fast execution by running and initializing some of the graph operations that are input-independent.
Run an inference
Now you’re ready to run an inference as follows.
// Define the input tensor specs.
int64_t shape[] = {1, 224, 224, 3};
M_TensorSpec *inputSpec =
    M_newTensorSpec(shape, /*rankSize=*/4, /*dtype=*/M_FLOAT32);

// Create the input tensor and borrow it into the model input.
// Borrowing the input means we don't make any copy; the caller is responsible
// for keeping the input alive until the inference is completed.
M_AsyncTensorArray *inputToModel = M_newAsyncTensorArray(context,
                                                         /*numInputs=*/1);
float inputTensor[1 * 224 * 224 * 3] = {0.0f};
M_borrowTensorInto(inputToModel, inputTensor, inputSpec, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}

// Run the inference.
// This function blocks until the inference is complete.
logInfo("Running Inference");
M_AsyncTensorArray *outputs =
    M_executeModelSync(context, model, inputToModel, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}
Process the output
("Inference successfully completed");
logInfo*tensor = M_getTensorByIndexFrom(outputs, /*index=*/0, status);
M_AsyncTensor if (M_isError(status)) {
(M_getError(status));
logErrorreturn EXIT_FAILURE;
}
size_t numElements = M_getTensorNumElements(tensor);
const float *tensorData = (const float *)(M_getTensorData(tensor));
("ArgMax: %zu", argMax(tensorData, numElements)); printf
Clean up
Finally, you need to clean up all the resources.
M_freeTensor(tensor);
M_freeTensorSpec(inputSpec);
M_freeAsyncTensorArray(inputToModel);
M_freeAsyncTensorArray(outputs);

M_freeModel(model);
M_freeCompiledModel(compiledModel);

M_freeCompileConfig(compileConfig);
M_freeRuntimeContext(context);
M_freeRuntimeConfig(runtimeConfig);
M_freeStatus(status);
You can see the complete program below.
Full example
#include "modular/c/common.h"
#include "modular/c/context.h"
#include "modular/c/model.h"
#include "modular/c/tensor.h"
#include <stdio.h>
#include <stdlib.h>
size_t argMax(const float *arr, size_t numElements) {
  float max = arr[0];
  size_t maxIdx = 0;
  for (size_t i = 1; i < numElements; ++i) {
    if (arr[i] > max) {
      max = arr[i];
      maxIdx = i;
    }
  }
  return maxIdx;
}

void logHelper(const char* level, const char* message) {
  printf("%s: %s\n", level, message);
}

void logInfo(const char* message) {
  logHelper("INFO", message);
}

void logError(const char* message) {
  logHelper("ERROR", message);
}

int main(int argc, char **argv) {
  if (argc != 2) {
    printf("Usage: resnet-example <path to resnet saved model>\n");
    return EXIT_FAILURE;
  }

  M_Status *status = M_newStatus();

  M_RuntimeConfig *runtimeConfig = M_newRuntimeConfig();
  M_RuntimeContext *context = M_newRuntimeContext(runtimeConfig, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  logInfo("Compiling Model");
  M_CompileConfig *compileConfig = M_newCompileConfig();
  const char *modelPath = argv[1];
  M_setModelPath(compileConfig, /*path=*/modelPath);

  M_AsyncCompiledModel *compiledModel =
      M_compileModel(context, compileConfig, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  logInfo("Initializing Model");
  M_AsyncModel *model = M_initModel(context, compiledModel, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  // Define the input tensor specs.
  int64_t shape[] = {1, 224, 224, 3};
  M_TensorSpec *inputSpec =
      M_newTensorSpec(shape, /*rankSize=*/4, /*dtype=*/M_FLOAT32);

  // Create the input tensor and borrow it into the model input.
  // Borrowing the input means we don't make any copy; the caller is responsible
  // for keeping the input alive until the inference is completed.
  M_AsyncTensorArray *inputToModel = M_newAsyncTensorArray(context,
                                                           /*numInputs=*/1);
  float inputTensor[1 * 224 * 224 * 3] = {0.0f};
  M_borrowTensorInto(inputToModel, inputTensor, inputSpec, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  // Run the inference.
  // This function blocks until the inference is complete.
  logInfo("Running Inference");
  M_AsyncTensorArray *outputs =
      M_executeModelSync(context, model, inputToModel, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  logInfo("Inference successfully completed");
  M_AsyncTensor *tensor = M_getTensorByIndexFrom(outputs, /*index=*/0, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  size_t numElements = M_getTensorNumElements(tensor);
  const float *tensorData = (const float *)(M_getTensorData(tensor));
  printf("ArgMax: %zu\n", argMax(tensorData, numElements));

  M_freeTensor(tensor);
  M_freeTensorSpec(inputSpec);
  M_freeAsyncTensorArray(inputToModel);
  M_freeAsyncTensorArray(outputs);

  M_freeModel(model);
  M_freeCompiledModel(compiledModel);

  M_freeCompileConfig(compileConfig);
  M_freeRuntimeContext(context);
  M_freeRuntimeConfig(runtimeConfig);
  M_freeStatus(status);

  return EXIT_SUCCESS;
}
The Inference Engine is not yet available to the public, but if you’d like to become an early access partner, please join the waitlist here.