Get started in C

A walkthrough of the C API, showing how to load and run a trained model.

This is a preview of the Modular Inference Engine. It is not publicly available yet and APIs are subject to change.

If you’re interested, please sign up for early access.

Our C API allows you to integrate the Modular Inference Engine into your high-performance application code, and run inference on any model from TensorFlow or PyTorch.

This page shows how you can use the C API to load a trained model and execute it with the Modular Inference Engine.

We also offer a Python API, and a C++ API is coming soon.

Create a runtime context

The first thing you need is an M_RuntimeContext, which is an application-level object that sets up various resources, such as a threadpool and allocators, used during inference. We recommend you create one context and use it throughout your application.

To create an M_RuntimeContext, you need two other objects:

  • M_RuntimeConfig: This configures details about the runtime context, such as the number of threads to use and the logging level (see the sketch after this list).
  • M_Status: This is the object through which the Inference Engine passes all error messages.
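
For instance, if you want to cap the number of threads the engine uses, M_RuntimeConfig is where that knob lives. Here’s a minimal sketch; the M_setNumThreads() setter shown below is an assumption on our part and may differ in your version of the API:

M_RuntimeConfig *runtimeConfig = M_newRuntimeConfig();
// Cap the engine's threadpool at 4 threads (setter name assumed).
M_setNumThreads(runtimeConfig, /*numThreads=*/4);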

Here’s how you can use both of these objects and then create the M_RuntimeContext:

M_Status *status = M_newStatus();
M_RuntimeConfig *runtimeConfig = M_newRuntimeConfig();
M_RuntimeContext *context = M_newRuntimeContext(runtimeConfig, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}

Notice how this checks the M_Status object with M_isError().
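
Because every call that can fail reports through the same M_Status object, you may want to wrap this check in a small helper so each step stays compact. Here’s a minimal sketch that reuses the logError() helper from the full example below; checkError() is our own convenience function, not part of the API:

#include <stdbool.h>

// Logs the error message and returns true if the status holds an error.
static bool checkError(M_Status *status) {
  if (M_isError(status)) {
    logError(M_getError(status));
    return true;
  }
  return false;
}

Each call site then reduces to if (checkError(status)) return EXIT_FAILURE;.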

Compile the model

Now you can compile your trained TensorFlow or PyTorch model. You just need to specify the path to your model with an M_CompileConfig object and call M_compileModel() as shown here:

M_CompileConfig *compileConfig = M_newCompileConfig();
const char *resnetPath = argv[1];
M_setModelPath(compileConfig, /*path=*/resnetPath);
M_AsyncCompiledModel *compiledModel =
    M_compileModel(context, compileConfig, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}

Compilation happens asynchronously and M_compileModel() returns immediately.

Initialize the model

The M_AsyncCompiledModel returned above is not ready for inference yet. You now need to initialize the model by calling M_initModel(), which returns an instance of M_AsyncModel:

M_AsyncModel *model = M_initModel(context, compiledModel, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}

You don’t need to await the M_AsyncCompiledModel. M_initModel() internally waits for compilation to complete before initializing the model. If you need to await compilation beforehand, we provide an M_waitForCompilation API for that. This is the general pattern followed by all Modular Inference Engine APIs that accept an asynchronous value as an argument.
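
For example, if you do want to block until compilation finishes before continuing, the call might look like this; we’re assuming M_waitForCompilation() takes the compiled model and a status object:

// Block until compilation completes (signature assumed).
M_waitForCompilation(compiledModel, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}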

M_initModel() is also asynchronous and returns immediately. This step prepares the compiled model for fast execution by running and initializing some of the graph operations that are input-independent.

Run an inference

Now you’re ready to run an inference as follows:

// Define the input tensor specs.
int64_t shape[] = {1, 224, 224, 3};
M_TensorSpec *inputSpec =
    M_newTensorSpec(shape, /*rankSize=*/4, /*dtype=*/M_FLOAT32);

// Create the input tensor and borrow it into the model input.
// Borrowing the input means we don't make a copy, and the caller is
// responsible for keeping the input alive until the inference is complete.
M_AsyncTensorArray *inputToModel = M_newAsyncTensorArray(context,
                                                        /*numInputs=*/1);
float inputTensor[1 * 224 * 224 * 3] = {0.0f};
M_borrowTensorInto(inputToModel, inputTensor, inputSpec, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}

// Run the inference.
// This function blocks until the inference is complete.
logInfo("Running Inference");
M_AsyncTensorArray *outputs =
    M_executeModelSync(context, model, inputToModel, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}

Process the output

logInfo("Inference successfully completed");
M_AsyncTensor *tensor = M_getTensorByIndexFrom(outputs, /*index=*/0, status);
if (M_isError(status)) {
  logError(M_getError(status));
  return EXIT_FAILURE;
}

size_t numElements = M_getTensorNumElements(tensor);
const float *tensorData = (const float *)(M_getTensorData(tensor));
// argMax() is a small helper (defined in the full example below) that
// returns the index of the largest element.
printf("ArgMax: %zu\n", argMax(tensorData, numElements));

Clean up

Finally, you need to clean up all the resources.

M_freeTensor(tensor);
M_freeTensorSpec(inputSpec);
M_freeAsyncTensorArray(inputToModel);
M_freeAsyncTensorArray(outputs);

M_freeModel(model);
M_freeCompiledModel(compiledModel);

M_freeCompileConfig(compileConfig);
M_freeRuntimeContext(context);
M_freeRuntimeConfig(runtimeConfig);
M_freeStatus(status);

You can see the complete program below.

Full example

#include "modular/c/common.h"
#include "modular/c/context.h"
#include "modular/c/model.h"
#include "modular/c/tensor.h"

#include <stdio.h>
#include <stdlib.h>

size_t argMax(const float *arr, size_t numElements) {
  float max = arr[0];
  size_t maxIdx = 0;
  for (size_t i = 1; i < numElements; ++i) {
    if (arr[i] > max) {
      max = arr[i];
      maxIdx = i;
    }
  }
  return maxIdx;
}

void logHelper(const char* level, const char* message) {
  printf("%s: %s\n", level, message);
}

void logInfo(const char* message) {
  logHelper("INFO", message);
}

void logError(const char* message) {
  logHelper("ERROR", message);
}

int main(int argc, char **argv) {
  if (argc != 2) {
    printf("Usage: resnet-example <path to resnet saved model>");
    return EXIT_FAILURE;
  }

  M_Status *status = M_newStatus();

  M_RuntimeConfig *runtimeConfig = M_newRuntimeConfig();
  M_RuntimeContext *context = M_newRuntimeContext(runtimeConfig, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  logInfo("Compiling Model");
  M_CompileConfig *compileConfig = M_newCompileConfig();
  const char *modelPath = argv[1];
  M_setModelPath(compileConfig, /*path=*/modelPath);
  M_AsyncCompiledModel *compiledModel =
      M_compileModel(context, compileConfig, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  logInfo("Initializing Model");
  M_AsyncModel *model = M_initModel(context, compiledModel, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  // Define the input tensor specs.
  int64_t shape[] = {1, 224, 224, 3};
  M_TensorSpec *inputSpec =
      M_newTensorSpec(shape, /*rankSize=*/4, /*dtype=*/M_FLOAT32);

  // Create the input tensor and borrow it into the model input.
  // Borrowing the input means we don't make a copy, and the caller is
  // responsible for keeping the input alive until the inference is complete.
  M_AsyncTensorArray *inputToModel = M_newAsyncTensorArray(context,
                                                          /*numInputs=*/1);
  float inputTensor[1 * 224 * 224 * 3] = {0.0f};
  M_borrowTensorInto(inputToModel, inputTensor, inputSpec, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  // Run the inference.
  // This function blocks until the inference is complete.
  logInfo("Running Inference");
  M_AsyncTensorArray *outputs =
      M_executeModelSync(context, model, inputToModel, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  logInfo("Inference successfully completed");
  M_AsyncTensor *tensor = M_getTensorByIndexFrom(outputs, /*index=*/0, status);
  if (M_isError(status)) {
    logError(M_getError(status));
    return EXIT_FAILURE;
  }

  size_t numElements = M_getTensorNumElements(tensor);
  const float *tensorData = (const float *)(M_getTensorData(tensor));
  printf("ArgMax: %zu", argMax(tensorData, numElements));

  M_freeTensor(tensor);
  M_freeTensorSpec(inputSpec);
  M_freeAsyncTensorArray(inputToModel);
  M_freeAsyncTensorArray(outputs);

  M_freeModel(model);
  M_freeCompiledModel(compiledModel);

  M_freeCompileConfig(compileConfig);
  M_freeRuntimeContext(context);
  M_freeRuntimeConfig(runtimeConfig);
  M_freeStatus(status);
  return EXIT_SUCCESS;
}

The Inference Engine is not yet available to the public, but if you’d like to become an early access partner, please join the waitlist here.