IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo function

shmem_init_thread_tcp

shmem_init_thread_tcp(ctx: DeviceContext, var node_id: Int = -1, var total_nodes: Int = -1, var gpus_per_node: Int = -1, var server_ip: String = "-1", var server_port: Int = -1)

Modular-specific initialization that sets up SHMEM with one GPU per thread, using TCP bootstrapping instead of mpirun or another launcher. The bootstrap waits until all total_nodes * gpus_per_node participants have completed the information exchange before this function returns.
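As a quick sizing check, the number of participants the bootstrap waits for is the product of the two counts. The POSIX `sh` sketch below computes it; `expected_participants` is a hypothetical helper for illustration, not part of the SHMEM API.

```shell
# Hypothetical helper (not part of the SHMEM API): the number of GPUs the
# TCP bootstrap waits for before shmem_init_thread_tcp returns.
expected_participants() {
    echo $(( $1 * $2 ))    # total_nodes * gpus_per_node
}

# 4 nodes with 8 GPUs each: the bootstrap waits for 32 participants.
expected_participants 4 8
```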

By default, it runs in single-node mode using all attached GPUs. To change this, set the following environment variables:

```shell
export SHMEM_NODE_ID=0              # this node's index: 0-3 across 4 nodes
export SHMEM_TOTAL_NODES=4          # 4 nodes participating
export SHMEM_GPUS_PER_NODE=8        # 8 GPUs participating per node
export SHMEM_SERVER_IP=10.24.8.107  # IP of the bootstrap server's network interface, e.g. from `ip addr show eno0`
export SHMEM_SERVER_PORT=44434      # port for TCP bootstrapping
```
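Because only `SHMEM_NODE_ID` differs between nodes, a per-node launch script might set the shared values once and take the node index as an argument. This is a minimal sketch assuming the 4-node, 8-GPU layout and bootstrap address from the example above; `shmem_node_env` is a hypothetical helper name.

```shell
# Hypothetical per-node setup: only SHMEM_NODE_ID changes between nodes.
# The shared values mirror the 4-node, 8-GPU example above.
shmem_node_env() {
    export SHMEM_NODE_ID="$1"            # this node's index, 0..3
    export SHMEM_TOTAL_NODES=4
    export SHMEM_GPUS_PER_NODE=8
    export SHMEM_SERVER_IP=10.24.8.107   # same bootstrap server on every node
    export SHMEM_SERVER_PORT=44434
}

# On the third node (index 2):
shmem_node_env 2
echo "node $SHMEM_NODE_ID of $SHMEM_TOTAL_NODES, bootstrap at $SHMEM_SERVER_IP:$SHMEM_SERVER_PORT"
```

Call it in the same shell that then launches your application, so the exported variables are inherited by the process that calls `shmem_init_thread_tcp`.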

When configuring through environment variables, pass only the DeviceContext for the GPU that this thread owns:

```mojo
# device_id: index of the GPU this thread drives
var ctx = DeviceContext(device_id=device_id)
shmem_init_thread_tcp(ctx)
```

You can also pass the arguments explicitly, for example when your SHMEM application reads them from its own command-line interface:

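```mojo
# device_id: index of the GPU this thread drives
var ctx = DeviceContext(device_id=device_id)
shmem_init_thread_tcp(
    ctx,
    node_id=0,  # this node's index; the second node passes node_id=1
    total_nodes=2,
    server_ip="10.24.8.107",
    server_port=44434
)
```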

Arguments:

- ctx: the DeviceContext to associate with this thread.
- node_id: the index of this node, from 0 to total_nodes - 1.
- total_nodes: the number of participating nodes.
- gpus_per_node: the number of participating GPUs on this node.
- server_ip: the IP address of the TCP bootstrap server that participating nodes connect to.
- server_port: the port of the TCP bootstrap server that participating nodes communicate over.
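The constraints above can be checked before launching, so a misconfigured node is caught early rather than possibly leaving the other participants waiting in the bootstrap. Below is a POSIX `sh` sketch for a multi-node launch where all five `SHMEM_*` variables are set, assuming the numeric ones hold integers. `check_shmem_env` is hypothetical, and the 1 to 65535 port range is the general TCP limit, not a check documented for this function.

```shell
# Hypothetical pre-flight check (not part of the SHMEM library) for the
# documented argument constraints, applied to the SHMEM_* variables.
# Assumes numeric variables, when set, hold integers.
check_shmem_env() {
    if [ "${SHMEM_TOTAL_NODES:-0}" -lt 1 ]; then
        echo "SHMEM_TOTAL_NODES must be at least 1" >&2
        return 1
    fi
    if [ "${SHMEM_GPUS_PER_NODE:-0}" -lt 1 ]; then
        echo "SHMEM_GPUS_PER_NODE must be at least 1" >&2
        return 1
    fi
    # node_id ranges over 0 .. total_nodes - 1; an unset value counts as -1.
    if [ "${SHMEM_NODE_ID:--1}" -lt 0 ] || [ "$SHMEM_NODE_ID" -ge "$SHMEM_TOTAL_NODES" ]; then
        echo "SHMEM_NODE_ID must be in 0..$((SHMEM_TOTAL_NODES - 1))" >&2
        return 1
    fi
    if [ -z "${SHMEM_SERVER_IP:-}" ]; then
        echo "SHMEM_SERVER_IP must be set" >&2
        return 1
    fi
    # General TCP port range; not a limit documented for this function.
    if [ "${SHMEM_SERVER_PORT:-0}" -lt 1 ] || [ "$SHMEM_SERVER_PORT" -gt 65535 ]; then
        echo "SHMEM_SERVER_PORT must be in 1..65535" >&2
        return 1
    fi
    return 0
}

# Example: node 3 of a 4-node job with 8 GPUs per node.
export SHMEM_NODE_ID=3 SHMEM_TOTAL_NODES=4 SHMEM_GPUS_PER_NODE=8 \
       SHMEM_SERVER_IP=10.24.8.107 SHMEM_SERVER_PORT=44434
check_shmem_env && echo "SHMEM settings look valid"
```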

Raises:

If SHMEM initialization fails on any thread.