Mojo module

shmem_api

This is a work-in-progress implementation of the OpenSHMEM specification. In the future, both rocSHMEM and NVSHMEM will be supported.

You can find the current specification at http://openshmem.org/site/sites/default/site_files/OpenSHMEM-1.6.pdf

The headings below correspond to Section 9 of the specification, "OpenSHMEM Library API".

`comptime` values

`SHMEM_CMP_EQ`

comptime SHMEM_CMP_EQ = NVSHMEM_CMP_EQ

`SHMEM_CMP_GE`

comptime SHMEM_CMP_GE = NVSHMEM_CMP_GE

`SHMEM_CMP_GT`

comptime SHMEM_CMP_GT = NVSHMEM_CMP_GT

`SHMEM_CMP_LE`

comptime SHMEM_CMP_LE = NVSHMEM_CMP_LE

`SHMEM_CMP_LT`

comptime SHMEM_CMP_LT = NVSHMEM_CMP_LT

`SHMEM_CMP_NE`

comptime SHMEM_CMP_NE = NVSHMEM_CMP_NE

`SHMEM_CMP_SENTINEL`

comptime SHMEM_CMP_SENTINEL = NVSHMEM_CMP_SENTINEL

`SHMEM_SIGNAL_ADD`

comptime SHMEM_SIGNAL_ADD = NVSHMEM_SIGNAL_ADD

`SHMEM_SIGNAL_SET`

comptime SHMEM_SIGNAL_SET = NVSHMEM_SIGNAL_SET

`SHMEM_TEAM_INVALID`

comptime SHMEM_TEAM_INVALID = Int32(-1)

`SHMEM_TEAM_NODE`

comptime SHMEM_TEAM_NODE = Int32(2)

`SHMEM_TEAM_SHARED`

comptime SHMEM_TEAM_SHARED = Int32(1)

`shmem_team_t`

comptime shmem_team_t = c_int

`SHMEM_TEAM_WORLD`

comptime SHMEM_TEAM_WORLD = Int32(0)
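The comparison and signal constants above are consumed by shmem_signal_op and shmem_signal_wait_until. As a rough illustration of the OpenSHMEM semantics (not the Mojo API), the pure-Python sketch below models one PE's 64-bit signal word: SHMEM_SIGNAL_SET overwrites it, SHMEM_SIGNAL_ADD adds to it, and a wait returns once `signal <cmp> cmp_value` holds. The `SignalWord` class, the `CMP` table, the string names, and the use of `threading.Condition` are all hypothetical stand-ins; the real constants alias NVSHMEM values whose numeric encodings are not modeled, and SHMEM_CMP_SENTINEL is left out.

```python
import operator
import threading

# Hypothetical stand-ins for the SHMEM_CMP_* constants. Each maps to the test
# that a wait applies as `signal <cmp> cmp_value`. SHMEM_CMP_SENTINEL is not
# modeled, and the real NVSHMEM numeric encodings are not reproduced.
CMP = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "GE": operator.ge,
    "LT": operator.lt,
    "LE": operator.le,
}

UINT64_MASK = (1 << 64) - 1


class SignalWord:
    """Single-process model of a 64-bit signal variable owned by one PE."""

    def __init__(self, value=0):
        self._value = value & UINT64_MASK
        self._cond = threading.Condition()

    def signal_op(self, signal, sig_op):
        # Models shmem_signal_op: "SET" overwrites, "ADD" adds, both atomically.
        with self._cond:
            if sig_op == "SET":
                self._value = signal & UINT64_MASK
            elif sig_op == "ADD":
                self._value = (self._value + signal) & UINT64_MASK
            else:
                raise ValueError(f"unknown sig_op: {sig_op!r}")
            self._cond.notify_all()

    def wait_until(self, cmp, cmp_value, timeout=None):
        # Models shmem_signal_wait_until: block until the comparison holds,
        # then return the value that satisfied it.
        test = CMP[cmp]
        with self._cond:
            if not self._cond.wait_for(lambda: test(self._value, cmp_value), timeout):
                raise TimeoutError(f"signal never satisfied {cmp} {cmp_value}")
            return self._value


if __name__ == "__main__":
    sig = SignalWord()
    # Two simulated remote PEs each add 1 to the local signal word.
    writers = [threading.Thread(target=sig.signal_op, args=(1, "ADD")) for _ in range(2)]
    for t in writers:
        t.start()
    print(sig.wait_until("GE", 2, timeout=5))  # prints 2
    for t in writers:
        t.join()
```

Using ADD with a GE wait is the usual way to count arrivals from several senders, while SET with an EQ wait suits a single producer flipping a flag.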

Structs

SHMEMScope: Follows the OpenSHMEM spec by default for put/get/iput/iget and similar operations, while allowing NVIDIA extensions for block and warp scopes through a parameter.
SHMEMUniqueID: Unique ID that must be identical across all threads and nodes to establish communication.

Functions

shmem_barrier_all: Registers the arrival of a PE at a barrier and blocks the PE until all other PEs arrive at the barrier and all local updates and remote memory updates on the default context are completed.
shmem_barrier_all_on_stream: Mechanism for synchronizing all PEs at once. This routine blocks the calling PE until all PEs have called nvshmem_barrier_all. In a multithreaded NVSHMEM program, only the calling thread is blocked; however, the routine may not be called concurrently by multiple threads in the same PE.
shmem_calloc: Collectively allocate a zeroed block of symmetric memory.
shmem_create_uniqueid: Create a unique ID for rocSHMEM TCP bootstrap.
shmem_fence: Ensures ordering of delivery of operations on symmetric data objects.
shmem_finalize: A collective operation that releases all resources used by SHMEM.
shmem_free: Collectively deallocate symmetric memory.
shmem_g: Copies one data item from a remote PE.
shmem_get: Copies data from a specified PE.
shmem_get_nbi: Initiate a non-blocking copy of data from a specified PE.
shmem_init: A collective operation that allocates and initializes the resources used by the SHMEM library.
shmem_init_thread_mpi: Modular-specific init that enables initializing SHMEM on one GPU per thread.
shmem_init_thread_tcp: Modular-specific init that enables initializing SHMEM on one GPU per thread using TCP bootstrapping, without mpirun or other launchers. The bootstrap waits until all total_nodes * gpus_per_node participants have completed the exchange of information before this function returns.
shmem_malloc: Collectively allocate symmetric memory.
shmem_module_finalize: Finalizes the device state in the compiled function module and cleans up NVSHMEM operations. This should be called when NVSHMEM operations are no longer needed for the given device function.
shmem_module_init: Initializes the device state in the compiled function module so that it can perform SHMEM operations. Device initialization must be complete before calling this function.
shmem_my_pe: Returns the number of the calling PE.
shmem_n_pes: Returns the number of PEs running in a program.
shmem_p: Copies one data item to a remote PE.
shmem_put: Copy data from a contiguous local data object to a data object on a specified PE.
shmem_put_nbi: Initiate a non-blocking copy of data from a contiguous local data object to a data object on a specified PE.
shmem_put_signal_nbi: The nonblocking put-with-signal routines provide a method for copying data from a contiguous local data object to a data object on a specified PE and subsequently updating a remote flag to signal completion.
shmem_signal_op: The nvshmemx_signal_op operation atomically updates sig_addr with signal using operation sig_op on the specified PE. This operation can be used together with wait and test routines for efficient point-to-point synchronization.
shmem_signal_wait_until: Wait for a variable on the local PE to change from a signaling operation.
shmem_team_my_pe: Returns the number of the calling PE within a specified team.
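A common introductory pattern that combines several of these routines is a ring shift: each PE writes its own ID into a symmetric variable on the next PE with shmem_p, then all PEs meet at shmem_barrier_all before reading. The pure-Python sketch below is a toy model of those semantics with one thread per PE; it is not the Mojo API, and the `SimulatedSHMEM` class, its methods, and `ring_shift` are hypothetical stand-ins for shmem_malloc, shmem_my_pe, shmem_n_pes, shmem_p, and shmem_barrier_all.

```python
import threading


class SimulatedSHMEM:
    """Toy model of n_pes processing elements sharing a symmetric heap.

    Each symmetric object has one slot per PE. A put writes straight into
    another PE's slot, and barrier_all orders those writes before later reads.
    """

    def __init__(self, n_pes):
        self._n_pes = n_pes
        self._barrier = threading.Barrier(n_pes)
        self._local = threading.local()
        self.heap = {}  # symmetric object name -> list with one slot per PE

    def malloc(self, name):
        # Collective in real OpenSHMEM; here the driver allocates up front.
        self.heap[name] = [0] * self._n_pes

    def my_pe(self):
        return self._local.pe

    def n_pes(self):
        return self._n_pes

    def p(self, name, value, pe):
        # Models shmem_p: store one value into the symmetric object on `pe`.
        self.heap[name][pe] = value

    def local(self, name):
        # Read the calling PE's own copy of a symmetric object.
        return self.heap[name][self.my_pe()]

    def barrier_all(self):
        self._barrier.wait()

    def run(self, kernel):
        # Launch one thread per PE, SPMD style, and collect each return value.
        results = [None] * self._n_pes

        def entry(pe):
            self._local.pe = pe
            results[pe] = kernel(self)

        threads = [threading.Thread(target=entry, args=(pe,)) for pe in range(self._n_pes)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results


def ring_shift(shmem):
    # Each PE sends its own ID to the next PE, wrapping around at the end.
    me = shmem.my_pe()
    peer = (me + 1) % shmem.n_pes()
    shmem.p("dest", me, peer)
    shmem.barrier_all()  # every put has landed before anyone reads
    return shmem.local("dest")


if __name__ == "__main__":
    shmem = SimulatedSHMEM(n_pes=4)
    shmem.malloc("dest")
    print(shmem.run(ring_shift))  # prints [3, 0, 1, 2]
```

Without the barrier, a PE could read its slot before its neighbor's put arrives; in the real library, shmem_barrier_all supplies that ordering by completing outstanding remote memory updates before any PE proceeds.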
