Mojo module
shmem_api
This is a work-in-progress implementation of the OpenSHMEM spec. In the future, both ROCSHMEM and NVSHMEM will be supported.
You can find the current specification at http://openshmem.org/site/sites/default/site_files/OpenSHMEM-1.6.pdf
The headings below correspond to Section 9: OpenSHMEM Library API.
comptime values
SHMEM_CMP_EQ
comptime SHMEM_CMP_EQ = NVSHMEM_CMP_EQ
SHMEM_CMP_GE
comptime SHMEM_CMP_GE = NVSHMEM_CMP_GE
SHMEM_CMP_GT
comptime SHMEM_CMP_GT = NVSHMEM_CMP_GT
SHMEM_CMP_LE
comptime SHMEM_CMP_LE = NVSHMEM_CMP_LE
SHMEM_CMP_LT
comptime SHMEM_CMP_LT = NVSHMEM_CMP_LT
SHMEM_CMP_NE
comptime SHMEM_CMP_NE = NVSHMEM_CMP_NE
SHMEM_CMP_SENTINEL
comptime SHMEM_CMP_SENTINEL = NVSHMEM_CMP_SENTINEL
SHMEM_SIGNAL_ADD
comptime SHMEM_SIGNAL_ADD = NVSHMEM_SIGNAL_ADD
SHMEM_SIGNAL_SET
comptime SHMEM_SIGNAL_SET = NVSHMEM_SIGNAL_SET
SHMEM_TEAM_INVALID
comptime SHMEM_TEAM_INVALID = Int32(-1)
SHMEM_TEAM_NODE
comptime SHMEM_TEAM_NODE = Int32(2)
SHMEM_TEAM_SHARED
comptime SHMEM_TEAM_SHARED = Int32(1)
shmem_team_t
comptime shmem_team_t = c_int
SHMEM_TEAM_WORLD
comptime SHMEM_TEAM_WORLD = Int32(0)
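As a rough illustration of what the comparison and signal constants above mean, the following plain-Python sketch models their semantics as the OpenSHMEM spec describes them. It is not the Mojo API: the dictionary keys only reuse the constant names as labels, the helper functions `condition_met` and `apply_signal` are hypothetical, and the real numeric values come from NVSHMEM and are not reproduced here. SHMEM_CMP_SENTINEL is left out of the model.

```python
import operator

# Hypothetical stand-ins: names are used as labels only. The real constants
# alias NVSHMEM values, which this sketch does not try to reproduce.
CMP_PREDICATES = {
    "SHMEM_CMP_EQ": operator.eq,  # equal
    "SHMEM_CMP_NE": operator.ne,  # not equal
    "SHMEM_CMP_GT": operator.gt,  # greater than
    "SHMEM_CMP_GE": operator.ge,  # greater than or equal
    "SHMEM_CMP_LT": operator.lt,  # less than
    "SHMEM_CMP_LE": operator.le,  # less than or equal
}

SIGNAL_OPS = {
    # SET overwrites the signal word with the new value.
    "SHMEM_SIGNAL_SET": lambda current, value: value,
    # ADD accumulates the new value into the signal word.
    "SHMEM_SIGNAL_ADD": lambda current, value: current + value,
}


def condition_met(cmp_name, ivar, cmp_value):
    """Return True when a wait with this comparison would stop waiting."""
    return CMP_PREDICATES[cmp_name](ivar, cmp_value)


def apply_signal(op_name, current, value):
    """Return the signal word after applying the named signal operation."""
    return SIGNAL_OPS[op_name](current, value)
```

In this model, a wait that uses SHMEM_CMP_GE with a comparison value of 3 finishes once the signal word reaches 3 or more, which pairs naturally with SHMEM_SIGNAL_ADD when several senders each add 1.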
Structs
- SHMEMScope: Follows the OpenSHMEM spec by default for put/get/iput/iget and related operations, while allowing NVIDIA extensions for block and warp scopes when a parameter is passed.
- SHMEMUniqueID: Unique ID that must be identical across all threads and nodes to establish communication.
Functions
- shmem_barrier_all: Registers the arrival of a PE at a barrier and blocks the PE until all other PEs arrive at the barrier and all local updates and remote memory updates on the default context are completed.
- shmem_barrier_all_on_stream: Mechanism for synchronizing all PEs at once. This routine blocks the calling PE until all PEs have called nvshmem_barrier_all. In a multithreaded NVSHMEM program, only the calling thread is blocked; however, the routine may not be called concurrently by multiple threads in the same PE.
- shmem_calloc: Collectively allocate a zeroed block of symmetric memory.
- shmem_create_uniqueid: Create a unique ID for rocSHMEM TCP bootstrap.
- shmem_fence: Ensures ordering of delivery of operations on symmetric data objects.
- shmem_finalize: A collective operation that releases all resources used by SHMEM.
- shmem_free: Collectively deallocate symmetric memory.
- shmem_g: Copies one data item from a remote PE.
- shmem_get: Copies data from a specified PE.
- shmem_get_nbi: Initiate a non-blocking copy of data from a specified PE.
- shmem_init: A collective operation that allocates and initializes the resources used by the SHMEM library.
- shmem_init_thread_mpi: Modular-specific init that enables initializing SHMEM on one GPU per thread.
- shmem_init_thread_tcp: Modular-specific init that enables initializing SHMEM on one GPU per thread using TCP bootstrapping, without mpirun or other launchers. The bootstrap will wait until total_nodes * gpus_per_node have completed the exchange of information before moving on from this function call.
- shmem_malloc: Collectively allocate symmetric memory.
- shmem_module_finalize: Finalizes the device state in the compiled function module and cleans up NVSHMEM operations. This should be called when NVSHMEM operations are no longer needed for the given device function.
- shmem_module_init: Initializes the device state in the compiled function module so that it's able to perform SHMEM operations. Device initialization must be complete before this function is called.
- shmem_my_pe: Returns the number of the calling PE.
- shmem_n_pes: Returns the number of PEs running in a program.
- shmem_p: Copies one data item to a remote PE.
- shmem_put: Copy data from a contiguous local data object to a data object on a specified PE.
- shmem_put_nbi: Initiate a non-blocking copy of data from a contiguous local data object to a data object on a specified PE.
- shmem_put_signal_nbi: The nonblocking put-with-signal routines provide a method for copying data from a contiguous local data object to a data object on a specified PE and subsequently updating a remote flag to signal completion.
- shmem_signal_op: The nvshmemx_signal_op operation atomically updates sig_addr with signal using operation sig_op on the specified PE. This operation can be used together with wait and test routines for efficient point-to-point synchronization.
- shmem_signal_wait_until: Wait for a variable on the local PE to change from a signaling operation.
- shmem_team_my_pe: Returns the number of the calling PE within a specified team.
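The put-with-signal and signal-wait routines listed above are typically used as a pair: the sender writes data and then updates a flag, and the receiver waits on that flag before reading. The sketch below models that handshake in plain Python with threads. It makes no calls into this module and does not reflect its Mojo signatures; the `ToyPE` class, its methods, and `run_demo` are hypothetical names introduced only for illustration, and the model covers only the "set" signal operation with an equality comparison.

```python
import threading


class ToyPE:
    """Toy stand-in for one PE's symmetric memory: a data buffer and a signal word."""

    def __init__(self, nelems):
        self.data = [0] * nelems
        self.signal = 0
        self._cond = threading.Condition()

    def put_signal(self, source, sig_value):
        # Models put-with-signal: the data lands first, then the signal
        # word is set, so a waiter that sees the signal also sees the data.
        with self._cond:
            self.data[:len(source)] = source
            self.signal = sig_value
            self._cond.notify_all()

    def signal_wait_until_eq(self, cmp_value):
        # Models a signal wait with an equality comparison: block until
        # the signal word equals cmp_value, then return its value.
        with self._cond:
            self._cond.wait_for(lambda: self.signal == cmp_value)
            return self.signal


def run_demo():
    receiver = ToyPE(nelems=4)
    received = []

    def consumer():
        # The receiving side waits for the flag before touching the buffer.
        receiver.signal_wait_until_eq(1)
        received.extend(receiver.data)

    worker = threading.Thread(target=consumer)
    worker.start()
    # The sending side delivers the payload and raises the flag in one step.
    receiver.put_signal([10, 20, 30, 40], sig_value=1)
    worker.join()
    return received
```

Calling `run_demo()` returns the payload the consumer read after the flag was raised. The point of the pattern is that the receiver never needs a separate barrier: the signal update itself tells it the data is ready.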