Mojo function

multimem_st

multimem_st[type: DType, *, count: Int, scope: Scope, consistency: Consistency, width: Int = 1](addr: UnsafePointer[SIMD[type, 1], address_space=AddressSpace(1)], values: StaticTuple[SIMD[type, width], count])

Stages an inline multimem.st instruction.

This operation performs a store to all memory locations pointed to by the multimem address using the specified memory consistency model and scope.

Notes:

  • Requires SM90+ GPU architecture (PTX ISA 8.1+).
  • The address must be a valid multimem address.
  • Supported type-width combinations must total 32/64/128 bits.
  • Default memory semantics: weak consistency (when not specified).
  • Vector stores (.v2/.v4) require matching total size constraints.
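The size constraint in the notes can be checked with simple arithmetic before calling `multimem_st`. The following is a minimal sketch, assuming the total is the element bit width multiplied by `width` and `count`; the helper names `store_bits` and `is_supported_total` are hypothetical and not part of `gpu.memory`, and the element bit width is passed in explicitly rather than derived from a `DType`.

```mojo
# Hypothetical helpers (not part of gpu.memory): they only illustrate the
# "must total 32/64/128 bits" note, assuming total = element bits * width * count.

fn store_bits(elem_bits: Int, width: Int, count: Int) -> Int:
    """Total bits written by one store under the assumption above."""
    return elem_bits * width * count


fn is_supported_total(total_bits: Int) -> Bool:
    """True when the total matches one of the sizes listed in the notes."""
    return total_bits == 32 or total_bits == 64 or total_bits == 128


fn main():
    # float32 (32 bits), width=1, count=2 gives 64 bits.
    print(is_supported_total(store_bits(32, 1, 2)))
    # float16 (16 bits), width=2, count=4 gives 128 bits.
    print(is_supported_total(store_bits(16, 2, 4)))
    # float32 (32 bits), width=2, count=4 gives 256 bits, outside the listed sizes.
    print(is_supported_total(store_bits(32, 2, 4)))
```

Under that assumption, the two combinations used in the example on this page come to 64 and 128 bits respectively, while the third combination falls outside the listed totals.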

Example:

```mojo
from gpu.memory import *

# `addr`, `val1`, `val2`, and `vec1` through `vec4` are assumed to be defined by the caller.

# Store 2 float32 values to multimem address.
multimem_st[DType.float32, count=2, scope=Scope.CTA, consistency=Consistency.RELAXED](
    addr, StaticTuple[SIMD[DType.float32, 1], 2](val1, val2)
)

# Vector store of 4 float16x2 values.
multimem_st[DType.float16, count=4, scope=Scope.CLUSTER, consistency=Consistency.RELEASE, width=2](
    addr, StaticTuple[SIMD[DType.float16, 2], 4](vec1, vec2, vec3, vec4)
)
```

See Also: PTX ISA Documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-multimem-ld-reduce-multimem-st-multimem-red

Parameters:

  • type (DType): The data type of elements to store (must be float16, bfloat16, or float32).
  • count (Int): Number of vector elements per store operation (2 or 4).
  • scope (Scope): Memory scope for visibility of the store operation (CTA/Cluster/GPU/System).
  • consistency (Consistency): Memory consistency semantics (weak/relaxed/release).
  • width (Int): Vector width modifier for packed data types (default 1).

Args:

  • addr (UnsafePointer[SIMD[type, 1], address_space=AddressSpace(1)]): Multimem address in global address space pointing to multiple locations.
  • values (StaticTuple[SIMD[type, width], count]): Packed SIMD values to store; the tuple length must match the count parameter.