Mojo function
multimem_st
multimem_st[type: DType, *, count: Int, scope: Scope, consistency: Consistency, width: Int = 1](addr: UnsafePointer[SIMD[type, 1], address_space=AddressSpace(1)], values: StaticTuple[SIMD[type, width], count])
Stages an inline multimem.st instruction.
This operation performs a store to all memory locations pointed to by the multimem address using the specified memory consistency model and scope.
Notes:
- Requires SM90+ GPU architecture (PTX ISA 8.1+).
- The address must be a valid multimem address.
- Supported type-width combinations must total 32/64/128 bits.
- Default memory semantics: weak consistency (when not specified).
- Vector stores (.v2/.v4) require matching total size constraints.
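The size rule in the notes can be checked with a little arithmetic. The following is a minimal sketch, assuming the rule means that element bits × `width` × `count` must equal 32, 64, or 128. The helper names `total_store_bits` and `is_supported_store_size` are hypothetical and are not part of `gpu.memory`; the element size is passed as a plain `Int` to keep the sketch independent of any particular standard library version.

```mojo
# Hypothetical helper: total number of bits written by one store,
# assuming total = element bits * width * count.
fn total_store_bits(element_bits: Int, width: Int, count: Int) -> Int:
    return element_bits * width * count


# Hypothetical helper: True when the total matches one of the sizes
# listed in the notes (32, 64, or 128 bits).
fn is_supported_store_size(element_bits: Int, width: Int, count: Int) -> Bool:
    var bits = total_store_bits(element_bits, width, count)
    return bits == 32 or bits == 64 or bits == 128


def main():
    # float32, width=1, count=2: 32 * 1 * 2 = 64 bits
    print(is_supported_store_size(32, 1, 2))
    # float16, width=2, count=4: 16 * 2 * 4 = 128 bits
    print(is_supported_store_size(16, 2, 4))
    # float32, width=2, count=4: 32 * 2 * 4 = 256 bits, outside the listed sizes
    print(is_supported_store_size(32, 2, 4))
```

Under this reading, both calls in the example section (float32 with `count=2`, and float16 with `count=4, width=2`) fall within the listed sizes.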
Example:
```mojo
from gpu.memory import *
from utils import StaticTuple

# Assumes `addr` is a valid multimem address and that val1, val2 and
# vec1..vec4 are SIMD values of the matching element type and width.

# Store 2 float32 values to multimem address.
multimem_st[DType.float32, count=2, scope=Scope.CTA, consistency=Consistency.RELAXED](
    addr, StaticTuple[SIMD[DType.float32, 1], 2](val1, val2)
)

# Vector store of 4 float16x2 values.
multimem_st[DType.float16, count=4, scope=Scope.CLUSTER, consistency=Consistency.RELEASE, width=2](
    addr, StaticTuple[SIMD[DType.float16, 2], 4](vec1, vec2, vec3, vec4)
)
```
See Also: PTX ISA Documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-multimem-ld-reduce-multimem-st-multimem-red
Parameters:
- type (DType): The data type of elements to store (must be float16, bfloat16, or float32).
- count (Int): Number of vector elements per store operation (2 or 4).
- scope (Scope): Memory scope for visibility of the store operation (CTA/Cluster/GPU/System).
- consistency (Consistency): Memory consistency semantics (weak/relaxed/release).
- width (Int): Vector width modifier for packed data types (default 1).
Args:
- addr (UnsafePointer[SIMD[type, 1], address_space=AddressSpace(1)]): Multimem address in global address space pointing to multiple locations.
- values (StaticTuple[SIMD[type, width], count]): Packed SIMD values to store, with count matching the template parameter.