Mojo package
sm100_structured
SM100 Structured Matmul - Refactored with encapsulated pipeline management.
This module provides the same SM100 matmul functionality as the original sm100 module, but with improved code organization.
Key abstractions:
- WorkIterator/SchedulerWorkIterator: Encapsulate work iteration and pipeline state
- RingBuffer/OutputRingBuffer: Encapsulate producer-consumer synchronization
- TileLoaderTMA: Encapsulate TMA tile loading logic
- Context managers for cleaner acquire/release patterns
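The producer-consumer and acquire/release ideas listed above can be illustrated with a small host-side sketch. This is a conceptual Python analogy only, not the module's Mojo API: the class `RingBuffer`, its `produce`/`consume` methods, and the helper `run_pipeline` are hypothetical names, and semaphores stand in for whatever synchronization primitives the kernels actually use.

```python
import threading
from contextlib import contextmanager


class RingBuffer:
    """Hypothetical fixed-size ring shared by one producer and one consumer."""

    def __init__(self, num_stages):
        self.num_stages = num_stages
        self.slots = [None] * num_stages
        # `empty` counts free slots; `full` counts slots ready to be consumed.
        self.empty = threading.Semaphore(num_stages)
        self.full = threading.Semaphore(0)
        self.head = 0  # next slot the producer fills
        self.tail = 0  # next slot the consumer drains

    @contextmanager
    def produce(self):
        # Acquire: block until a slot is free, then hand its index to the caller.
        self.empty.acquire()
        index = self.head
        yield index
        # Release (normal exit only): advance and signal that the slot is full.
        self.head = (index + 1) % self.num_stages
        self.full.release()

    @contextmanager
    def consume(self):
        # Acquire: block until a slot has data.
        self.full.acquire()
        index = self.tail
        yield index
        # Release (normal exit only): advance and mark the slot free again.
        self.tail = (index + 1) % self.num_stages
        self.empty.release()


def run_pipeline(num_tiles, num_stages=2):
    """Push `num_tiles` values through the ring and return them in arrival order."""
    ring = RingBuffer(num_stages)
    results = []

    def producer():
        for tile in range(num_tiles):
            with ring.produce() as slot:
                ring.slots[slot] = tile * tile  # stand-in for loading a tile

    def consumer():
        for _ in range(num_tiles):
            with ring.consume() as slot:
                results.append(ring.slots[slot])  # stand-in for the MMA step

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Wrapping acquire and release in a `with` block keeps the two halves of each handshake together, so a slot cannot be acquired without its matching release appearing in the same scope.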
Switching Implementations
Option 1: Environment Variable (Recommended)
Set MODULAR_USE_STRUCTURED_SM100=1 to use this implementation:
# Use original sm100 (default):
./bazelw run //max/kernels/test/gpu/linalg:test_matmul_sm100_smoke.mojo.test
# Use sm100_structured:
MODULAR_USE_STRUCTURED_SM100=1 ./bazelw run //max/kernels/test/gpu/linalg:test_matmul_sm100_smoke.mojo.test
Option 2: Direct Import
# Original:
from linalg.matmul.gpu.sm100.matmul import (
    blackwell_matmul_tma_umma_warp_specialized
)
# Structured (this module):
from linalg.matmul.gpu.sm100_structured import (
    blackwell_matmul_tma_umma_warp_specialized
)
See DOCS/testing_and_switching.md for full documentation.
Modules
- matmul: SM100 Matmul CPU entry points - TMA setup and kernel launch wrappers.
- matmul_kernels: SM100 Matmul Kernel Structs - GPU kernel entry points and helpers.
- matmul_output: SM100 Matmul Output Pipeline - TMEM → SMEM → GMEM epilogue.
- pipeline
- ring_buffer: Ring buffer for SM100 producer-consumer synchronization.
- tile_loader: TileLoader for SM100 matrix multiplication.
- tile_scheduler
- tile_scheduler_splitk
- tile_writer: TileWriter components for SM100 matrix multiplication epilogue.