Mojo package
sm100_structured
SM100 Structured Kernels - Blackwell matmul implementation.
Modules
- barriers: Barrier abstractions for SM100 structured matmul kernels.
- block_scaled_matmul: CPU entry points for block-scaled SM100 matmul.
- block_scaled_matmul_kernel: Block-scaled SM100 matmul kernel - Structured kernel using tile pipelines.
- block_scaled_output_writer: BlockScaledTileWriter for SM100 block-scaled matmul output pipeline.
- block_scaled_smem: Shared memory layout for block-scaled SM100 matmul.
- blockwise_fp8_accumulator: Register-based accumulator for blockwise FP8 matmul.
- blockwise_fp8_matmul: CPU entry points for blockwise FP8 SM100 matmul.
- blockwise_fp8_matmul_kernel: Blockwise FP8 SM100 matmul kernel - Structured kernel with register accumulation.
- blockwise_fp8_output_writer: Output writer for blockwise FP8 SM100 matmul.
- blockwise_fp8_smem: Shared memory layout for blockwise FP8 SM100 matmul.
- config: SM100 matmul configuration types and utilities.
- dispatch:
- matmul: SM100 Matmul CPU entry points - TMA setup and kernel launch wrappers.
- matmul_kernels: SM100 Matmul Kernel Structs - GPU kernel entry points and helpers.
- output_writer: TileWriter for SM100 matmul output pipeline.
- pipeline: Producer-consumer pipeline utilities for SM100 structured kernels.
- tile_loader: TMA tile loader for SM100 matrix multiplication.
- tile_pipeline: Tile pipeline for SM100 producer-consumer synchronization.
- tile_scheduler:
- tile_scheduler_splitk:
- tile_writer: TileWriter components for SM100 matrix multiplication epilogue.
- tmem: Tensor Memory (TMEM) abstractions for SM100 Blackwell GPUs.
- tuning_configs:
- warp_context: RAII warp context managers for SM100 matmul kernel.