Mojo package
sm100
NVIDIA SM100 (Blackwell) attention kernels.
Covers MHA (flash-attention v4) and MLA (multi-head latent attention) for both prefill and decode, including FP8 and block-scaled quantization variants.
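The FP8 and per-token-scale variants mentioned above quantize activations with one scale per token row. As a generic illustration only (not this package's Mojo implementation), per-token symmetric quantization toward the FP8 E4M3 dynamic range (whose maximum finite value is 448) can be sketched as:

```python
def quantize_per_token_fp8(x, fp8_max=448.0):
    """Per-token symmetric quantization sketch: one scale per token
    row, chosen so that the row's maximum magnitude maps to fp8_max.
    Returns (quantized rows, per-token scales). Illustrative only."""
    qs, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row) or 1.0  # guard all-zero rows
        scale = amax / fp8_max
        # On device these values would be cast to float8 here; we keep
        # them as Python floats so the round trip is exact.
        qs.append([v / scale for v in row])
        scales.append(scale)
    return qs, scales
```

Dequantization is the inverse: multiply each quantized row elementwise by its per-token scale.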
Modules
- attention: FA4 (Flash Attention 4) configuration for SM100 (Blackwell) kernels.
- attention_utils: Shared SM100 attention primitives used by both MHA and MLA kernels.
- correction_warp: Correction warp group logic for FA4 (SM100 Flash Attention).
- dispatch
- kernel
- load_warp: TMA load warp logic for FA4 (SM100 Flash Attention).
- mha_1q
- mla_decode_combine: MLA decode split-K combine kernel for SM100 (B200).
- mla_decode_dispatch
- mla_decode_kv_bf16
- mla_decode_kv_fp8
- mla_decode_qkv_fp8: Native FP8 MLA decode kernel for SM100 (B200).
- mla_decode_qkv_fp8_per_token_scale_rope_aware: SnapMLA FP8+BF16 MLA decode kernel for SM100 (B200).
- mla_decode_utils
- mla_prefill
- mla_prefill_blockscale
- mla_prefill_generic
- mla_prefill_per_token_scale: Per-token-scale MLA prefill kernel.
- mla_prefill_utils
- mma_warp: MMA warp logic for FA4 (SM100 Flash Attention).
- smem: Shared memory layout for SM100 attention kernels.
- softmax_warp: Softmax warp group logic for FA4 (SM100 Flash Attention).
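The split-K combine step referenced by mla_decode_combine merges partial attention outputs, where each split ran softmax over its own slice of the KV cache while tracking a running row max and exp-sum. A minimal generic sketch of that merge in Python (illustrative of the standard log-sum-exp combine, not the Mojo kernel's actual code):

```python
import math

def split_k_combine(partials):
    """Merge per-split (out, row_max, row_sum) triples from a split-K
    attention decode into one softmax-normalized output vector.
    Each split's `out` is already normalized by its local row_sum."""
    m = max(p[1] for p in partials)        # global row max across splits
    l = 0.0                                # global softmax denominator
    acc = [0.0] * len(partials[0][0])
    for out, m_i, l_i in partials:
        w = math.exp(m_i - m)              # rescale split to the global max
        l += l_i * w
        for j, o in enumerate(out):
            acc[j] += o * l_i * w          # un-normalize, rescale, accumulate
    return [a / l for a in acc]
```

Because each split only needs its local max and exp-sum, the splits can run in parallel over disjoint KV chunks and this combine recovers exactly the output a single full-softmax pass would produce.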