Mojo package
sm100
NVIDIA SM100 (Blackwell) attention kernels.
Covers MHA (flash-attention v4) and MLA (multi-head latent attention) for both prefill and decode, including FP8 and block-scaled quantization variants.
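The package's internals are not shown on this page, but as a rough illustration of what "per-token scale" means in the FP8 variants, here is a minimal NumPy sketch. The constant and function names are illustrative assumptions, not APIs from this package, and real kernels additionally round values onto the FP8 mantissa grid:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the float8 e4m3 format

def quantize_per_token(x: np.ndarray):
    """Scale each token (row) into FP8 range with its own scale factor.

    Illustrative sketch only: it models the per-token scaling step, not
    the FP8 rounding itself. Returns (scaled values, per-token scales),
    where x_q * scale recovers x wherever nothing was clipped.
    """
    amax = np.abs(x).max(axis=-1, keepdims=True)           # per-token abs-max
    scale = np.where(amax > 0.0, amax / FP8_E4M3_MAX, 1.0)  # avoid div by zero
    x_q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return x_q, scale
```

Per-token (rather than per-tensor) scales keep one outlier token from destroying the quantization precision of every other token in the batch.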
Modules

- attention: FA4 (Flash Attention 4) configuration for SM100 (Blackwell) kernels.
- attention_utils: Shared SM100 attention primitives used by both MHA and MLA kernels.
- correction_warp: Correction warp group logic for FA4 (SM100 Flash Attention).
- dispatch
- kernel
- load_warp: TMA load warp logic for FA4 (SM100 Flash Attention).
- mha_1q
- mla_decode_combine: MLA decode split-K combine kernel for SM100 (B200).
- mla_decode_dispatch
- mla_decode_kv_bf16
- mla_decode_kv_fp8
- mla_decode_qkv_fp8: Native FP8 MLA decode kernel for SM100 (B200).
- mla_decode_qkv_fp8_per_token_scale_rope_aware: SnapMLA FP8+BF16 MLA decode kernel for SM100 (B200).
- mla_decode_sparse
- mla_decode_sparse_kv_fp8
- mla_decode_utils
- mla_prefill
- mla_prefill_blockscale
- mla_prefill_generic
- mla_prefill_per_token_scale: Per-token-scale MLA prefill kernel.
- mla_prefill_utils
- mma_warp: MMA warp logic for FA4 (SM100 Flash Attention).
- smem: Shared memory layout for SM100 attention kernels.
- softmax_warp: Softmax warp group logic for FA4 (SM100 Flash Attention).
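The "split-K combine" in mla_decode_combine refers to a standard flash-attention decode technique: each split computes attention over a disjoint slice of the KV sequence along with the log-sum-exp (LSE) of its logits, and the partial outputs are then merged with an LSE-weighted average. A minimal NumPy sketch of that merge step (function names are illustrative, not this package's API):

```python
import numpy as np

def combine_splits(partial_out: np.ndarray, partial_lse: np.ndarray) -> np.ndarray:
    """Merge split-K partial attention outputs into the full result.

    partial_out: (num_splits, d) softmax-weighted outputs, each computed
                 over a disjoint slice of the KV sequence.
    partial_lse: (num_splits,) log-sum-exp of the attention logits per slice.
    """
    m = partial_lse.max()                      # shared max for numerical stability
    w = np.exp(partial_lse - m)                # each split's share of total softmax mass
    return (w[:, None] * partial_out).sum(axis=0) / w.sum()
```

Because each split's output is already normalized by its own softmax denominator, reweighting by exp(lse_i - max_lse) and renormalizing reproduces the softmax over the whole sequence exactly.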