Mojo struct
MLASparseSharedMemoryFP8
struct MLASparseSharedMemoryFP8[config: MLASparseConfig[config.qkv_dtype, config.b_topk_, config.num_mbars_, config.q_smem_depth_, config.q_tmem_depth_], scale_block_size: Int]
Fields
- base (MLASparseSharedMemory[config])
- k_scales (InlineArray[Float32, MLASparseSharedMemoryFP8[config, scale_block_size].K_SCALES_SIZE])
- v_scales (InlineArray[Float32, MLASparseSharedMemoryFP8[config, scale_block_size].V_SCALES_SIZE])
- k_fp8_tma_done (InlineArray[SharedMemBarrier, 2])
- v_fp8_tma_done (InlineArray[SharedMemBarrier, 2])
Implemented traits
comptime members
K_scales_per_token
comptime K_scales_per_token = ceildiv(config.qk_depth, scale_block_size)
K_SCALES_SIZE
comptime K_SCALES_SIZE = (MLASparseSharedMemoryFP8[config, scale_block_size].TOPK_PER_CTA * MLASparseSharedMemoryFP8[config, scale_block_size].K_scales_per_token)
num_mbars
comptime num_mbars = 2
TOPK_PER_CTA
comptime TOPK_PER_CTA = (config.B_TOPK // 2)
V_scales_per_token
comptime V_scales_per_token = ceildiv(config.v_depth, scale_block_size)
V_SCALES_SIZE
comptime V_SCALES_SIZE = (config.B_TOPK * MLASparseSharedMemoryFP8[config, scale_block_size].V_scales_per_token)
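The size formulas above can be checked by hand with ordinary integer arithmetic. The following is a minimal sketch in plain Python, not Mojo and not part of the library: it mirrors the comptime expressions listed on this page. The helper names (`ceildiv`, `ScaleSizes`, `scale_sizes`) and the example values (`qk_depth=576`, `v_depth=512`, `b_topk=64`, `scale_block_size=128`) are assumptions chosen for illustration and are not taken from any real `MLASparseConfig`.

```python
from dataclasses import dataclass


def ceildiv(a: int, b: int) -> int:
    """Ceiling division for positive integers, matching ceildiv(a, b)."""
    return -(-a // b)


@dataclass(frozen=True)
class ScaleSizes:
    """Hypothetical container mirroring the comptime members on this page."""
    k_scales_per_token: int
    topk_per_cta: int
    k_scales_size: int
    v_scales_per_token: int
    v_scales_size: int


def scale_sizes(qk_depth: int, v_depth: int, b_topk: int,
                scale_block_size: int) -> ScaleSizes:
    # K_scales_per_token = ceildiv(config.qk_depth, scale_block_size)
    k_per_token = ceildiv(qk_depth, scale_block_size)
    # TOPK_PER_CTA = config.B_TOPK // 2
    topk_per_cta = b_topk // 2
    # K_SCALES_SIZE = TOPK_PER_CTA * K_scales_per_token
    k_size = topk_per_cta * k_per_token
    # V_scales_per_token = ceildiv(config.v_depth, scale_block_size)
    v_per_token = ceildiv(v_depth, scale_block_size)
    # V_SCALES_SIZE = config.B_TOPK * V_scales_per_token
    v_size = b_topk * v_per_token
    return ScaleSizes(k_per_token, topk_per_cta, k_size, v_per_token, v_size)


if __name__ == "__main__":
    # Assumed example values, for illustration only.
    sizes = scale_sizes(qk_depth=576, v_depth=512, b_topk=64,
                        scale_block_size=128)
    print(sizes)
    # Each scale is a Float32 (4 bytes), so the array footprints are:
    print("k_scales bytes:", sizes.k_scales_size * 4)
    print("v_scales bytes:", sizes.v_scales_size * 4)
```

With these assumed inputs, `K_scales_per_token` is 5, `TOPK_PER_CTA` is 32, `K_SCALES_SIZE` is 160, `V_scales_per_token` is 4, and `V_SCALES_SIZE` is 256. As the definitions above state, `K_SCALES_SIZE` is scaled by `TOPK_PER_CTA` while `V_SCALES_SIZE` is scaled by the full `config.B_TOPK`.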