Mojo struct
MLASparseSharedMemoryFP8
struct MLASparseSharedMemoryFP8[config: MLASparseConfig[config.qkv_dtype, config.b_topk_, config.num_mbars_, config.q_smem_depth_, config.q_tmem_depth_], scale_block_size: Int]
Fields
- base (MLASparseSharedMemory[config])
- k_scales (InlineArray[Float32, Int((mul (b_topk_ // Int(2)), ceildiv(config.qk_depth, scale_block_size)))])
- v_scales (InlineArray[Float32, Int((mul ceildiv(config.v_depth, scale_block_size), b_topk_))])
- k_fp8_tma_done (InlineArray[SharedMemBarrier, Int(2)])
- v_fp8_tma_done (InlineArray[SharedMemBarrier, Int(2)])
Implemented traits
comptime members
K_scales_per_token
comptime K_scales_per_token = ceildiv(config.qk_depth, scale_block_size)
K_SCALES_SIZE
comptime K_SCALES_SIZE = (MLASparseSharedMemoryFP8[config, scale_block_size].TOPK_PER_CTA * ceildiv(config.qk_depth, scale_block_size))
num_mbars
comptime num_mbars = 2
TOPK_PER_CTA
comptime TOPK_PER_CTA = (config.B_TOPK // Int(2))
V_scales_per_token
comptime V_scales_per_token = ceildiv(config.v_depth, scale_block_size)
V_SCALES_SIZE
comptime V_SCALES_SIZE = (config.B_TOPK * ceildiv(config.v_depth, scale_block_size))
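The size formulas above can be checked by hand with a small arithmetic sketch. The block below is plain Python, not Mojo and not part of this API: the class name ScaleLayout, its field names, and the example values (B_TOPK = 64, qk_depth = 576, v_depth = 512, scale_block_size = 128) are hypothetical and chosen only for illustration. The byte count assumes 4 bytes per Float32 element and covers only the two scale arrays, not the base struct or the barriers.

```python
from dataclasses import dataclass


def ceildiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return -(-a // b)


@dataclass(frozen=True)
class ScaleLayout:
    # Hypothetical stand-ins for config.B_TOPK, config.qk_depth,
    # config.v_depth, and the scale_block_size parameter.
    b_topk: int
    qk_depth: int
    v_depth: int
    scale_block_size: int

    @property
    def topk_per_cta(self) -> int:
        # TOPK_PER_CTA = B_TOPK // 2
        return self.b_topk // 2

    @property
    def k_scales_per_token(self) -> int:
        # K_scales_per_token = ceildiv(qk_depth, scale_block_size)
        return ceildiv(self.qk_depth, self.scale_block_size)

    @property
    def v_scales_per_token(self) -> int:
        # V_scales_per_token = ceildiv(v_depth, scale_block_size)
        return ceildiv(self.v_depth, self.scale_block_size)

    @property
    def k_scales_size(self) -> int:
        # K_SCALES_SIZE = TOPK_PER_CTA * K_scales_per_token
        return self.topk_per_cta * self.k_scales_per_token

    @property
    def v_scales_size(self) -> int:
        # V_SCALES_SIZE = B_TOPK * V_scales_per_token
        return self.b_topk * self.v_scales_per_token

    @property
    def scale_bytes(self) -> int:
        # Both scale arrays hold Float32 elements (4 bytes each).
        return 4 * (self.k_scales_size + self.v_scales_size)


# Example values are illustrative only.
layout = ScaleLayout(b_topk=64, qk_depth=576, v_depth=512, scale_block_size=128)
print(layout.k_scales_size, layout.v_scales_size, layout.scale_bytes)
# prints: 160 256 1664
```

With these example values, qk_depth is not a multiple of scale_block_size, so the ceiling division rounds 4.5 up to 5 scales per token for K. Note also that the K formula uses TOPK_PER_CTA (half of B_TOPK), while the V formula uses the full B_TOPK.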