Mojo struct

MHAConfig

struct MHAConfig[dtype: DType]

Fields

num_heads (Int):
depth (Int):
padded_depth (Int):
num_queries_per_block (Int):
num_keys_per_block (Int):
BK (Int):
WM (Int):
WN (Int):
num_pipeline_stages (Int):
k_group_size (Int):
algorithm (FlashAttentionAlgorithm):
swizzle_mode (TensorMapSwizzle):

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegisterPassable, TrivialRegisterPassable, Writable

Methods

`__init__`

def __init__(num_heads: Int, depth: Int, num_queries_per_block: Optional[Int] = None, num_keys_per_block: Optional[Int] = None, BK: Optional[Int] = None, WM: Optional[Int] = None, WN: Optional[Int] = None, num_pipeline_stages: Int = Int(4), k_group_size: Int = Int(1), algorithm: FlashAttentionAlgorithm = FlashAttentionAlgorithm(Int(-1)), swizzle_mode: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B) -> Self

`block_m`

def block_m(self) -> Int

Returns:

Int

`block_n`

`block_k`

`warp_m`

`warp_n`

`num_warps_m`

`num_warps_n`

`num_consumer_threads`

`num_producer_threads`

`num_threads`

`swizzle_granularity`

`q_smem_size`

`kv_smem_size`

`k_smem_size`

`v_smem_size`

`p_smem_size`

`warp_scratch_smem_size`

`shared_mem_bytes`

`write_to`