
Python class

AttentionDispatchResolver

class max.nn.kv_cache.AttentionDispatchResolver(session, device, is_mla, n_kv_heads_per_device, num_q_heads_per_device=None, is_fp8_kv=False)

Bases: object

Resolves packed attention decode metadata via kernel custom ops.

Supports both MHA (mo.mha.decode.get_num_partitions) and MLA (mo.mla.compute_dispatch_args.scalar) decode kernels. The mode is selected automatically from kv_params.is_mla.
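The automatic mode selection can be sketched in plain Python. The two kernel op names come from the text above; the `KVCacheParams` stand-in and the `select_decode_kernel` helper are illustrative assumptions, not the actual MAX implementation:

```python
# Sketch of the MHA/MLA decode-kernel selection described above.
# KVCacheParams here is a hypothetical stand-in for the real kv_params
# object; only its is_mla flag matters for this illustration.
from dataclasses import dataclass


@dataclass
class KVCacheParams:
    """Hypothetical stand-in exposing the is_mla flag."""
    is_mla: bool


def select_decode_kernel(kv_params: KVCacheParams) -> str:
    """Pick the decode dispatch op based on kv_params.is_mla."""
    if kv_params.is_mla:
        # MLA path: dispatch-argument computation kernel.
        return "mo.mla.compute_dispatch_args.scalar"
    # MHA path: partition-count query kernel.
    return "mo.mha.decode.get_num_partitions"
```

In the real resolver this choice happens internally; callers only supply `is_mla` (or a `kv_params` carrying it) and the matching custom op is invoked for them.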

Parameters: