Python class
AttentionDispatchResolver
class max.nn.kv_cache.AttentionDispatchResolver(session, device, is_mla, n_kv_heads_per_device, num_q_heads_per_device=None, is_fp8_kv=False)
Bases: object
Resolves packed attention decode metadata via kernel custom ops.
Supports both the MHA (mo.mha.decode.get_num_partitions) and MLA (mo.mla.compute_dispatch_args.scalar) decode kernels. The mode is selected automatically from kv_params.is_mla.
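The automatic kernel selection can be illustrated with a small sketch. This is not the real max.nn.kv_cache implementation; only the two kernel names come from the docstring above, and the ToyDispatchResolver class, its constructor, and the decode_kernel property are hypothetical stand-ins for the flag-based dispatch the docstring describes.

```python
# Kernel op names taken from the docstring above.
MHA_DECODE_KERNEL = "mo.mha.decode.get_num_partitions"
MLA_DECODE_KERNEL = "mo.mla.compute_dispatch_args.scalar"


class ToyDispatchResolver:
    """Hypothetical sketch: picks the decode kernel from an MLA flag,
    mirroring how the real resolver keys off kv_params.is_mla."""

    def __init__(self, is_mla: bool):
        self.is_mla = is_mla

    @property
    def decode_kernel(self) -> str:
        # Mode is selected automatically: MLA caches use the MLA
        # dispatch op, everything else falls back to the MHA op.
        return MLA_DECODE_KERNEL if self.is_mla else MHA_DECODE_KERNEL


# Usage: the caller never names a kernel explicitly.
resolver = ToyDispatchResolver(is_mla=False)
print(resolver.decode_kernel)
```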