Mojo function

balanced_target_q_per_cta

def balanced_target_q_per_cta(total_q: Int, topk: Int, blk_kv: Int, head_kv: Int, num_sms: Int, bm: Int = Int(128)) -> Int

Computes a load-balanced cap on queries per CTA for the scheduler's q-chunking.

Targets roughly num_sms * 2 work items, so that each CTA processes ceil(q_count / bm) Q-groups against a single resident KV block, rather than launching one CTA per bm queries. The result is rounded up to a multiple of bm (the forward pass loops over queries in groups of bm), floored at bm, and capped at topk * blk_kv * 2 so that a few very large rows cannot starve the SMs.

Returns:

Int
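
The rounding and clamping steps in the description can be sketched as follows. This is a minimal Python illustration, not the library's implementation: the helper names ceil_div, clamp_q_per_cta, and q_groups are hypothetical, and raw_target stands in for the unrounded per-CTA query target, whose exact formula this page does not state. The sketch also assumes the three adjustments are applied in the order the description lists them.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def clamp_q_per_cta(raw_target: int, topk: int, blk_kv: int, bm: int = 128) -> int:
    """Hypothetical helper: apply the documented round-up, floor, and cap.

    raw_target is an assumed input (the unrounded target); how the real
    function derives it from total_q, head_kv, and num_sms is not shown here.
    """
    rounded = ceil_div(raw_target, bm) * bm   # round up to a multiple of bm
    floored = max(rounded, bm)                # never below one bm-query group
    return min(floored, topk * blk_kv * 2)    # cap so huge rows cannot starve SMs


def q_groups(q_count: int, bm: int = 128) -> int:
    """Number of bm-query groups a CTA loops over: ceil(q_count / bm)."""
    return ceil_div(q_count, bm)


if __name__ == "__main__":
    print(clamp_q_per_cta(300, topk=16, blk_kv=64))     # rounds 300 up to 384
    print(clamp_q_per_cta(0, topk=16, blk_kv=64))       # floored at bm = 128
    print(clamp_q_per_cta(100000, topk=16, blk_kv=64))  # capped at 16 * 64 * 2 = 2048
    print(q_groups(384))                                # 3 groups of 128 queries
```

Because the cap is applied last in this sketch, a cap smaller than bm would win over the floor; whether the real function behaves the same way in that case is not documented here.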