Mojo function

balanced_target_q_per_cta

def balanced_target_q_per_cta(total_q: Int, topk: Int, blk_kv: Int, head_kv: Int, num_sms: Int, bm: Int = Int(128)) -> Int

Computes a load-balanced cap on the number of queries per CTA, used by the scheduler's q-chunking.

Targets roughly num_sms * 2 work items, so that each CTA processes ceil(q_count / bm) Q-groups against a single resident KV block, instead of launching one CTA per bm queries. The result is rounded up to a multiple of bm (the forward kernel loops over queries in groups of bm), floored at bm, and capped at topk * blk_kv * 2 so that a few very large rows cannot starve the SMs.

Returns:

Int
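
The rounding, floor, and cap described above can be modeled with a short Python sketch. This is not the Mojo implementation: the function name `balanced_target_q_per_cta_sketch`, the helper `ceil_div`, and in particular the formula for the raw per-CTA target (total work taken as total_q * topk * head_kv, divided across num_sms * 2 work items) are assumptions made for illustration. Only the three post-processing steps (round up to a multiple of bm, floor at bm, cap at topk * blk_kv * 2) come from the description, and the order in which they are applied here is also assumed.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def balanced_target_q_per_cta_sketch(
    total_q: int,
    topk: int,
    blk_kv: int,
    head_kv: int,
    num_sms: int,
    bm: int = 128,
) -> int:
    """Hypothetical model of the queries-per-CTA cap (not the real code)."""
    # From the description: aim for about two work items per SM.
    target_items = max(1, num_sms * 2)

    # ASSUMPTION: total work is counted as (query, selected KV block,
    # KV head) triples. The real function may count work differently.
    total_work = total_q * topk * head_kv
    raw_target = ceil_div(total_work, target_items)

    # From the description: round up to a multiple of bm, because the
    # forward kernel loops over queries in groups of bm.
    rounded = ceil_div(raw_target, bm) * bm

    # From the description: never go below one bm-query group.
    floored = max(rounded, bm)

    # From the description: cap so a few very large rows cannot
    # monopolize CTAs while other SMs sit idle.
    cap = topk * blk_kv * 2
    return min(floored, cap)


if __name__ == "__main__":
    # Mid-sized problem: the rounded target is returned unchanged.
    print(balanced_target_q_per_cta_sketch(8192, 16, 64, 1, 132))  # 512
    # Tiny problem: the floor at bm applies.
    print(balanced_target_q_per_cta_sketch(100, 1, 64, 1, 132))  # 128
    # Huge problem on few SMs: the cap topk * blk_kv * 2 applies.
    print(balanced_target_q_per_cta_sketch(1_000_000, 2, 64, 1, 4))  # 256
```

Under these assumptions, the three calls exercise the three regimes in turn: the plain rounded target, the bm floor, and the topk * blk_kv * 2 cap.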