
Mojo function

derive_safe_max_globals

derive_safe_max_globals(num_k_mmas: Int) -> Int

Derive safe max_globals for uniform global load distribution.

Returns 1 if uniform distribution is safe under warp stagger, 0 otherwise.

The safety condition depends on the number of K-dimension MMA tiles (num_k_mmas). With warp stagger, WG0 runs one MMA phase ahead of WG1. When globals are uniformly distributed, a prefetch buffer_load_*_lds issued in block b writes to LDS stage h asynchronously. If the fragment loads in block b+1 read from that same stage, the asynchronous LDS write must complete before the ds_read executes, so the MMA compute between the two must supply enough cycles to cover the write latency.

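With num_k_mmas >= 2, each MMA block contains at least two MMAs (~32 cycles or more on MI355X), which is enough to cover the latency of an asynchronous LDS write (~20 cycles), so the function returns 1. With num_k_mmas == 1, the single MMA (~16 cycles) does not cover that latency, so the function returns 0.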
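
The latency argument above can be checked with a small model. The following is a minimal Python sketch, not the Mojo implementation: the name derive_safe_max_globals_model and both constants are hypothetical, and the cycle counts are only the approximate MI355X figures quoted above.

```python
# Approximate MI355X figures taken from the description above (assumptions,
# not measured values).
CYCLES_PER_MMA = 16           # ~16 cycles for one MMA
ASYNC_LDS_WRITE_CYCLES = 20   # ~20 cycles for an async LDS write to land


def derive_safe_max_globals_model(num_k_mmas: int) -> int:
    """Hypothetical model: return 1 if the MMA compute in one block is long
    enough to hide the async LDS write latency, 0 otherwise."""
    mma_cycles = num_k_mmas * CYCLES_PER_MMA
    return 1 if mma_cycles >= ASYNC_LDS_WRITE_CYCLES else 0


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"num_k_mmas={n} -> {derive_safe_max_globals_model(n)}")
```

With these figures the model reduces to the num_k_mmas >= 2 rule described above: one MMA gives about 16 cycles (result 0), and two or more give at least about 32 cycles (result 1). Whether the real function compares cycle counts or tests num_k_mmas directly is not stated here.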

Returns:

Int: 1 if uniform global load distribution is safe under warp stagger, 0 otherwise.