
Mojo struct

WarpRole1D1D

struct WarpRole1D1D[has_sfb: Bool = False, num_epi_warps: Int = 4]

Warp role for 1D-1D kernels with warp specialization.

Parameterized on has_sfb so that the SFB TMA-load and TMEM-load warps compile out cleanly on the MMA_N >= 64 path (which also shifts the scheduler's warp index), and on num_epi_warps so that kernels with heavier consumer phases can grow the epilogue pool without affecting other kernels.

Default layout (has_sfb=False, num_epi_warps=4; 224 threads with scheduler, MMA_N >= 64):

  • Warps 0-3 (threads 0-127): Epilogue
  • Warp 4 (threads 128-159): TMA Load
  • Warp 5 (threads 160-191): MMA
  • Warp 6 (threads 192-223): Scheduler

Extended layout (has_sfb=True, num_epi_warps=4; 384 threads with scheduler, MMA_N < 64):

  • Warps 0-3 (threads 0-127): Epilogue
  • Warp 4 (threads 128-159): TMA Load (A, B, SFA)
  • Warp 5 (threads 160-191): MMA
  • Warp 6 (threads 192-223): SFB TMA Load
  • Warps 7-10 (threads 224-351): SFB TMEM Load
  • Warp 11 (threads 352-383): Scheduler

The epilogue warps must occupy threads 0..NUM_EPILOGUE_THREADS-1 because TMAStoreCoords uses warp_id == 0 for election.
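The two layouts above follow from stacking the roles in a fixed order, 32 threads per warp. The following is a minimal Python model of that arithmetic, not the Mojo implementation; the helper names `warp_layout` and `describe` are hypothetical and exist only for illustration.

```python
WARP_SIZE = 32  # threads per warp, matching the 32-thread steps in the layouts above


def warp_layout(has_sfb=False, num_epi_warps=4):
    """Return (role, first_warp, last_warp) tuples in layout order."""
    # Roles are stacked in this order; each entry is (name, number of warps).
    roles = [("Epilogue", num_epi_warps), ("TMA Load", 1), ("MMA", 1)]
    if has_sfb:
        # The SFB warps exist only on the has_sfb (MMA_N < 64) path.
        roles += [("SFB TMA Load", 1), ("SFB TMEM Load", 4)]
    roles.append(("Scheduler", 1))

    layout, next_warp = [], 0
    for role, count in roles:
        layout.append((role, next_warp, next_warp + count - 1))
        next_warp += count
    return layout


def describe(layout):
    """Format a layout in the same style as the lists above."""
    lines = []
    for role, first, last in layout:
        lo, hi = first * WARP_SIZE, (last + 1) * WARP_SIZE - 1
        warps = f"Warp {first}" if first == last else f"Warps {first}-{last}"
        lines.append(f"{warps} (threads {lo}-{hi}): {role}")
    return lines


for line in describe(warp_layout(has_sfb=True)):
    print(line)
```

Calling `warp_layout(num_epi_warps=8)` in this model shows how growing the epilogue pool moves every later role up by four warps while the epilogue still starts at warp 0.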

Implemented traits​

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

comptime members​

EPILOGUE_WARP_START​

comptime EPILOGUE_WARP_START = 0

LOAD_WARP_START​

comptime LOAD_WARP_START = WarpRole1D1D[has_sfb, num_epi_warps].NUM_EPILOGUE_THREADS

MMA_WARP_START​

comptime MMA_WARP_START = (WarpRole1D1D[has_sfb, num_epi_warps].LOAD_WARP_START + 32)

NUM_EPILOGUE_THREADS​

comptime NUM_EPILOGUE_THREADS = (num_epi_warps * 32)

NUM_LOAD_THREADS​

comptime NUM_LOAD_THREADS = 32

NUM_MMA_THREADS​

comptime NUM_MMA_THREADS = 32

NUM_SCHEDULER_THREADS​

comptime NUM_SCHEDULER_THREADS = 32

NUM_SFB_LOAD_THREADS​

comptime NUM_SFB_LOAD_THREADS = 128

NUM_SFB_TMA_LOAD_THREADS​

comptime NUM_SFB_TMA_LOAD_THREADS = 32

SCHEDULER_WARP_START​

comptime SCHEDULER_WARP_START = (WarpRole1D1D[has_sfb, num_epi_warps].SFB_LOAD_WARP_START + 128) if has_sfb else WarpRole1D1D[has_sfb, num_epi_warps].SFB_TMA_LOAD_WARP_START

SFB_LOAD_WARP_START​

comptime SFB_LOAD_WARP_START = (WarpRole1D1D[has_sfb, num_epi_warps].SFB_TMA_LOAD_WARP_START + 32)

SFB_TMA_LOAD_WARP_START​

comptime SFB_TMA_LOAD_WARP_START = (WarpRole1D1D[has_sfb, num_epi_warps].MMA_WARP_START + 32)

TOTAL_THREADS​

comptime TOTAL_THREADS = ((WarpRole1D1D[has_sfb, num_epi_warps].NUM_EPILOGUE_THREADS + 32) + 32)

TOTAL_THREADS_WITH_SCHED​

comptime TOTAL_THREADS_WITH_SCHED = (WarpRole1D1D[has_sfb, num_epi_warps].SCHEDULER_WARP_START + 32)

TOTAL_THREADS_WITH_SFB​

comptime TOTAL_THREADS_WITH_SFB = ((WarpRole1D1D[has_sfb, num_epi_warps].TOTAL_THREADS + 32) + 128)
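
As a cross-check of the definitions above, the following Python sketch evaluates the same formulas for a given parameterization. It is an illustrative model only; the function name `comptime_members` is hypothetical, and the real values are computed at compile time by Mojo.

```python
def comptime_members(has_sfb=False, num_epi_warps=4):
    """Evaluate the documented comptime formulas as plain integers."""
    c = {}
    c["EPILOGUE_WARP_START"] = 0
    c["NUM_EPILOGUE_THREADS"] = num_epi_warps * 32
    c["LOAD_WARP_START"] = c["NUM_EPILOGUE_THREADS"]
    c["MMA_WARP_START"] = c["LOAD_WARP_START"] + 32
    c["SFB_TMA_LOAD_WARP_START"] = c["MMA_WARP_START"] + 32
    c["SFB_LOAD_WARP_START"] = c["SFB_TMA_LOAD_WARP_START"] + 32
    # Without SFB warps, the scheduler takes the slot right after the MMA warp.
    c["SCHEDULER_WARP_START"] = (
        c["SFB_LOAD_WARP_START"] + 128
        if has_sfb
        else c["SFB_TMA_LOAD_WARP_START"]
    )
    c["TOTAL_THREADS"] = c["NUM_EPILOGUE_THREADS"] + 32 + 32
    c["TOTAL_THREADS_WITH_SCHED"] = c["SCHEDULER_WARP_START"] + 32
    c["TOTAL_THREADS_WITH_SFB"] = c["TOTAL_THREADS"] + 32 + 128
    return c


default = comptime_members()
extended = comptime_members(has_sfb=True)
print(default["TOTAL_THREADS_WITH_SCHED"], extended["TOTAL_THREADS_WITH_SCHED"])
```

Two things fall out of the formulas: the `*_WARP_START` members are thread offsets (for example, LOAD_WARP_START is 128 by default, which is warp 4), and TOTAL_THREADS_WITH_SFB (352 by default) does not include the scheduler warp, whereas TOTAL_THREADS_WITH_SCHED does.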

Methods​

is_epilogue​

static is_epilogue() -> Bool

Returns True if the current thread is in an epilogue warp (warps 0-3 with the default num_epi_warps=4).

Returns:

Bool

is_load​

static is_load() -> Bool

Returns True if the current thread is in the TMA load warp (warp 4 with the default num_epi_warps=4).

Returns:

Bool

is_mma​

static is_mma() -> Bool

Returns True if the current thread is in the MMA warp (warp 5 with the default num_epi_warps=4).

Returns:

Bool

is_sfb_tma_load​

static is_sfb_tma_load() -> Bool

Returns True if the current thread is in the SFB TMA load warp (warp 6 with the default num_epi_warps=4).

Only meaningful when has_sfb is True (i.e. MMA_N < 64). Callers gate this check behind @parameter if Self.MMA_N < 64 so that it is unreachable on the no-SFB path, where the same threads host the scheduler warp.

Returns:

Bool

is_sfb_load​

static is_sfb_load() -> Bool

Returns True if the current thread is in an SFB TMEM load warp (warps 7-10 with the default num_epi_warps=4).

Only meaningful when has_sfb is True (i.e. MMA_N < 64); callers gate the check with @parameter if Self.MMA_N < 64.

Returns:

Bool

is_scheduler​

static is_scheduler() -> Bool

Returns True if the current thread is in the scheduler warp.

With the default num_epi_warps=4, the scheduler is warp 6 when has_sfb = False and warp 11 when has_sfb = True. The scheduler warp precomputes tile info into SMEM for consumer warps.

Returns:

Bool
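
To illustrate how the role predicates partition a thread block, here is a hedged Python model that classifies a thread index by range checks against the offsets defined under comptime members. It assumes the predicates are simple range checks on the thread index, which this page does not confirm. The function `role_of` and its explicit `thread_idx` argument are hypothetical; the real methods take no arguments.

```python
def role_of(thread_idx, has_sfb=False, num_epi_warps=4):
    """Return the role name for a thread index, or None if it is out of range."""
    # Thread offsets, following the documented comptime formulas.
    epi_threads = num_epi_warps * 32
    load_start = epi_threads
    mma_start = load_start + 32
    sfb_tma_start = mma_start + 32
    sfb_load_start = sfb_tma_start + 32
    sched_start = sfb_load_start + 128 if has_sfb else sfb_tma_start

    def in_range(start, size):
        return start <= thread_idx < start + size

    if in_range(0, epi_threads):
        return "epilogue"
    if in_range(load_start, 32):
        return "load"
    if in_range(mma_start, 32):
        return "mma"
    if has_sfb:
        # Stands in for the `@parameter if Self.MMA_N < 64` gate: without it,
        # the SFB TMA range would overlap the scheduler's threads.
        if in_range(sfb_tma_start, 32):
            return "sfb_tma_load"
        if in_range(sfb_load_start, 128):
            return "sfb_load"
    if in_range(sched_start, 32):
        return "scheduler"
    return None


print(role_of(200), role_of(200, has_sfb=True))
```

In this model, thread 200 is a scheduler thread in the default layout but an SFB TMA load thread in the extended layout, which is why the SFB checks are only meaningful behind the has_sfb gate.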