Mojo struct
WarpRole1D1D
struct WarpRole1D1D[has_sfb: Bool = False, num_epi_warps: Int = 4]
Warp role for 1D-1D kernels with warp specialization.
Parameterized on has_sfb so that the SFB TMA-load and TMEM-load warps
compile out cleanly on the MMA_N >= 64 path (the scheduler's warp index
shifts down accordingly), and on num_epi_warps so that kernels with heavier
consumer phases can grow the epilogue warp pool without affecting other kernels.
Default layout (has_sfb=False, num_epi_warps=4 → 224 threads with
scheduler, MMA_N >= 64):
- Warps 0-3 (threads 0-127): Epilogue
- Warp 4 (threads 128-159): TMA Load
- Warp 5 (threads 160-191): MMA
- Warp 6 (threads 192-223): Scheduler
Extended layout (has_sfb=True, num_epi_warps=4 → 384 threads with
scheduler, MMA_N < 64):
- Warps 0-3 (threads 0-127): Epilogue
- Warp 4 (threads 128-159): TMA Load (A, B, SFA)
- Warp 5 (threads 160-191): MMA
- Warp 6 (threads 192-223): SFB TMA Load
- Warps 7-10 (threads 224-351): SFB TMEM Load
- Warp 11 (threads 352-383): Scheduler
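Both layouts follow from stacking the roles in a fixed order. As a rough sketch, the plain-Python model below rebuilds the warp table for any has_sfb / num_epi_warps combination. It is not Mojo and not part of the library: warp_layout is a hypothetical helper, and the role order and warp counts are assumptions read off the two lists above.

```python
WARP_SIZE = 32  # threads per warp, as implied by the thread ranges above


def warp_layout(has_sfb=False, num_epi_warps=4):
    """Return [(role, first_warp, last_warp, first_thread, last_thread), ...].

    Hypothetical helper: stacks the documented roles in order.
    """
    # Role order and warp counts are taken from the documented layouts.
    roles = [("Epilogue", num_epi_warps), ("TMA Load", 1), ("MMA", 1)]
    if has_sfb:
        roles += [("SFB TMA Load", 1), ("SFB TMEM Load", 4)]
    roles.append(("Scheduler", 1))

    table = []
    warp = 0
    for name, count in roles:
        first, last = warp, warp + count - 1
        table.append(
            (name, first, last, first * WARP_SIZE, (last + 1) * WARP_SIZE - 1)
        )
        warp += count
    return table


default_layout = warp_layout()
extended_layout = warp_layout(has_sfb=True)
wide_layout = warp_layout(num_epi_warps=8)
```

In this model, num_epi_warps=8 shifts every later role up by four warps, so the warp numbers quoted on this page (for example warp 6 or warp 11 for the scheduler) hold only for num_epi_warps=4.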
The epilogue warps must occupy threads 0..NUM_EPILOGUE_THREADS-1
because TMAStoreCoords uses warp_id == 0 for election.
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
comptime members
EPILOGUE_WARP_START
comptime EPILOGUE_WARP_START = 0
LOAD_WARP_START
comptime LOAD_WARP_START = WarpRole1D1D[has_sfb, num_epi_warps].NUM_EPILOGUE_THREADS
MMA_WARP_START
comptime MMA_WARP_START = (WarpRole1D1D[has_sfb, num_epi_warps].LOAD_WARP_START + 32)
NUM_EPILOGUE_THREADS
comptime NUM_EPILOGUE_THREADS = (num_epi_warps * 32)
NUM_LOAD_THREADS
comptime NUM_LOAD_THREADS = 32
NUM_MMA_THREADS
comptime NUM_MMA_THREADS = 32
NUM_SCHEDULER_THREADS
comptime NUM_SCHEDULER_THREADS = 32
NUM_SFB_LOAD_THREADS
comptime NUM_SFB_LOAD_THREADS = 128
NUM_SFB_TMA_LOAD_THREADS
comptime NUM_SFB_TMA_LOAD_THREADS = 32
SCHEDULER_WARP_START
comptime SCHEDULER_WARP_START = (WarpRole1D1D[has_sfb, num_epi_warps].SFB_LOAD_WARP_START + 128) if has_sfb else WarpRole1D1D[has_sfb, num_epi_warps].SFB_TMA_LOAD_WARP_START
SFB_LOAD_WARP_START
comptime SFB_LOAD_WARP_START = (WarpRole1D1D[has_sfb, num_epi_warps].SFB_TMA_LOAD_WARP_START + 32)
SFB_TMA_LOAD_WARP_START
comptime SFB_TMA_LOAD_WARP_START = (WarpRole1D1D[has_sfb, num_epi_warps].MMA_WARP_START + 32)
TOTAL_THREADS
comptime TOTAL_THREADS = ((WarpRole1D1D[has_sfb, num_epi_warps].NUM_EPILOGUE_THREADS + 32) + 32)
TOTAL_THREADS_WITH_SCHED
comptime TOTAL_THREADS_WITH_SCHED = (WarpRole1D1D[has_sfb, num_epi_warps].SCHEDULER_WARP_START + 32)
TOTAL_THREADS_WITH_SFB
comptime TOTAL_THREADS_WITH_SFB = ((WarpRole1D1D[has_sfb, num_epi_warps].TOTAL_THREADS + 32) + 128)
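Despite the _WARP_START suffix, the values above are thread offsets (multiples of 32), not warp indices. The following plain-Python mirror of the expressions is a sketch for checking the numbers, not library code. The offsets function and its dictionary are hypothetical; only the formulas come from the members listed above.

```python
def offsets(has_sfb=False, num_epi_warps=4):
    """Evaluate the documented comptime expressions for one configuration."""
    c = {}
    c["EPILOGUE_WARP_START"] = 0
    c["NUM_EPILOGUE_THREADS"] = num_epi_warps * 32
    c["LOAD_WARP_START"] = c["NUM_EPILOGUE_THREADS"]
    c["MMA_WARP_START"] = c["LOAD_WARP_START"] + 32
    c["SFB_TMA_LOAD_WARP_START"] = c["MMA_WARP_START"] + 32
    c["SFB_LOAD_WARP_START"] = c["SFB_TMA_LOAD_WARP_START"] + 32
    # Without SFB, the scheduler takes the slot the SFB TMA load warp would use.
    c["SCHEDULER_WARP_START"] = (
        c["SFB_LOAD_WARP_START"] + 128 if has_sfb else c["SFB_TMA_LOAD_WARP_START"]
    )
    c["TOTAL_THREADS"] = c["NUM_EPILOGUE_THREADS"] + 32 + 32
    c["TOTAL_THREADS_WITH_SFB"] = c["TOTAL_THREADS"] + 32 + 128
    c["TOTAL_THREADS_WITH_SCHED"] = c["SCHEDULER_WARP_START"] + 32
    return c


base = offsets()
sfb = offsets(has_sfb=True)
```

For the defaults this evaluates to TOTAL_THREADS = 192, TOTAL_THREADS_WITH_SFB = 352 and TOTAL_THREADS_WITH_SCHED = 224; with has_sfb=True, TOTAL_THREADS_WITH_SCHED becomes 384. As written, TOTAL_THREADS and TOTAL_THREADS_WITH_SFB do not depend on has_sfb and do not count the scheduler warp.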
Methods
is_epilogue
static is_epilogue() -> Bool
Returns True if the current thread is in an epilogue warp (warps 0-3 with the default num_epi_warps=4).
Returns: Bool
is_load
static is_load() -> Bool
Returns True if the current thread is in the TMA load warp (warp 4 with the default num_epi_warps=4).
Returns: Bool
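is_mma
static is_mma() -> Bool
Returns True if the current thread is in the MMA warp (warp 5 with the default num_epi_warps=4).
Returns: Bool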
is_sfb_tma_load
static is_sfb_tma_load() -> Bool
Returns True if the current thread is in the SFB TMA load warp (warp 6 with num_epi_warps=4).
Only meaningful when has_sfb is True (i.e. MMA_N < 64). Callers gate this
behind @parameter if Self.MMA_N < 64 so the check is unreachable on
the no-SFB path, where the same threads host the scheduler warp.
Returns: Bool
is_sfb_load
static is_sfb_load() -> Bool
Returns True if the current thread is in an SFB TMEM load warp (warps 7-10 with num_epi_warps=4).
Only meaningful when has_sfb is True (i.e. MMA_N < 64); callers gate the
check with @parameter if Self.MMA_N < 64.
Returns: Bool
is_scheduler
static is_scheduler() -> Bool
Returns True if the current thread is in the scheduler warp.
With num_epi_warps=4, the scheduler is warp 6 when has_sfb=False and
warp 11 when has_sfb=True. The scheduler warp precomputes tile info into
SMEM for consumer warps.
Returns: Bool
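This page does not show how the predicates are implemented. One plausible reading, assumed here only for illustration, is that each one is a range check of the thread index against the documented start offsets. The plain-Python model below classifies a thread index under that assumption. It is not the library's API: role_of is a hypothetical helper, and the role names are invented labels.

```python
def role_of(tid, has_sfb=False, num_epi_warps=4):
    """Map a thread index to a role name, assuming plain range checks."""
    load_start = num_epi_warps * 32
    mma_start = load_start + 32
    sfb_tma_start = mma_start + 32
    sfb_load_start = sfb_tma_start + 32
    # Mirrors SCHEDULER_WARP_START: the scheduler reuses the SFB TMA slot
    # when has_sfb is False.
    sched_start = sfb_load_start + 128 if has_sfb else sfb_tma_start

    if tid < 0 or tid >= sched_start + 32:
        raise ValueError("thread index outside the block")
    if tid < load_start:
        return "epilogue"
    if tid < mma_start:
        return "load"
    if tid < sfb_tma_start:
        return "mma"
    if tid >= sched_start:
        return "scheduler"
    # Only reachable when has_sfb is True.
    return "sfb_tma_load" if tid < sfb_load_start else "sfb_load"
```

Under this model, thread 200 is a scheduler thread when has_sfb=False but an SFB TMA load thread when has_sfb=True. That overlap is the reason the SFB predicates are gated at compile time rather than checked unconditionally.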