Mojo struct

ScheduleConfig

struct ScheduleConfig

Tunable parameters for schedule generation.

Controls the structural decisions in program builders: scheduling strategy, barrier placement, and drain behavior.

The default configuration auto-derives wait counts from the program structure (Halide-inspired: declare intent, derive consequences). Manual wait overrides are available for testing and experimentation but should not be needed for correct operation.

Fields

  • scheduling (SchedulingStrategy): The strategy for op ordering (IDENTITY, GREEDY, or CSP).
  • sched_barrier_mask (Int): The bitmask of blocks that get trailing schedule_barrier fences. Default: 0b01010101 (decimal 85; blocks 0, 2, 4, 6).
  • auto_waits (Bool): Whether to auto-derive wait counts from schedule order.
  • drain_lgkm_mask (Int): The per-block bitmask for selective LDS drains.
  • auto_drain (Bool): Whether to auto-derive the drain mask from channel analysis.
  • lds_contention_penalty (Int): The CSP solver penalty for LDS port overlap.
  • wait_lgkm_first (Int): The manual wait_lgkm(N) override (used when auto_waits=False).
  • wait_vm_last (Int): The manual wait_vm(N) override (used when auto_waits=False).
  • lgkm_per_load_a (Int): The number of lgkmcnt ops per channel-A load (for wait derivation).
  • lgkm_per_load_b (Int): The number of lgkmcnt ops per channel-B load (for wait derivation).
  • lgkm_after_last (Bool): Whether to insert wait_lgkm(0) after the last block barrier.
  • minimal_barriers (Bool): Whether to suppress per-block s_barriers and set_prio pairs and emit s_barrier only at top-of-half and the first cross-stage block.
  • omit_mma_set_prio (Bool): When minimal_barriers=True, whether to drop the pre-MMA s_setprio[1] entirely so the LLVM register allocator can reuse VGPRs across it.
  • max_vgpr (Int): The hint for the cost model on the kernel's VGPR budget. Default: 999999, which is effectively unlimited.
  • global_before_frag (Bool): Whether to swap the in-block emission order of global loads and fragment loads. The default emits fragment loads first, then prefetches.
  • barrier_before_pre_ops (Bool): Whether to move the per-block pre_sync + barrier section before the frag/prefetch section instead of between prefetch and MMA.
  • inter_block_lgkm_drain (Bool): Whether to populate entry_wait_lgkm on non-top, non-cross-stage blocks with wait_lgkm(0) so an inter-mini LDS drain fires between consecutive same-half MMAs.
  • wrap_waits_with_sched_barrier (Bool): Whether to wrap each contiguous wait/barrier group with schedule_barrier() on both sides to fence the LLVM machine scheduler.
  • partial_prologue_drain (Bool): Whether to skip the framework prologue's wait_vm(0) drains and inter-stage barrier so prefetches stay in flight on entry to the kernel.
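
The mask fields (sched_barrier_mask, drain_lgkm_mask) pack one flag per block into an integer. The sketch below illustrates that arithmetic only, assuming bit i stands for block i, which is consistent with the documented default 0b01010101 selecting blocks 0, 2, 4, 6. It is plain Python rather than Mojo and is not part of the ScheduleConfig API; the helper names blocks_from_mask and mask_from_blocks are hypothetical, and the 8-block width is an assumption taken from the 8-bit default.

```python
def blocks_from_mask(mask: int, num_blocks: int = 8) -> list[int]:
    """Return the block indices whose bit is set in mask (assumes bit i = block i)."""
    return [block for block in range(num_blocks) if (mask >> block) & 1]


def mask_from_blocks(blocks: list[int]) -> int:
    """Build a mask with one bit set per listed block index."""
    mask = 0
    for block in blocks:
        mask |= 1 << block
    return mask


# The default appears as 0b01010101 in the field list and as 85 in the signatures.
default_mask = 0b01010101
print(default_mask)                    # 85
print(blocks_from_mask(default_mask))  # [0, 2, 4, 6]
print(bin(mask_from_blocks([1, 3])))   # 0b1010
```

Under the same assumption, a mask of 0 (the default for drain_lgkm_mask) selects no blocks.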

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, Movable

Methods

__init__

def __init__(out self, *, scheduling: SchedulingStrategy = SchedulingStrategy.IDENTITY, sched_barrier_mask: Int = 85, auto_waits: Bool = True, drain_lgkm_mask: Int = 0, auto_drain: Bool = False, lds_contention_penalty: Int = 0, wait_lgkm_first: Int = 8, wait_vm_last: Int = 6, lgkm_per_load_a: Int = 0, lgkm_per_load_b: Int = 0, lgkm_after_last: Bool = False, minimal_barriers: Bool = False, omit_mma_set_prio: Bool = False, max_vgpr: Int = 999999, global_before_frag: Bool = False, barrier_before_pre_ops: Bool = False, inter_block_lgkm_drain: Bool = False, partial_prologue_drain: Bool = False, wrap_waits_with_sched_barrier: Bool = False)

from_strategies

static def from_strategies(*, scheduling: SchedulingStrategy = SchedulingStrategy.IDENTITY, max_vgpr: Int = 999999, lds_contention_penalty: Int = 0, minimal_barriers: Bool = False, omit_mma_set_prio: Bool = False, sched_barrier_mask: Int = 85, wrap_waits_with_sched_barrier: Bool = False, barrier_before_pre_ops: Bool = False, auto_waits: Bool = True, drain_lgkm_mask: Int = 0, auto_drain: Bool = False, wait_lgkm_first: Int = 8, wait_vm_last: Int = 6, lgkm_after_last: Bool = False, inter_block_lgkm_drain: Bool = False, partial_prologue_drain: Bool = False, global_before_frag: Bool = False, lgkm_per_load_a: Int = 0, lgkm_per_load_b: Int = 0) -> Self

Constructs a ScheduleConfig from grouped strategy values.

This is equivalent to the flat-field constructor, but its parameters are grouped by phase (barrier / wait / load). pipeline.strategies provides named factories (such as BarrierStrategy.minimal_no_set_prio) that callers can spread into this constructor.

Existing flat-field callers continue to work unchanged.

Args:

  • scheduling (SchedulingStrategy): Op-ordering strategy (IDENTITY, GREEDY, or CSP).
  • max_vgpr (Int): VGPR budget hint for the cost model.
  • lds_contention_penalty (Int): CSP solver penalty for LDS port overlap.
  • minimal_barriers (Bool): Suppress per-block s_barriers and set_prio pairs.
  • omit_mma_set_prio (Bool): Drop the pre-MMA s_setprio[1] when minimal_barriers=True.
  • sched_barrier_mask (Int): Bitmask of which blocks get trailing schedule_barrier fences.
  • wrap_waits_with_sched_barrier (Bool): Wrap each contiguous wait/barrier group with schedule_barrier.
  • barrier_before_pre_ops (Bool): Move pre_sync + barrier ahead of the frag/global section.
  • auto_waits (Bool): Auto-derive wait counts from program structure.
  • drain_lgkm_mask (Int): Per-block bitmask for selective LDS drains.
  • auto_drain (Bool): Auto-derive drain_lgkm_mask from channel analysis.
  • wait_lgkm_first (Int): Manual wait_lgkm override (used when auto_waits=False).
  • wait_vm_last (Int): Manual wait_vm override for the last block (used when auto_waits=False).
  • lgkm_after_last (Bool): Insert wait_lgkm(0) after the last block's barrier.
  • inter_block_lgkm_drain (Bool): Emit wait_lgkm(0) at the start of non-top, non-cross-stage blocks.
  • partial_prologue_drain (Bool): Skip wait_vm(0) drains in the framework prologue.
  • global_before_frag (Bool): Emit globals before frags in each block.
  • lgkm_per_load_a (Int): lgkmcnt entries per channel-A frag-load.
  • lgkm_per_load_b (Int): lgkmcnt entries per channel-B frag-load.

Returns:

Self: A fully populated ScheduleConfig.