Mojo struct

DeclarativeSchedule

struct DeclarativeSchedule[is_fp8: Bool, lgkm_a: Int, lgkm_b: Int]

Constraint-based pipeline: algorithm declares ops, target supplies costs.

The algorithm specifies WHAT ops exist — just the tag and buffer metadata (stage, subtile, channel, k_offset). No resource kinds, no latencies, no roles.

The TargetProfile specifies HOW the hardware executes them — per-op costs (resource, latency, role) via the cost model, and pipeline structure (depth, MMA grid, buffer strategy) via the pipeline config. One factory call (e.g., mi355x_target()) provides everything.
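The split between the two sides can be pictured with a small Python model. This is only a sketch of the idea: `LogicalOp`, `Cost`, `AnnotatedOp`, `annotate`, the tags, and every number below are hypothetical stand-ins, not the library's `OpDesc`, `TargetCostModel`, or real MI355X costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalOp:
    """What the algorithm declares: a tag plus buffer metadata, nothing else."""
    tag: str
    stage: int
    subtile: int
    channel: int
    k_offset: int


@dataclass(frozen=True)
class Cost:
    """What the target supplies for a tag (illustrative values only)."""
    resource: str
    latency: int
    role: str


@dataclass(frozen=True)
class AnnotatedOp:
    op: LogicalOp
    cost: Cost


def annotate(ops, cost_model):
    """Attach the target's cost to each logical op by looking up its tag."""
    return [AnnotatedOp(op, cost_model[op.tag]) for op in ops]


# Algorithm side: tags and buffer metadata only.
logical = [
    LogicalOp("load_a", stage=0, subtile=0, channel=0, k_offset=0),
    LogicalOp("mma", stage=0, subtile=0, channel=0, k_offset=0),
]

# Target side: one cost table per hardware target (made-up numbers).
toy_target = {
    "load_a": Cost(resource="lds", latency=64, role="fragment_load"),
    "mma": Cost(resource="mma_unit", latency=16, role="compute"),
}

annotated = annotate(logical, toy_target)
```

Swapping `toy_target` for another table changes every cost without touching `logical`, which is the point of keeping the two descriptions separate.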

The framework then:

1. Annotates logical ops with the cost model (annotate_ops)
2. Reorders into MMA-block-interleaved execution order (double_buffer_reorder)
3. Derives optimal scheduling via CSP backtracking (optimal_schedule_with_halves)
4. Derives wait counts from schedule order (derive_wait_counts)
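The last step, deriving wait counts from schedule order, can be illustrated with a toy model in Python. It assumes a counter-style wait in which memory ops complete in issue order and a wait of N blocks until at most N ops are still outstanding. The function below is a hypothetical sketch that shares only its name with the library's `derive_wait_counts`; the real signature and behavior are not shown in this page.

```python
def derive_wait_counts(schedule):
    """Compute a wait count for every consumer in a linear schedule.

    schedule: list of ("load", name) or ("use", name) tuples in execution order.
    Returns {position_of_use: N}, where N is the number of loads issued after
    the load being consumed. Under in-order completion, waiting until at most
    N loads are outstanding guarantees the consumed load has finished.
    """
    issue_index = {}  # load name -> number of loads issued before it
    issued = 0        # total loads issued so far
    waits = {}
    for pos, (kind, name) in enumerate(schedule):
        if kind == "load":
            issue_index[name] = issued
            issued += 1
        else:  # "use"
            waits[pos] = issued - 1 - issue_index[name]
    return waits


schedule = [
    ("load", "a0"),
    ("load", "b0"),
    ("load", "a1"),
    ("use", "a0"),   # b0 and a1 may still be in flight
    ("load", "b1"),
    ("use", "b0"),   # a1 and b1 may still be in flight
    ("use", "b1"),   # newest load, nothing may remain outstanding
]

waits = derive_wait_counts(schedule)
```

Moving a consumer later in the schedule, or issuing more loads before it, raises its count, so the wait values follow from the order alone and do not need to be hand-tuned.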

Usage: comptime schedule = build_schedule[is_fp8, lgkm_a, lgkm_b]()

Implemented traits

AnyType, ImplicitlyDeletable, PipelineSchedule

Methods

`__init__`

def __init__(out self, config: ScheduleConfig = ScheduleConfig(scheduling=SchedulingStrategy.CSP, sched_barrier_mask=Int(85), auto_waits=True, drain_lgkm_mask=Int(0), auto_drain=False, lds_contention_penalty=Int(0), wait_lgkm_first=Int(8), wait_vm_last=Int(6), lgkm_per_load_a=Int(0), lgkm_per_load_b=Int(0), lgkm_after_last=False, minimal_barriers=False, omit_mma_set_prio=False, max_vgpr=Int(999999), global_before_frag=False, barrier_before_pre_ops=False, inter_block_lgkm_drain=False, partial_prologue_drain=False, wrap_waits_with_sched_barrier=False), target: TargetProfile = mi355x_target(Int(4), Int(4), Int(1)))

def __init__(out self, config: ScheduleConfig, hw_config: PipelineConfig, cost_model: TargetCostModel)

`config`

def config(self) -> PipelineConfig

Returns:

PipelineConfig

`declare_ops`

def declare_ops(self) -> List[OpDesc]

Algorithm description: WHAT ops exist, with buffer metadata only.

Returns 24 logical ops (2 halves x 12 ops) from the ping-pong op table. No resource kinds, no latencies, no roles — those come from the TargetProfile's cost model. See _logical_half() for the table.

Returns:

List[OpDesc]
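As a rough picture of the shape of this result, the Python sketch below builds 2 halves of 12 ops each. The op tags, the field values, and the choice to store the half index in `stage` are all invented placeholders; the actual table lives in `_logical_half()` and is not reproduced here.

```python
from dataclasses import dataclass

OPS_PER_HALF = 12  # from the description above: 2 halves x 12 ops


@dataclass(frozen=True)
class LogicalOp:
    """Hypothetical stand-in for OpDesc carrying buffer metadata only."""
    tag: str
    stage: int
    subtile: int
    channel: int
    k_offset: int


def logical_half(half):
    """Placeholder table for one half; tags and field values are invented."""
    return [
        LogicalOp(tag=f"op_{i}", stage=half, subtile=i % 2, channel=0, k_offset=0)
        for i in range(OPS_PER_HALF)
    ]


def declare_ops():
    """Concatenate both halves into one flat list of logical ops."""
    return logical_half(0) + logical_half(1)


ops = declare_ops()
```

Nothing in `LogicalOp` names a resource, latency, or role, mirroring the statement that those come from the target's cost model.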

`build_body`

def build_body(self) -> List[OpDesc]

Apply cost model to logical ops, then reorder for execution.

1. declare_ops() → logical ops (no hardware costs)
2. annotate_ops() → ops with resource/latency/role from the cost model
3. double_buffer_reorder() → MMA-block-interleaved execution order

Returns:

List[OpDesc]
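To give a feel for what an MMA-block-interleaved order looks like, here is a toy Python reorder that spreads non-MMA ops evenly behind each MMA op. It is an assumption-laden illustration of interleaving in general: `interleave_by_mma_block` is a hypothetical helper, and the real `double_buffer_reorder` may group, order, or constrain ops differently.

```python
def interleave_by_mma_block(ops, is_mma):
    """Emit each MMA op followed by an even share of the non-MMA ops.

    Relative order within the MMA ops and within the other ops is preserved.
    Any remainder is handed to the earliest blocks, one extra op each.
    """
    mmas = [op for op in ops if is_mma(op)]
    others = [op for op in ops if not is_mma(op)]
    if not mmas:
        return list(others)

    per_block, extra = divmod(len(others), len(mmas))
    out = []
    cursor = 0
    for i, mma in enumerate(mmas):
        take = per_block + (1 if i < extra else 0)
        out.append(mma)
        out.extend(others[cursor:cursor + take])
        cursor += take
    return out


body = ["load0", "load1", "load2", "load3", "mma0", "mma1"]
reordered = interleave_by_mma_block(body, lambda op: op.startswith("mma"))
```

In a double-buffered loop the memory ops placed behind an MMA would typically fetch data for a later step, which is why spreading them across MMA blocks can hide their latency. This sketch does not model those dependencies.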

`schedule_config`

def schedule_config(self) -> ScheduleConfig

Returns the tuning knobs, with the lgkm counts taken from the struct's type parameters (lgkm_a, lgkm_b).

Returns:

ScheduleConfig
