Mojo struct
TileScheduler
struct TileScheduler[num_stages: Int, cluster_shape: IndexList[3, element_type=DType.uint32] = Index[Int, Int, Int, dtype=DType.uint32](1, 1, 1), rasterize_order: RasterOrder = RasterOrder.AlongM, block_swizzle_size: Int = 8]
Fields
- `cluster_dim` (`StaticTuple[Int32, 3]`)
- `log_cluster_dim_m` (`FastDiv[DType.uint32]`)
- `log_cluster_dim_n` (`FastDiv[DType.uint32]`)
- `log_cluster_dim_k` (`FastDiv[DType.uint32]`)
- `clc_response` (`UnsafePointer[UInt128, MutAnyOrigin, address_space=AddressSpace.SHARED]`)
- `full_mbar` (`UnsafePointer[SharedMemBarrier, MutAnyOrigin, address_space=AddressSpace.SHARED]`)
- `empty_mbar` (`UnsafePointer[SharedMemBarrier, MutAnyOrigin, address_space=AddressSpace.SHARED]`)
- `throttle_pipeline` (`TileScheduler[num_stages, cluster_shape, rasterize_order, block_swizzle_size].ThrottlePipeline`)
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
comptime members
ClcBarrierArray
comptime ClcBarrierArray = SMemArray[SharedMemBarrier, num_stages]
ClcResponseArray
comptime ClcResponseArray = SMemArray[UInt128, num_stages]
cluster_size
comptime cluster_size = ((cluster_shape[0] * cluster_shape[1]) * cluster_shape[2])
log_cluster_k
comptime log_cluster_k = FastDiv(cluster_shape[2])
log_cluster_m
comptime log_cluster_m = FastDiv(cluster_shape[0])
log_cluster_n
comptime log_cluster_n = FastDiv(cluster_shape[1])
ThrottleBarrierArray
comptime ThrottleBarrierArray = SMemArray[SharedMemBarrier, (num_stages * 2)]
ThrottlePipeline
comptime ThrottlePipeline = ProducerConsumerPipeline[num_stages]
Methods
__init__
__init__(cluster_dim: StaticTuple[Int32, 3], clc_response: SMemArray[UInt128, num_stages], clc_full: SMemArray[SharedMemBarrier, num_stages], clc_empty: SMemArray[SharedMemBarrier, num_stages], clc_throttle: SMemArray[SharedMemBarrier, (num_stages * 2)]) -> Self
Initialize from typed barrier arrays.
init_throttle_barriers
static init_throttle_barriers(storage_ptr: UnsafePointer[SharedMemBarrier, MutAnyOrigin, address_space=AddressSpace.SHARED], producer_arv_count: Int32, consumer_arv_count: Int32)
Initialize throttle pipeline barriers. Called once by elect_one thread.
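A minimal sketch of how this might be wired up in a kernel prologue; `elect_one_sync`, `barrier`, the `throttle_storage` pointer, and the arrival counts are assumptions for illustration, not taken from this page:

```mojo
# Hypothetical prologue: one elected thread initializes the throttle
# barriers exactly once before any warp uses the scheduler.
if elect_one_sync():
    TileScheduler[num_stages=2].init_throttle_barriers(
        throttle_storage,      # assumed: pointer to the ThrottleBarrierArray storage
        producer_arv_count=1,  # assumed: one producing thread arrives per phase
        consumer_arv_count=1,  # assumed: one consuming thread arrives per phase
    )
barrier()  # assumed block-wide sync so initialized barriers are visible before use
```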
work_info_from_clc_response
static work_info_from_clc_response(result: UnsafePointer[UInt128, MutAnyOrigin, address_space=AddressSpace.SHARED]) -> WorkInfo
Returns:
The `WorkInfo` decoded from the given CLC response buffer.
work_info_from_cluster
static work_info_from_cluster(work_info: WorkInfo, cluster_dim: StaticTuple[Int32, 3], log_cluster_dim_m: FastDiv[DType.uint32], log_cluster_dim_n: FastDiv[DType.uint32]) -> WorkInfo
Returns:
The per-CTA `WorkInfo` derived from the cluster-level work item and the cluster dimensions.
initial_work_info
fetch_next_work
fetch_next_work(self, work_info: WorkInfo, consumer_state: PipelineState[num_stages]) -> WorkInfo
Returns:
The next `WorkInfo`, fetched through the consumer pipeline state.
throttle_signal
throttle_signal(mut self, is_first_cta_in_cluster: Bool)
Signal the CLC throttle if this is the first CTA in the cluster.
The Load warp acts as producer for CLC throttle, signaling that it has started processing a new work item. This prevents the scheduler from getting too far ahead.
Args:
- is_first_cta_in_cluster (
Bool): Only first CTA signals to avoid duplicates.
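As a rough illustration (the loop structure and the `load_tiles` helper are assumptions, not from this page), a Load warp might pair each new work item with a throttle signal:

```mojo
# Hypothetical Load-warp loop: arriving on the throttle barrier once per
# work item keeps the scheduler from running arbitrarily far ahead.
var work_iter = scheduler.work_iterator()
for current in work_iter:
    # Only the first CTA in the cluster arrives on the barrier; the
    # other CTAs pass False and the call is a no-op for them.
    scheduler.throttle_signal(ctx.is_first_cta_in_cluster)
    load_tiles(current)  # assumed helper that loads this tile's data
```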
wait_and_advance_work
wait_and_advance_work[work_origin: MutOrigin, //](self, ref [work_origin] work_info: WorkInfo, mut consumer_state: PipelineState[num_stages]) -> WaitAndAdvanceContext[work_origin]
Wait for next work from CLC and advance.
Encapsulates the CLC barrier wait (called on scheduler directly).
Usage:

```mojo
with scheduler.wait_and_advance_work(work_info, state) as current:
    do_mma(current)
# After the block exits, work_info has been updated to the next value.
```
Returns:
WaitAndAdvanceContext
work_iterator
work_iterator(self) -> WorkIterator[num_stages, cluster_shape, rasterize_order, block_swizzle_size]
Create a per-warp work iterator using next-style iteration.
Each warp should create its own work iterator. The iterator owns work_info and pipeline state internally.
Usage:

```mojo
var work_iter = scheduler.work_iterator()
for current in work_iter:
    scheduler.throttle_signal(ctx.is_first_cta_in_cluster)
    do_work(current)
```
Returns:
WorkIterator
scheduler_iterator
scheduler_iterator(self) -> SchedulerWorkIterator[num_stages, cluster_shape, rasterize_order, block_swizzle_size]
Create iterator for Scheduler warp (owns work_info and both pipeline states).
The Scheduler warp uniquely needs to both consume work responses and produce new work requests. This iterator owns everything internally.
Usage:

```mojo
var sched_iter = scheduler.scheduler_iterator()
for _ in sched_iter:
    sched_iter.signal_and_advance()
sched_iter.drain()
```
Returns:
SchedulerWorkIterator
advance_to_next_work
advance_to_next_work(self, mut clc_state: PipelineState[num_stages]) -> PipelineState[num_stages]
Returns:
The `PipelineState` advanced to the next CLC stage.