
Mojo struct

OutputPipelineConfig

struct OutputPipelineConfig

Configuration for the MMA-to-Epilogue output pipeline.

Bundles the three parameters that jointly define TMEM accumulator stage management for MMA/epilogue synchronization:

  • num_stages: Number of accumulator pipeline stages (typically 1 or 2).
  • stage_stride_cols: TMEM column stride between accumulator stages.
  • cta_group: CTA group size (1 or 2).
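As a rough illustration of how the three parameters fit together, the sketch below models them in Python rather than Mojo. The class name `OutputPipelineConfigModel` and the helper `stage_col_offset` are hypothetical and are not part of the documented struct. The helper assumes that stage i begins i * stage_stride_cols columns past the accumulator base, which is one plausible reading of "column stride between accumulator stages".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputPipelineConfigModel:
    """Python stand-in for the three documented fields (not the Mojo struct)."""

    num_stages: int         # number of accumulator pipeline stages
    stage_stride_cols: int  # TMEM column stride between stages
    cta_group: int          # CTA group size (1 or 2)

    def stage_col_offset(self, stage: int) -> int:
        # Hypothetical helper: assumes stages are laid out back to back,
        # each starting one stride after the previous one.
        if not 0 <= stage < self.num_stages:
            raise IndexError(f"stage {stage} out of range")
        return stage * self.stage_stride_cols


config = OutputPipelineConfigModel(num_stages=2, stage_stride_cols=256, cta_group=1)
offsets = [config.stage_col_offset(s) for s in range(config.num_stages)]
print(offsets)  # [0, 256]
```

Keeping the three values in one immutable object mirrors the idea of building the configuration once and passing it around unchanged.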

stage_stride_cols computation: Two strategies are used depending on the kernel family:

  • Standard kernels (default, blockwise_fp8): NUM_TMEM_COLS // num_stages (= 512 // num_stages). Divides all 512 TMEM columns evenly among stages.
  • Block-scaled kernels (block_scaled, grouped, 1d1d variants): MMA_N. Sizes each stage to match the MMA output width, which is smaller than half of TMEM (256 columns) when MMA_N < 256.
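The two strategies above can be compared with a small Python model of the arithmetic. This is a sketch, not the Mojo implementation: the function names `standard_stride`, `block_scaled_stride`, and `columns_used` are hypothetical, and the overflow check is an added assumption about what a valid configuration looks like.

```python
NUM_TMEM_COLS = 512  # total TMEM columns, as stated in the description above


def standard_stride(num_stages: int) -> int:
    """Standard kernels: divide all TMEM columns evenly among stages."""
    return NUM_TMEM_COLS // num_stages


def block_scaled_stride(mma_n: int) -> int:
    """Block-scaled kernels: each stage is as wide as the MMA output (MMA_N)."""
    return mma_n


def columns_used(num_stages: int, stride: int) -> int:
    """Hypothetical check: total columns spanned by all stages."""
    used = num_stages * stride
    assert used <= NUM_TMEM_COLS, "stages would overflow TMEM"
    return used


std = standard_stride(num_stages=2)
blk = block_scaled_stride(mma_n=128)
print(std, columns_used(2, std))  # 256 512
print(blk, columns_used(2, blk))  # 128 256
```

With two stages, the standard strategy spans all 512 columns, while the block-scaled strategy with MMA_N = 128 spans only 256 and leaves the remaining columns unused by the accumulator stages.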

Constructed once per kernel struct and propagated to all pipeline types (OutputTilePipeline, warp contexts, TileWriter, etc.).

Fields

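  • num_stages (Int): Number of accumulator pipeline stages (typically 1 or 2).
  • stage_stride_cols (Int): TMEM column stride between accumulator stages.
  • cta_group (Int): CTA group size (1 or 2).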

Implemented traits

AnyType, Copyable, Equatable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegisterPassable, TrivialRegisterPassable