Mojo struct
GroupedBlockScaledSmem
struct GroupedBlockScaledSmem[a_type: DType, b_type: DType, c_type: DType, sfa_dtype: DType, sfb_dtype: DType, transpose_b: Bool, *, config: BlockScaledMatmulConfig[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b]]
SMEM struct for grouped block-scaled GEMM.
Thin wrapper over BlockScaledTileCore + SmemPipelineBundle + TMA descriptors.
Layout in SMEM:
- Tile storage (via core) — A, B, C, SFA, SFB tiles
- Pipeline barriers
- Tensormap descriptors (5 x 128 bytes = 640 bytes)
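The layout above can be sketched as simple offset arithmetic. The following is a minimal Python illustration, not the library's implementation: `smem_layout`, `align_up`, the example tile and barrier byte counts, and the 128-byte alignment of the descriptor block are all assumptions made for illustration. Only the region order, the five descriptor names, and the 5 x 128 = 640 byte figure come from this page.

```python
# Sketch of the SMEM region order described above (hypothetical helper names).
TMA_DESCRIPTOR_BYTES = 128  # per-descriptor size stated in the layout list
TENSORMAP_NAMES = ["a", "b", "sfa", "sfb", "c"]  # order of the tensormap_* fields


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def tensormap_storage_size() -> int:
    """Bytes used by the descriptor block: 5 x 128 = 640."""
    return len(TENSORMAP_NAMES) * TMA_DESCRIPTOR_BYTES


def smem_layout(tile_bytes: int, barrier_bytes: int, alignment: int = 128) -> dict:
    """Place tiles, barriers, and descriptors back to back.

    tile_bytes and barrier_bytes are assumed inputs; in the real struct they
    are derived from the config. The alignment default is also an assumption.
    """
    tiles_offset = 0
    barriers_offset = tiles_offset + tile_bytes
    tensormaps_offset = align_up(barriers_offset + barrier_bytes, alignment)
    descriptors = {
        f"tensormap_{name}": tensormaps_offset + i * TMA_DESCRIPTOR_BYTES
        for i, name in enumerate(TENSORMAP_NAMES)
    }
    return {
        "tiles_offset": tiles_offset,
        "barriers_offset": barriers_offset,
        "tensormaps_offset": tensormaps_offset,
        "descriptors": descriptors,
        "total_bytes": tensormaps_offset + tensormap_storage_size(),
    }


if __name__ == "__main__":
    # Made-up sizes: 96 KiB of tiles and 200 bytes of barriers.
    layout = smem_layout(tile_bytes=96 * 1024, barrier_bytes=200)
    for key, value in layout.items():
        print(key, value)
```

Placing the descriptors last keeps the fixed-size 640-byte block independent of how many pipeline stages the config selects. The actual field offsets are determined by the Mojo struct layout, not by this sketch.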
Fields
- core (GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core)
- pipelines (GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Pipelines)
- tensormap_a (TMADescriptor)
- tensormap_b (TMADescriptor)
- tensormap_sfa (TMADescriptor)
- tensormap_sfb (TMADescriptor)
- tensormap_c (TMADescriptor)
Implemented traits
AnyType,
ImplicitlyDestructible
comptime members
Core
comptime Core = BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config]
Pipelines
comptime Pipelines = SmemPipelineBundle[GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.num_group_pipeline_stages, GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.num_accum_pipeline_stages, config.num_clc_pipeline_stages, BlockScaledTilePayload[a_type, b_type, sfa_dtype, sfb_dtype, IndexList(BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].BM, BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].BK, __list_literal__=Tuple()), IndexList(BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].BN, BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].BK, __list_literal__=Tuple()), IndexList(BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].SFA_DIM0, BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].SFA_DIM1, __list_literal__=Tuple()), IndexList(BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].SFB_DIM0, BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].SFB_DIM1, __list_literal__=Tuple()), BlockScaledTileCore[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].num_pipeline_stages]]
Methods
a_tiles
a_tiles(ref[AddressSpace._value] self) -> GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.ATileArray
Get A tile array accessor.
Returns:
GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.ATileArray
b_tiles
b_tiles(ref[AddressSpace._value] self) -> GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.BTileArray
Get B tile array accessor.
Returns:
GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.BTileArray
c_tiles
c_tiles(ref[AddressSpace._value] self) -> GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.CTileArray
Get C tile array accessor.
Returns:
GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.CTileArray
sfa_tiles
sfa_tiles(ref[AddressSpace._value] self) -> GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.SFATileArray
Get SFA tile array accessor.
Returns:
GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.SFATileArray
sfb_tiles
sfb_tiles(ref[AddressSpace._value] self) -> GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.SFBTileArray
Get SFB tile array accessor.
Returns:
GroupedBlockScaledSmem[a_type, b_type, c_type, sfa_dtype, sfb_dtype, transpose_b, config=config].Core.SFBTileArray
tensormap_storage_size
static tensormap_storage_size() -> Int
Size of tensormap storage in bytes (5 x 128 = 640 bytes).
Returns:
Int