IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.
Skip to main content
For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Mojo function

block_scaled_matmul_with_epilogue

def block_scaled_matmul_with_epilogue[c_type: DType, a_type: DType, b_type: DType, scales_dtype: DType, //, *, SF_VECTOR_SIZE: Int, transpose_b: Bool = True, elementwise_lambda_fn: Optional[def[dtype: DType, width: SIMDSize, *, alignment: Int = Int(1)](IndexList[Int(2)], SIMD[dtype, width]) capturing -> None] = None](c: TileTensor[c_type, Storage=c.Storage, address_space=c.address_space, linear_idx_type=c.linear_idx_type, element_size=c.element_size], a: TileTensor[a_type, Storage=a.Storage, address_space=a.address_space, linear_idx_type=a.linear_idx_type, element_size=a.element_size], b: TileTensor[b_type, Storage=b.Storage, address_space=b.address_space, linear_idx_type=b.linear_idx_type, element_size=b.element_size], a_scales: TileTensor[scales_dtype, Storage=a_scales.Storage, address_space=a_scales.address_space, linear_idx_type=a_scales.linear_idx_type, element_size=a_scales.element_size], b_scales: TileTensor[scales_dtype, Storage=b_scales.Storage, address_space=b_scales.address_space, linear_idx_type=b_scales.linear_idx_type, element_size=b_scales.element_size], tensor_sf: Float32, ctx: DeviceContext)

Our sm100 block scaled matmul kernel still does not support fusion of elementwise operations. This is a temporary implementation that uses our sm100 block scaled matmul kernel and dispatch a separate epilogue kernel to apply the elementwise operations. Callers must allocate c; when an elementwise_lambda_fn is supplied the matmul result is written into c and then read back by the lambda.