Mojo function

swishGLU

swishGLU[target: StringSlice[StaticConstantOrigin] = "cpu"](a: TileTensor[a.dtype, a.LayoutType, a.origin, linear_idx_type=a.linear_idx_type, element_size=a.element_size], b0: TileTensor[b0.dtype, b0.LayoutType, b0.origin, linear_idx_type=b0.linear_idx_type, element_size=b0.element_size], b1: TileTensor[b1.dtype, b1.LayoutType, b1.origin, linear_idx_type=b1.linear_idx_type, element_size=b1.element_size], c: TileTensor[c.dtype, c.LayoutType, c.origin, linear_idx_type=c.linear_idx_type, element_size=c.element_size], ctx: DeviceContextPtr)

Reference: "GLU Variants Improve Transformer" by Noam Shazeer (https://arxiv.org/pdf/2002.05202v1).

The implementation follows CUTLASS: it uses a single kernel invocation and writes to the destination once.
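For orientation, the referenced paper defines SwiGLU(x, W, V) = Swish(xW) ⊗ (xV), where ⊗ is the elementwise product. The plain-Python sketch below illustrates that formula only; it is not the Mojo kernel. It assumes that `a` plays the role of x, `b0` of W, `b1` of V, and that the result is what gets written to `c`. The signature above does not confirm this mapping, which operand receives the Swish, or whether the weight tensors are stored transposed. The names `swish` and `swiglu_reference` are hypothetical.

```python
import math


def swish(x: float) -> float:
    """Swish with beta = 1 (also known as SiLU): x * sigmoid(x)."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    # For negative x, use exp(x) instead of exp(-x) to avoid overflow.
    e = math.exp(x)
    return x * e / (1.0 + e)


def swiglu_reference(a, b0, b1):
    """Return Swish(a @ b0) * (a @ b1), computed in a single pass.

    Assumed shapes (not confirmed by the Mojo signature):
    a is M x K, b0 and b1 are K x N, all given as lists of lists.
    """
    m, k, n = len(a), len(b0), len(b0[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc0 = 0.0  # running dot product for a @ b0
            acc1 = 0.0  # running dot product for a @ b1
            for p in range(k):
                acc0 += a[i][p] * b0[p][j]
                acc1 += a[i][p] * b1[p][j]
            # Both products are combined before the only store to c[i][j].
            c[i][j] = swish(acc0) * acc1
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0]]
    b0 = [[1.0], [0.0]]
    b1 = [[0.0], [1.0]]
    print(swiglu_reference(a, b0, b1))
```

Accumulating both matrix products in the same loop and storing each output element once mirrors, in spirit, the "one kernel invocation, one write" property described above, as opposed to running two separate matrix multiplications followed by an elementwise pass.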