Mojo struct
RowMeanOfSquaresQK
struct RowMeanOfSquaresQK
Fused per-row mean of squares for two operands Q and K.
For q of shape [M, Nq] and k of shape [M, Nk] (sharing rows but with
possibly different column counts), computes out[m, 0] = mean_n(q[m,n]^2)
and out[m, 1] = mean_n(k[m,n]^2) into a [M, 2] output. The square and
accumulation always run in float32. This is a single-launch fusion of two
mo.reduce.row_mean_of_squares ops plus a concat, used for cross-head
QK-RMSNorm statistics under tensor parallelism.
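The output layout described above can be illustrated with a small reference sketch in plain Python. This is not the Mojo kernel and says nothing about how the fused launch is implemented; the names `row_mean_of_squares_qk`, `row_mean_of_squares`, and `_f32` are hypothetical, float32 arithmetic is emulated by rounding through `struct`, and both column counts are assumed to be nonzero.

```python
import struct


def _f32(x):
    """Round a Python float to the nearest float32 value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def row_mean_of_squares(rows):
    """Per-row mean of squared elements, squaring and accumulating in emulated float32."""
    out = []
    for row in rows:
        acc = 0.0
        for v in row:
            v32 = _f32(v)
            acc = _f32(acc + _f32(v32 * v32))
        # Assumes each row has at least one column.
        out.append(_f32(acc / len(row)))
    return out


def row_mean_of_squares_qk(q, k):
    """Return an [M, 2] list: column 0 holds the q statistic, column 1 the k statistic."""
    if len(q) != len(k):
        raise ValueError("q and k must have the same number of rows")
    q_stats = row_mean_of_squares(q)
    k_stats = row_mean_of_squares(k)
    # Equivalent to concatenating the two [M, 1] results along the last axis.
    return [[qs, ks] for qs, ks in zip(q_stats, k_stats)]


# q is [2, 4] and k is [2, 2]: same row count, different column counts.
q = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
k = [[3.0, 4.0], [1.0, 1.0]]
result = row_mean_of_squares_qk(q, k)
print(result)  # [[7.5, 12.5], [2.0, 1.0]]
```

For the first row, the q statistic is (1 + 4 + 9 + 16) / 4 = 7.5 and the k statistic is (9 + 16) / 2 = 12.5. Each mean is divided by its own operand's column count.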
Implemented traits
Methods
execute
static def execute[target: StringSlice[StaticConstantOrigin]](output: ManagedTensorSlice[Output, static_spec=output.static_spec], q: ManagedTensorSlice[Input, static_spec=q.static_spec], k: ManagedTensorSlice[Input, static_spec=k.static_spec], ctx: DeviceContext)