Mojo function

div_col

div_col(mut dst: TileTensor[DType.float32, address_space=AddressSpace.LOCAL], src: TileTensor[DType.float32, address_space=AddressSpace.LOCAL], vec: TileTensor[DType.float32, address_space=AddressSpace.LOCAL])

Divides each column of src by the matching element of vec: dst[gr, gc] = src[gr, gc] / vec[gc]. Used for the final o_reg / norm_vec step.

Hand-lowered to recip(vec) * src: one reciprocal per column, then one multiply per element. Using / directly would instead expand, for every element, to the IEEE-correct fdiv sequence (v_div_scale_f32 + v_rcp_f32 + v_div_fmas_f32 + v_div_fixup_f32).
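The reciprocal-then-multiply lowering can be sketched as a scalar reference model. This is plain Python over nested lists, not the Mojo kernel: it returns a new list instead of writing into dst, ignores TileTensor layout and address spaces, and runs in Python's double precision rather than float32. The list-of-rows representation and the shape check are assumptions made for illustration only.

```python
def div_col(src, vec):
    """Reference model of dst[gr][gc] = src[gr][gc] / vec[gc].

    Illustrative only: uses one reciprocal per column followed by a
    multiply per element, mirroring the recip(vec) * src lowering.
    """
    if any(len(row) != len(vec) for row in src):
        raise ValueError("each row of src must have len(vec) columns")
    # One reciprocal per column, reused for every row of that column.
    recip = [1.0 / v for v in vec]
    # One multiply per element instead of one divide per element.
    return [[x * r for x, r in zip(row, recip)] for row in src]


src = [[2.0, 8.0], [4.0, 16.0]]
vec = [2.0, 4.0]
dst = div_col(src, vec)
print(dst)  # [[1.0, 2.0], [2.0, 4.0]]
```

The example uses power-of-two divisors so the reciprocals are exact. For other divisors, x * (1 / v) may differ from x / v in the last bit, which is the accuracy this lowering gives up in exchange for the shorter instruction sequence.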