Mojo function
div_col
div_col(mut dst: TileTensor[DType.float32, address_space=AddressSpace.LOCAL], src: TileTensor[DType.float32, address_space=AddressSpace.LOCAL], vec: TileTensor[DType.float32, address_space=AddressSpace.LOCAL])
Divides each element of src by the vec entry for its column: dst[gr, gc] = src[gr, gc] / vec[gc] (the final o_reg / norm_vec step).
The division is hand-lowered to recip(vec) * src per column. Using / directly
would instead expand to the IEEE-correct fdiv sequence (v_div_scale_f32 +
v_rcp_f32 + v_div_fmas_f32 + v_div_fixup_f32).
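The arithmetic behind this lowering can be sketched in plain Python. This is only a model of the math, not the Mojo kernel: the helper names div_col_reference and div_col_recip are hypothetical, nested lists stand in for the register tiles, and Python floats are 64-bit rather than float32.

```python
def div_col_reference(src, vec):
    """Direct form: dst[r][c] = src[r][c] / vec[c], one division per element."""
    return [[value / vec[c] for c, value in enumerate(row)] for row in src]


def div_col_recip(src, vec):
    """Reciprocal-multiply form: one reciprocal per column, one multiply per element."""
    recip = [1.0 / v for v in vec]  # computed once per column
    return [[value * recip[c] for c, value in enumerate(row)] for row in src]


# Small 2x2 example with power-of-two divisors, so both forms agree exactly.
src = [[1.0, 2.0], [3.0, 8.0]]
vec = [2.0, 4.0]
exact = div_col_reference(src, vec)
fast = div_col_recip(src, vec)
print(exact)
print(fast)
```

For general inputs the two forms can differ in the last bits, because the reciprocal-multiply form rounds twice (once for the reciprocal, once for the product) where a correctly rounded division rounds once. That is the accuracy given up in exchange for a shorter instruction sequence.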