Mojo function
convert_e4m3fn_to_e4m3fnuz
convert_e4m3fn_to_e4m3fnuz(input_buffer: LayoutTensor[DType.float8_e4m3fn, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], output_buffer: LayoutTensor[DType.float8_e4m3fnuz, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], context: DeviceContext)
Convert E4M3FN weights to E4M3FNUZ format for AMD GPU compatibility.
This conversion handles the key differences between E4M3FN and E4M3FNUZ:
- The bit pattern 10000000 (0x80, or -128 when read as a signed 8-bit integer) represents negative zero in E4M3FN but NaN in E4M3FNUZ
Args:
- input_buffer (LayoutTensor): Input tensor in E4M3FN format.
- output_buffer (LayoutTensor): Output tensor that stores the result in E4M3FNUZ format.
- context (DeviceContext): Device context for kernel execution.
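To make the 10000000 bit-pattern difference concrete, here is a minimal sketch in plain Python (not Mojo, and not the kernel this function launches). It decodes a single byte under each format using the commonly published definitions: E4M3FN uses exponent bias 7 and treats 0x7F/0xFF as NaN, while E4M3FNUZ uses exponent bias 8 and treats 0x80 as its only NaN. The names `decode_e4m3` and `remap_negative_zero` are hypothetical helpers introduced for illustration. This page does not say how the real function handles the bias difference, so the sketch covers only the zero/NaN remap.

```python
import math


def decode_e4m3(bits: int, fnuz: bool) -> float:
    """Decode one 8-bit pattern as E4M3FN (fnuz=False) or E4M3FNUZ (fnuz=True)."""
    sign = -1.0 if bits & 0x80 else 1.0
    exp = (bits >> 3) & 0xF   # 4 exponent bits
    man = bits & 0x7          # 3 mantissa bits
    if fnuz:
        if bits == 0x80:
            return math.nan   # the only NaN; E4M3FNUZ has no negative zero
        bias = 8
    else:
        if exp == 0xF and man == 0x7:
            return math.nan   # 0x7F and 0xFF are NaN in E4M3FN
        bias = 7
    if exp == 0:
        # Subnormal numbers (and zero): no implicit leading 1.
        return sign * (man / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - bias)


def remap_negative_zero(bits: int) -> int:
    """Hypothetical helper: map E4M3FN negative zero (0x80) to zero (0x00)."""
    return 0x00 if bits == 0x80 else bits


neg_zero = decode_e4m3(0x80, fnuz=False)                   # -0.0 in E4M3FN
as_fnuz = decode_e4m3(0x80, fnuz=True)                     # NaN in E4M3FNUZ
fixed = decode_e4m3(remap_negative_zero(0x80), fnuz=True)  # 0.0 after the remap
```

Without the remap, every negative zero in the source weights would be read back as NaN once the bytes are reinterpreted as E4M3FNUZ.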