Mojo function
top_p_sampling
top_p_sampling[type: DType, out_idx_type: DType, //, _test_sort: Bool = False](top_ps: LayoutTensor[type, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], input_logits: LayoutTensor[type, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], out_token_ids: LayoutTensor[out_idx_type, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], temperature: SIMD[type, 1] = __init__[__mlir_type.!pop.int_literal](1))
Naive CPU implementation of Top-P sampling for token selection. This function applies temperature scaling, softmax, a merge sort, and then samples tokens based on the cumulative probability mass (Top-P).
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!