Mojo function
top_p_sampling_gpu
top_p_sampling_gpu[
    dtype: DType,
    out_idx_type: DType, //,
    _test_sort: Bool = False
](
    ctx: DeviceContext,
    top_ps: TileTensor[dtype, top_ps.LayoutType, top_ps.origin, address_space=top_ps.address_space, linear_idx_type=top_ps.linear_idx_type, element_size=top_ps.element_size],
    input_logits: TileTensor[dtype, input_logits.LayoutType, input_logits.origin, linear_idx_type=input_logits.linear_idx_type, element_size=input_logits.element_size],
    out_token_ids: TileTensor[out_idx_type, out_token_ids.LayoutType, out_token_ids.origin, linear_idx_type=out_token_ids.linear_idx_type, element_size=out_token_ids.element_size],
    temperature: Scalar[dtype] = 1
)
GPU implementation of Top-P (nucleus) sampling for token selection. The function applies temperature scaling to the input logits, converts them to probabilities with softmax, orders them with a radix sort, and then samples a token based on the cumulative probability mass (Top-P).
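The four steps named in the description can be illustrated with a small CPU reference. The following is a minimal sketch in plain Python, not the Mojo kernel itself: the helper names `top_p_sample` and `top_p_sample_batch` are hypothetical, the built-in `sorted` stands in for the GPU radix sort, and it assumes one Top-P threshold per batch row (which the `top_ps` tensor argument suggests but this page does not state).

```python
import math
import random


def top_p_sample(logits, top_p, temperature=1.0, rng=random):
    """Hypothetical CPU reference: pick one token index via Top-P sampling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")

    # 1. Temperature scaling of the raw logits.
    scaled = [x / temperature for x in logits]

    # 2. Numerically stable softmax (subtract the max before exponentiating).
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 3. Order token indices by descending probability.
    #    (Assumption: this stands in for the radix sort used on the GPU.)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # 4. Keep the smallest prefix whose cumulative mass reaches top_p.
    nucleus = []
    mass = 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # 5. Sample inside the kept prefix, proportionally to probability.
    threshold = rng.random() * mass
    running = 0.0
    for i in nucleus:
        running += probs[i]
        if threshold < running:
            return i
    return nucleus[-1]  # fallback for floating-point round-off


def top_p_sample_batch(batch_logits, top_ps, temperature=1.0, rng=random):
    """Apply top_p_sample row by row, one threshold per row (assumed)."""
    return [
        top_p_sample(row, p, temperature, rng)
        for row, p in zip(batch_logits, top_ps)
    ]
```

With logits `[2.0, 1.0, 0.1]` and `top_p = 0.5`, the most likely token alone already covers more than half of the probability mass, so the sketch always returns index 0. Raising `top_p` to 0.9 widens the candidate set to the two most likely tokens, and index 2 is never selected.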