Mojo function
top_p_sampling_gpu
top_p_sampling_gpu[type: DType, rank: Int, out_idx_type: DType, //, _test_sort: Bool = False](ctx: DeviceContext, top_ps: NDBuffer[type, 1, origin], input_logits: NDBuffer[type, rank, origin], out_token_ids: NDBuffer[out_idx_type, rank, origin], temperature: SIMD[type, 1] = __init__[__mlir_type.!pop.int_literal](1))
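GPU implementation of Top-P sampling for token selection. The function applies temperature scaling and softmax to the input logits, sorts the resulting probabilities with a radix sort, and then samples a token from the highest-probability tokens whose cumulative probability mass reaches the Top-P threshold.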
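The steps described above can be illustrated with a small CPU reference. This is a minimal sketch in plain Python, not the Mojo GPU kernel: the function name `top_p_sample`, its parameters, and the `rng` argument are hypothetical, it handles a single row of logits, and it uses Python's built-in sort where the GPU implementation uses a radix sort.

```python
import math
import random


def top_p_sample(logits, top_p, temperature=1.0, rng=random):
    """Sample one token id from a row of logits with Top-P sampling.

    Hypothetical reference helper for illustration only.
    """
    # 1. Temperature scaling.
    scaled = [x / temperature for x in logits]

    # 2. Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 3. Order token ids by descending probability
    #    (the GPU version is described as using a radix sort here).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # 4. Keep the shortest prefix whose cumulative mass reaches top_p.
    nucleus = []
    mass = 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # 5. Sample inside the kept prefix, proportionally to probability.
    r = rng.random() * mass
    acc = 0.0
    for i in nucleus:
        acc += probs[i]
        if r < acc:
            return i
    return nucleus[-1]  # Guard against floating-point round-off.


if __name__ == "__main__":
    rng = random.Random(0)
    print(top_p_sample([1.0, 3.0, 2.0], top_p=0.9, rng=rng))
```

With a very small `top_p` the kept prefix contains only the most likely token, so sampling becomes greedy; with `top_p = 1.0` every token stays eligible.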