Mojo module
toppminp_gpu
comptime values
DEBUG_FILE
comptime DEBUG_FILE = False
SEED
comptime SEED = 42
Structs
Functions
- min_p_sampling_gpu: GPU implementation of Min-P sampling for token selection. This function applies temperature scaling, softmax, a radix sort, and then samples tokens based on the calculated probability threshold (Min-P).
- normalize
- normalize_u32
- radix_sort_pairs_kernel: Radix pair sort kernel for (default) descending order.
- reinterpret
- run_radix_sort_pairs_gpu
- top_p_sampling_gpu: GPU implementation of Top-P sampling for token selection. This function applies temperature scaling, softmax, a radix sort, and then samples tokens based on the cumulative probability mass (Top-P).
- topk_wrapper: Copy of Kernels/mojo/nn/topk.mojo:_topk_stage1 with the addition of max_vals and p_threshold arguments to determine if sorting is needed for top-p/min-p sampling.
- topk_wrapper_no_shmem: Shared-memory-free variant of topk_wrapper for Apple GPUs.
- topp_minp_sampling_kernel: Top P-Min P sampling kernel.
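The pipeline described for top_p_sampling_gpu and min_p_sampling_gpu (temperature scaling, softmax, a descending sort, then threshold-based sampling) can be illustrated with a small CPU reference sketch. This is not the module's code. It is plain Python, every function name in it is hypothetical, Python's built-in sort stands in for the radix pair sort, and the Min-P threshold follows the common definition (min_p times the largest probability), which is an assumption about how this module computes it.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sorted_pairs_descending(probs):
    """Return (probability, token_id) pairs, highest probability first.

    Stand-in for the radix pair sort; Python's built-in sort is used instead.
    """
    return sorted(((p, i) for i, p in enumerate(probs)), key=lambda pair: -pair[0])


def top_p_candidates(pairs, top_p):
    """Keep the shortest sorted prefix whose cumulative probability reaches top_p."""
    kept, mass = [], 0.0
    for p, token in pairs:
        kept.append((p, token))
        mass += p
        if mass >= top_p:
            break
    return kept


def min_p_candidates(pairs, min_p):
    """Keep tokens whose probability is at least min_p times the top probability.

    Assumption: this is the common Min-P definition, not confirmed by the module docs.
    """
    threshold = min_p * pairs[0][0]
    return [(p, token) for p, token in pairs if p >= threshold]


def sample(candidates, rng):
    """Draw one token id, weighted by probability renormalized over the candidates."""
    total = sum(p for p, _ in candidates)
    r = rng.random() * total
    running = 0.0
    for p, token in candidates:
        running += p
        if r < running:
            return token
    return candidates[-1][1]  # guard against floating-point round-off


# Hypothetical inputs; the seed mirrors the module's SEED value for illustration only.
rng = random.Random(42)
probs = softmax_with_temperature([2.0, 1.0, 0.0, -1.0], temperature=1.0)
pairs = sorted_pairs_descending(probs)
top_p_token = sample(top_p_candidates(pairs, top_p=0.8), rng)
min_p_token = sample(min_p_candidates(pairs, min_p=0.2), rng)
```

Sorting first is what makes both filters cheap: Top-P becomes a prefix scan over the sorted probabilities, and Min-P only needs the first (largest) entry to compute its cutoff.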