Mojo function

topk_wrapper_no_shmem

def topk_wrapper_no_shmem[
    input_type: DType,
    index_type: DType,
    *,
    is_top_p: Bool,
    block_size: Int,
    largest: Bool = True,
    _test_sort: Bool = False
](
    K: Int,
    num_elements: Int,
    num_blocks_per_input: Int,
    in_buffer: UnsafePointer[Scalar[input_type], ImmutUntrackedOrigin],
    local_topk_vals: UnsafePointer[Scalar[input_type], MutUntrackedOrigin],
    local_topk_idxs: UnsafePointer[Scalar[index_type], MutUntrackedOrigin],
    p_threshold: UnsafePointer[Scalar[input_type], MutUntrackedOrigin],
    skip_sort: UnsafePointer[Scalar[DType.bool], MutUntrackedOrigin]
)

Shared-memory-free variant of topk_wrapper for Apple GPUs.

Uses warp-level reduction and register-based invalidation instead of shared memory. The result is only correct when block_size <= WARP_SIZE, because a warp-level reduction covers only the lanes of a single warp.
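
The page does not show the kernel body, so the following is only a rough CPU-side sketch, in Python, of how a top-K selection built on warp-level reduction and register-based invalidation can work. It is not the Mojo implementation. The names warp_best_with_index and warp_topk, the value WARP_SIZE = 32, and the tie-breaking rule (lower index wins) are all assumptions made for illustration. Each list slot stands in for one lane's register, and the XOR-partner loop stands in for a hardware warp shuffle.

```python
import math

WARP_SIZE = 32  # assumed warp width for this simulation only


def warp_best_with_index(vals, idxs, largest=True):
    """Simulate a butterfly (XOR-shuffle) reduction over one warp.

    Every lane ends up holding the best (value, index) pair, so no
    shared memory is needed to broadcast the winner.
    """
    n = len(vals)  # must be a power of two, like a real warp
    vals, idxs = list(vals), list(idxs)
    offset = n // 2
    while offset > 0:
        new_vals, new_idxs = list(vals), list(idxs)
        for lane in range(n):
            partner = lane ^ offset  # lane whose registers we "shuffle" in
            pv, pi = vals[partner], idxs[partner]
            better = pv > vals[lane] if largest else pv < vals[lane]
            tie = pv == vals[lane] and pi < idxs[lane]  # assumed tie rule
            if better or tie:
                new_vals[lane], new_idxs[lane] = pv, pi
        vals, idxs = new_vals, new_idxs
        offset //= 2
    return vals[0], idxs[0]


def warp_topk(data, k, largest=True):
    """Select the top-k values held by a single simulated warp."""
    if len(data) > WARP_SIZE:
        # Mirrors the block_size <= WARP_SIZE restriction: one warp only.
        raise ValueError("input does not fit in a single warp")
    sentinel = -math.inf if largest else math.inf
    # One element per lane; unused lanes start out invalidated.
    regs = list(data) + [sentinel] * (WARP_SIZE - len(data))
    lane_ids = list(range(WARP_SIZE))
    out_vals, out_idxs = [], []
    for _ in range(min(k, len(data))):
        best_val, best_idx = warp_best_with_index(regs, lane_ids, largest)
        out_vals.append(best_val)
        out_idxs.append(best_idx)
        # Register-based invalidation: the owning lane overwrites its own
        # register so the element cannot win a later round.
        regs[best_idx] = sentinel
    return out_vals, out_idxs


if __name__ == "__main__":
    print(warp_topk([3.0, 9.0, 1.0, 7.0, 9.0], 3))
```

In this sketch the winner of each round is removed by overwriting a single lane's register with a sentinel, which is why nothing needs to be staged in shared memory. With more lanes than one warp, the reduction would see only part of the input, which is consistent with the block_size <= WARP_SIZE restriction stated above.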