Mojo function

fused_token_sampling_cpu

fused_token_sampling_cpu[type: DType, rank: Int, out_idx_type: DType](max_k: Int, input: NDBuffer[type, rank, origin], out_idxs: NDBuffer[out_idx_type, rank, origin], k: OptionalReg[NDBuffer[int64, 1, MutableAnyOrigin]] = None, temperature: OptionalReg[NDBuffer[float32, 1, MutableAnyOrigin]] = None, top_p: OptionalReg[NDBuffer[float32, 1, MutableAnyOrigin]] = None, seed: OptionalReg[NDBuffer[uint64, 1, MutableAnyOrigin]] = None)

Generalized implementation of the Top-K algorithm with sampling. For each row (subvolume) along the innermost dimension of the input tensor, the function samples one index and writes it to out_idxs.
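
Each row along the innermost dimension is an independent sampling problem, and the output holds one index per row. The Python sketch below illustrates that view for a row-major buffer. It is not Mojo and not the kernel's code. The helpers split_rows and output_shape are hypothetical, and the row-major layout is an assumption.

```python
from math import prod


def split_rows(flat, shape):
    """Split a row-major buffer of the given shape into rows along the
    innermost dimension (hypothetical helper; assumes row-major layout)."""
    row_len = shape[-1]
    num_rows = prod(shape[:-1])  # prod([]) == 1, so a 1-D input is one row
    assert len(flat) == num_rows * row_len
    return [flat[r * row_len:(r + 1) * row_len] for r in range(num_rows)]


def output_shape(shape):
    """Shape of out_idxs: input_shape[:-1] + [1], i.e. one index per row."""
    return list(shape[:-1]) + [1]


shape = [2, 3, 4]  # e.g. 2 sequences x 3 positions x a vocabulary of 4
rows = split_rows(list(range(24)), shape)
print(len(rows), rows[1])   # 6 [4, 5, 6, 7]
print(output_shape(shape))  # [2, 3, 1]
```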

Parameters:

  • type (DType): Data type of the input buffer.
  • rank (Int): Rank of the input.
  • out_idx_type (DType): Data type of the output indices.

Args:

  • max_k (Int): The largest number of top elements considered for any row.
  • input (NDBuffer[type, rank, origin]): The input tensor, of any shape.
  • out_idxs (NDBuffer[out_idx_type, rank, origin]): The output indices, with shape input_shape[:-1] + [1].
  • k (OptionalReg[NDBuffer[int64, 1, MutableAnyOrigin]]): Optional buffer giving, for each batch element, the number of top elements to keep.
  • temperature (OptionalReg[NDBuffer[float32, 1, MutableAnyOrigin]]): Optional temperature for each batch element, used to scale the input scores before sampling.
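  • top_p (OptionalReg[NDBuffer[float32, 1, MutableAnyOrigin]]): Optional nucleus (top-p) threshold for each batch element. Sampling is restricted to the smallest set of highest-probability tokens whose cumulative probability reaches this threshold.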
  • seed (OptionalReg[NDBuffer[uint64, 1, MutableAnyOrigin]]): Optional buffer of seeds for the random number generator.
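
The arguments above combine into a per-row sampling pipeline. The Python sketch below shows one common way to combine them; it is not the Mojo kernel. Several details are assumptions for illustration only: the order of operations, the defaults used when an optional buffer is absent (k = max_k, temperature = 1.0, top_p = 1.0, seed = 0), and the function names sample_token and fused_sampling_sketch. Temperature must be positive in this sketch.

```python
import math
import random


def sample_token(logits, k, temperature=1.0, top_p=1.0, seed=0):
    """Sample one index from `logits` (hypothetical helper).

    Assumed order: temperature -> top-k -> softmax -> top-p cut -> draw.
    """
    rng = random.Random(seed)  # seeded, so the draw is reproducible
    # 1. Temperature: divide by T (> 0); T < 1 sharpens, T > 1 flattens.
    scaled = [x / temperature for x in logits]
    # 2. Top-k: the k highest-scoring indices, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # 3. Softmax over the surviving candidates (max subtracted for stability).
    m = scaled[order[0]]
    exps = [math.exp(scaled[i] - m) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 4. Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for idx, p in zip(order, probs):
        kept.append((idx, p))
        cum += p
        if cum >= top_p:
            break
    # 5. Draw one candidate; choices() accepts unnormalized weights.
    indices, weights = zip(*kept)
    return rng.choices(indices, weights=weights, k=1)[0]


def fused_sampling_sketch(rows, max_k, k=None, temperature=None, top_p=None, seed=None):
    """Hypothetical per-row driver; each optional argument is a list with one
    entry per row, and a missing list falls back to an assumed default."""
    out = []
    for b, row in enumerate(rows):
        kb = min(k[b], max_k) if k is not None else max_k
        tb = temperature[b] if temperature is not None else 1.0
        pb = top_p[b] if top_p is not None else 1.0
        sb = seed[b] if seed is not None else 0
        out.append([sample_token(row, kb, tb, pb, sb)])  # one index per row
    return out


rows = [[0.1, 2.0, -1.0, 0.5], [3.0, 0.0, 0.0, 0.0]]
print(fused_sampling_sketch(rows, max_k=4, k=[1, 1]))  # [[1], [0]]
```

Dividing by a positive temperature preserves the ranking. Applying it before or after the top-k selection therefore picks the same candidates; it only changes the probabilities used for the draw.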
