
Mojo function

fused_token_sampling_cpu

fused_token_sampling_cpu[dtype: DType, out_idx_type: DType, KLayoutType: TensorLayout = Layout[RuntimeInt[DType.int64], ComptimeInt[1]], TemperatureLayoutType: TensorLayout = Layout[RuntimeInt[DType.int64], ComptimeInt[1]], TopPLayoutType: TensorLayout = Layout[RuntimeInt[DType.int64], ComptimeInt[1]], SeedLayoutType: TensorLayout = Layout[RuntimeInt[DType.int64], ComptimeInt[1]]](max_k: Int, input: TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], out_idxs: TileTensor[out_idx_type, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], k: Optional[TileTensor[DType.int64, KLayoutType, ImmutAnyOrigin]] = None, temperature: Optional[TileTensor[DType.float32, TemperatureLayoutType, ImmutAnyOrigin]] = None, top_p: Optional[TileTensor[DType.float32, TopPLayoutType, ImmutAnyOrigin]] = None, seed: Optional[TileTensor[DType.uint64, SeedLayoutType, ImmutAnyOrigin]] = None)

Generalized implementation of the top-k algorithm with sampling. For each row (subvolume) along the innermost dimension of the input tensor, samples one index from that row and writes it to out_idxs.
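To make the row-wise behavior concrete, here is a minimal plain-Python sketch. It is not the Mojo API. The function name `top_k_sample_rows`, the flat row-major input representation, and the use of a softmax over the kept values are all assumptions for illustration. The sketch treats every innermost-dimension slice as one row, keeps its `k` largest values, samples one index, and reports the output shape `input_shape[:-1] + [1]`.

```python
import math
import random
from typing import List, Sequence, Tuple


def top_k_sample_rows(flat: Sequence[float], shape: Sequence[int],
                      k: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical reference: sample one index per innermost-dimension row.

    `flat` holds the input in row-major order with the given `shape`.
    Returns (indices, out_shape) where out_shape == shape[:-1] + [1].
    """
    vocab = shape[-1]
    num_rows = len(flat) // vocab
    rng = random.Random(seed)
    out = []
    for r in range(num_rows):
        row = flat[r * vocab:(r + 1) * vocab]
        # Indices of the k largest values, ties broken by the lower index.
        top = sorted(range(vocab), key=lambda i: (-row[i], i))[:k]
        # Softmax weights over the kept values only (max subtracted for stability).
        m = row[top[0]]
        weights = [math.exp(row[i] - m) for i in top]
        # choices() accepts unnormalized weights.
        out.append(rng.choices(top, weights=weights, k=1)[0])
    return out, list(shape[:-1]) + [1]
```

Seeding one `random.Random` per call keeps the sketch reproducible; how the Mojo kernel derives per-row randomness from `seed` is not described on this page.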

Parameters:

  • dtype (DType): Data type of the input buffer.
  • out_idx_type (DType): Data type of the output indices.
  • KLayoutType (TensorLayout): Layout type of the k buffer.
  • TemperatureLayoutType (TensorLayout): Layout type of the temperature buffer.
  • TopPLayoutType (TensorLayout): Layout type of the top_p buffer.
  • SeedLayoutType (TensorLayout): Layout type of the seed buffer.

Args:

  • max_k (Int): The maximum number of top elements to keep.
  • input (TileTensor): The input tensor, of any shape, with element type dtype.
  • out_idxs (TileTensor): The output indices, with element type out_idx_type and shape input_shape[:-1] + [1].
  • k (Optional): Optional buffer holding the number of top elements to keep for each batch element.
  • temperature (Optional): Optional temperature scaling to apply before sampling.
  • top_p (Optional): Optional top-p (nucleus) threshold: sampling is restricted to the smallest set of highest-probability tokens whose cumulative probability exceeds this threshold.
  • seed (Optional): Optional seed for the random number generator.
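The following per-row sketch, again in plain Python rather than Mojo, shows one plausible way the optional arguments can combine. The function name `sample_token`, the order of operations (temperature, then top-k, then softmax and top-p, then sampling), and the treatment of `temperature` as a divisor of the logits are assumptions; the kernel's actual ordering and edge-case handling (for example `temperature == 0`) are not documented here.

```python
import math
import random
from typing import Optional, Sequence


def sample_token(row: Sequence[float], k: int, temperature: float = 1.0,
                 top_p: float = 1.0, rng: Optional[random.Random] = None) -> int:
    """Hypothetical per-row sampler: temperature -> top-k -> top-p -> sample.

    Assumes temperature > 0 and 0 < top_p <= 1.
    """
    if rng is None:
        rng = random.Random(0)
    # 1. Temperature scaling, assumed here to divide each logit.
    scaled = [x / temperature for x in row]
    # 2. Indices of the k largest scaled logits, in descending order.
    top = sorted(range(len(scaled)), key=lambda i: (-scaled[i], i))[:k]
    # 3. Softmax over the kept logits (max subtracted for stability).
    m = scaled[top[0]]
    exps = [math.exp(scaled[i] - m) for i in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 4. Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    cutoff, cum = len(top), 0.0
    for j, p in enumerate(probs):
        cum += p
        if cum >= top_p:
            cutoff = j + 1
            break
    # 5. Draw one index from the survivors; choices() accepts unnormalized weights.
    return rng.choices(top[:cutoff], weights=probs[:cutoff], k=1)[0]
```

With a small `top_p`, only the single most likely token survives step 4, so sampling becomes greedy; with `top_p = 1.0` the filter keeps every top-k candidate.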
