Mojo function
apply_penalties_to_logits
apply_penalties_to_logits[logit_type: DType, penalty_type: DType, //, target: StringSlice[StaticConstantOrigin]](logits: TileTensor[logit_type, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], compressed_frequency_data: TileTensor[DType.int32, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], frequency_offsets: TileTensor[DType.uint32, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], frequency_penalty: TileTensor[penalty_type, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], presence_penalty: TileTensor[penalty_type, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], repetition_penalty: TileTensor[penalty_type, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], ctx: DeviceContextPtr)
Apply penalties to the logits based on the frequency of the tokens in the batch.
The frequency data is stored in a CSR-style (compressed sparse row) layout. frequency_offsets holds, for each sequence in the batch, the starting row of that sequence's entries in the frequency data array (passed as compressed_frequency_data). The frequency data array is a 2D array in which each row i describes one token:
- frequency_data[i, 0] is the token id
- frequency_data[i, 1] is the frequency of that token in the sequence
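The layout described above can be illustrated with a minimal plain-Python sketch. This is not the Mojo kernel and does not use TileTensor. It makes several assumptions that this page does not state: that frequency_offsets has one more entry than the batch size (so sequence b owns rows frequency_offsets[b] up to frequency_offsets[b + 1]), that each penalty tensor holds one value per sequence, and that the penalties follow the commonly used formulas (divide positive logits by the repetition penalty and multiply negative ones, then subtract frequency_penalty * count + presence_penalty). The function name apply_penalties_reference is hypothetical.

```python
def apply_penalties_reference(
    logits,                # list of rows, one row of vocab-sized floats per sequence
    frequency_data,        # list of [token_id, count] rows (CSR values)
    frequency_offsets,     # ASSUMED length: batch_size + 1 (CSR row pointers)
    frequency_penalty,     # ASSUMED: one value per sequence
    presence_penalty,      # ASSUMED: one value per sequence
    repetition_penalty,    # ASSUMED: one value per sequence
):
    """Hypothetical reference: walk the CSR rows and penalize logits in place."""
    for b in range(len(logits)):
        # Rows belonging to sequence b in the compressed frequency data.
        start, end = frequency_offsets[b], frequency_offsets[b + 1]
        for i in range(start, end):
            token_id, count = frequency_data[i]
            if count <= 0:
                continue  # token not seen; nothing to penalize
            logit = logits[b][token_id]
            # ASSUMED repetition-penalty rule: shrink positive logits,
            # push negative logits further down.
            if logit > 0:
                logit /= repetition_penalty[b]
            else:
                logit *= repetition_penalty[b]
            # ASSUMED frequency/presence rule: linear in count plus a flat term.
            logit -= frequency_penalty[b] * count + presence_penalty[b]
            logits[b][token_id] = logit
    return logits


# Two sequences over a 3-token vocabulary.
example_logits = [[2.0, -1.0, 0.5], [1.0, 1.0, 1.0]]
example_frequency_data = [[0, 2], [1, 1], [2, 3]]  # [token_id, count]
example_offsets = [0, 2, 3]  # sequence 0 owns rows 0-1, sequence 1 owns row 2
penalized = apply_penalties_reference(
    example_logits,
    example_frequency_data,
    example_offsets,
    frequency_penalty=[0.5, 0.0],
    presence_penalty=[0.25, 0.0],
    repetition_penalty=[2.0, 2.0],
)
```

In this sketch only tokens listed in a sequence's slice of the frequency data are touched, which is the point of the compressed layout: the work scales with the number of distinct tokens seen rather than with the vocabulary size.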