Mojo function

apply_penalties_to_logits

apply_penalties_to_logits[logit_type: DType, penalty_type: DType, //, target: StringSlice[StaticConstantOrigin]](logits: LayoutTensor[logit_type, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], compressed_frequency_data: LayoutTensor[DType.int32, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], frequency_offsets: LayoutTensor[DType.uint32, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], frequency_penalty: LayoutTensor[penalty_type, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], presence_penalty: LayoutTensor[penalty_type, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], repetition_penalty: LayoutTensor[penalty_type, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], ctx: DeviceContextPtr)

Apply penalties to the logits based on the frequency of the tokens in the batch.

The frequency data is stored in a CSR format, where the frequency_offsets is the starting index of each sequence in the frequency_data array. The frequency_data array is a 2D array, where:

frequency_data[i, 0] is the token id
frequency_data[i, 1] is the frequency of the token in the sequence