Mojo function
gather_reduce
gather_reduce[
    dtype: DType,
    gather_axis: Int,
    reduce_axis: Int,
    simd_width: Int,
    reduce_fn: def[dtype: DType, width: Int](SIMD[dtype, width], SIMD[dtype, width]) -> SIMD[dtype, width]
](
    output: TileTensor[dtype, output.LayoutType, output.origin, address_space=output.address_space, linear_idx_type=output.linear_idx_type, element_size=output.element_size],
    input: TileTensor[dtype, input.LayoutType, input.origin, address_space=input.address_space, linear_idx_type=input.linear_idx_type, element_size=input.element_size],
    indices: TileTensor[DType.int32, indices.LayoutType, indices.origin, address_space=indices.address_space, linear_idx_type=indices.linear_idx_type, element_size=indices.element_size],
    reduce_init: Scalar[dtype]
)
Computes output[i, j, k] = input[indices[i, j], k] and simultaneously reduces the output across axis 1 to produce output[i, k].
The motivating use case for this is multi-hot embeddings in recommender models. It provides similar functionality to PyTorch's EmbeddingBag layer. In that context, i is the batch dimension, j is the multi-hot dimension, and k is the embedding dimension.
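The fused gather-plus-reduction can be sketched in NumPy as a reference for the semantics. This is a hypothetical, unoptimized model of the computation (no SIMD, no fusion), assuming the common case of gather_axis = 0 on the input and reduction across axis 1 of the gathered result:

```python
import numpy as np

def gather_reduce_ref(input, indices, reduce_fn, reduce_init):
    """Reference semantics: output[i, k] = reduce_j input[indices[i, j], k].

    input:   (num_rows, embedding_dim) table of embeddings
    indices: (batch, multi_hot) int32 row indices into `input`
    """
    batch, multi_hot = indices.shape
    embedding_dim = input.shape[1]
    # Each output row starts at the reduction's identity element.
    output = np.full((batch, embedding_dim), reduce_init, dtype=input.dtype)
    for i in range(batch):
        for j in range(multi_hot):
            # Gather row indices[i, j], then fold it into the running reduction.
            output[i] = reduce_fn(output[i], input[indices[i, j]])
    return output

# Example: summing multi-hot embeddings, as EmbeddingBag(mode="sum") would.
table = np.arange(12, dtype=np.float32).reshape(4, 3)
idx = np.array([[0, 2], [1, 3]], dtype=np.int32)
result = gather_reduce_ref(table, idx, np.add, reduce_init=0.0)
```

With a sum reduction and reduce_init = 0, each output row is the elementwise sum of the multi-hot set of embedding rows selected for that batch element.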