
Mojo function

causal_conv1d_varlen_update_gpu

def causal_conv1d_varlen_update_gpu[
    x_dtype: DType,
    weight_dtype: DType,
    bias_dtype: DType,
    output_dtype: DType,
    conv_state_dtype: DType,
    cache_seqlens_dtype: DType,
    conv_state_indices_dtype: DType,
    WIDTH: Int,
    BLOCK_DIM: Int,
    x_LT: TensorLayout,
    weight_LT: TensorLayout,
    bias_LT: TensorLayout,
    conv_state_LT: TensorLayout,
    cache_seqlens_LT: TensorLayout,
    conv_state_indices_LT: TensorLayout,
    output_LT: TensorLayout
](
    batch: Int,
    dim: Int,
    seqlen: Int,
    state_len: Int,
    x: TileTensor[x_dtype, x_LT, MutUntrackedOrigin],
    weight: TileTensor[weight_dtype, weight_LT, MutUntrackedOrigin],
    bias: TileTensor[bias_dtype, bias_LT, MutUntrackedOrigin],
    conv_state: TileTensor[conv_state_dtype, conv_state_LT, MutUntrackedOrigin],
    cache_seqlens: TileTensor[cache_seqlens_dtype, cache_seqlens_LT, MutUntrackedOrigin],
    conv_state_indices: TileTensor[conv_state_indices_dtype, conv_state_indices_LT, MutUntrackedOrigin],
    output: TileTensor[output_dtype, output_LT, MutUntrackedOrigin],
    x_batch_stride: UInt32,
    x_dim_stride: UInt32,
    x_seqlen_stride: UInt32,
    weight_dim_stride: UInt32,
    weight_width_stride: UInt32,
    conv_state_batch_stride: UInt32,
    conv_state_dim_stride: UInt32,
    conv_state_seqlen_stride: UInt32,
    out_batch_stride: UInt32,
    out_dim_stride: UInt32,
    out_seqlen_stride: UInt32,
    silu_activation: Int8,
    pad_slot_id: Int32,
    has_conv_state_indices: Int8,
    has_cache_seqlens: Int8,
    has_bias: Int8
)

GPU kernel for causal conv1d update (decode step).

Grid: (batch, ceildiv(dim, BLOCK_DIM))

Block: (BLOCK_DIM,)
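The launch shape above can be worked out on the host before dispatch. The following is a minimal sketch in plain Python, used here only to show the arithmetic; `launch_dims` is a hypothetical helper and is not part of the Mojo API.

```python
def ceildiv(a: int, b: int) -> int:
    """Integer division rounded up, for positive b."""
    return -(-a // b)


def launch_dims(batch: int, dim: int, block_dim: int):
    """Hypothetical helper: returns (grid, block) as stated in the docs.

    grid  = (batch, ceildiv(dim, BLOCK_DIM))
    block = (BLOCK_DIM,)
    """
    grid = (batch, ceildiv(dim, block_dim))
    block = (block_dim,)
    return grid, block


if __name__ == "__main__":
    grid, block = launch_dims(batch=4, dim=1000, block_dim=128)
    print(grid, block)  # (4, 8) (128,)
```

With dim = 1000 and BLOCK_DIM = 128 the grid covers 8 * 128 = 1024 channel slots, so the last block along the second grid axis is only partly filled whenever dim is not a multiple of BLOCK_DIM.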

Note: silu_activation and the has_* flag parameters (has_conv_state_indices, has_cache_seqlens, has_bias) are Int8 values (0 or 1) rather than Bool, for DevicePassable compatibility on GPU.
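This page does not spell out the arithmetic of the decode step, so the sketch below is an assumption about its general shape, not the kernel's implementation. It assumes the conventional causal conv1d update: each channel keeps a rolling state of past inputs, appends the new sample, takes a dot product of the last WIDTH samples with that channel's weights, adds an optional bias, and optionally applies SiLU. It covers only the simple case where conv_state_indices and cache_seqlens are not used, and it takes Python booleans where the kernel takes Int8 flags. The function name and the nested-list layout are hypothetical.

```python
import math


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def causal_conv1d_update_ref(x, conv_state, weight, bias=None, silu_activation=False):
    """Reference sketch (assumed semantics, not the Mojo kernel).

    x:          [batch][dim][seqlen]     new input samples
    conv_state: [batch][dim][state_len]  rolling state, updated in place;
                assumes state_len >= width - 1
    weight:     [dim][width]
    bias:       [dim] or None (None plays the role of has_bias == 0)
    Returns output as [batch][dim][seqlen].
    """
    width = len(weight[0])
    out = []
    for b in range(len(x)):
        out_b = []
        for d in range(len(weight)):
            state = conv_state[b][d]
            state_len = len(state)
            row = []
            for sample in x[b][d]:
                full = state + [sample]      # history followed by the new sample
                window = full[-width:]       # last `width` samples
                acc = bias[d] if bias is not None else 0.0
                for k in range(width):
                    acc += weight[d][k] * window[k]
                row.append(silu(acc) if silu_activation else acc)
                # Keep only the newest state_len samples for the next step.
                state = full[len(full) - state_len:]
            conv_state[b][d][:] = state      # write the rolled state back
            out_b.append(row)
        out.append(out_b)
    return out


if __name__ == "__main__":
    state = [[[1.0, 2.0]]]                   # batch=1, dim=1, state_len=2
    y = causal_conv1d_update_ref([[[3.0]]], state, [[1.0, 0.0, -1.0]])
    print(y, state)  # [[[-2.0]]] [[[2.0, 3.0]]]
```

A reference like this is mainly useful for checking a GPU result on small inputs; the real kernel also takes explicit strides, pad_slot_id, and the optional index tensors, none of which are modelled here.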