
Mojo function

causal_conv1d_varlen_fwd_gpu

def causal_conv1d_varlen_fwd_gpu[x_dtype: DType, weight_dtype: DType, bias_dtype: DType, output_dtype: DType, cu_seqlens_dtype: DType, cache_indices_dtype: DType, has_initial_state_dtype: DType, conv_states_dtype: DType, WIDTH: Int, BLOCK_DIM: Int, BLOCK_SEQ: Int, x_LT: TensorLayout, weight_LT: TensorLayout, bias_LT: TensorLayout, query_start_loc_LT: TensorLayout, cache_indices_LT: TensorLayout, has_initial_state_LT: TensorLayout, conv_states_LT: TensorLayout, output_LT: TensorLayout](dim: Int, total_seqlen: Int, batch: Int, x: TileTensor[x_dtype, x_LT, MutUntrackedOrigin], weight: TileTensor[weight_dtype, weight_LT, MutUntrackedOrigin], bias: TileTensor[bias_dtype, bias_LT, MutUntrackedOrigin], query_start_loc: TileTensor[cu_seqlens_dtype, query_start_loc_LT, MutUntrackedOrigin], cache_indices: TileTensor[cache_indices_dtype, cache_indices_LT, MutUntrackedOrigin], has_initial_state: TileTensor[has_initial_state_dtype, has_initial_state_LT, MutUntrackedOrigin], conv_states: TileTensor[conv_states_dtype, conv_states_LT, MutUntrackedOrigin], output: TileTensor[output_dtype, output_LT, MutUntrackedOrigin], x_dim_stride: UInt32, x_seqlen_stride: UInt32, weight_dim_stride: UInt32, weight_width_stride: UInt32, out_dim_stride: UInt32, out_seqlen_stride: UInt32, conv_states_batch_stride: UInt32, conv_states_dim_stride: UInt32, conv_states_width_stride: UInt32, silu_activation: Int8, pad_slot_id: Int32, has_cache_indices: Int8, has_initial_state_flag: Int8, has_conv_states: Int8, has_bias: Int8)

GPU kernel for the causal conv1d forward pass over variable-length sequences.
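As a rough aid for reading the signature, the computation can be sketched on the host in NumPy. This is not the kernel's implementation; it is a minimal reference under stated assumptions: `x` is laid out as `(dim, total_seqlen)` with sequences packed back to back, `weight` is `(dim, WIDTH)` with one filter per channel, and `query_start_loc` holds `batch + 1` cumulative start offsets. The function name `causal_conv1d_varlen_ref` is hypothetical, and the `conv_states`, `cache_indices`, and `has_initial_state` paths are deliberately left out, so every sequence starts from zero history.

```python
import numpy as np

def causal_conv1d_varlen_ref(x, weight, bias, query_start_loc, silu_activation=False):
    """Hypothetical reference: causal depthwise conv1d over packed sequences.

    Assumed layouts (not confirmed by the kernel documentation):
      x:               (dim, total_seqlen)
      weight:          (dim, width)
      bias:            (dim,) or None
      query_start_loc: (batch + 1,) cumulative sequence start offsets
    """
    dim, total_seqlen = x.shape
    width = weight.shape[1]
    out = np.zeros((dim, total_seqlen), dtype=np.float64)
    batch = len(query_start_loc) - 1
    for b in range(batch):
        start, end = int(query_start_loc[b]), int(query_start_loc[b + 1])
        for t in range(start, end):
            # Start from the bias (or zero) for every channel.
            acc = np.zeros(dim) if bias is None else np.array(bias, dtype=np.float64)
            for k in range(width):
                src = t - (width - 1) + k
                # Causal and per-sequence: never read before this sequence's start.
                if src >= start:
                    acc += weight[:, k] * x[:, src]
            out[:, t] = acc
    if silu_activation:
        # SiLU(v) = v * sigmoid(v)
        out = out / (1.0 + np.exp(-out))
    return out

# One channel, filter width 2, two packed sequences of lengths 2 and 3.
x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
w = np.array([[1.0, 10.0]])
out = causal_conv1d_varlen_ref(x, w, None, [0, 2, 5])
print(out.tolist())  # [[10.0, 21.0, 30.0, 43.0, 54.0]]
```

In the example, position 2 (the first token of the second sequence) yields 30 rather than 32, because the filter does not reach back into the previous sequence.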

Grid: (batch, ceildiv(dim, BLOCK_DIM))

Block: (BLOCK_DIM, BLOCK_SEQ)

Each block processes BLOCK_DIM channels for one sequence.
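The launch geometry above can be worked through with a small host-side calculation. The sketch below is illustrative Python, not Mojo launch code. The helpers `launch_shape` and `channels_for_block` are hypothetical names. It assumes that the second grid axis selects the channel tile and that the last tile is clipped when `dim` is not a multiple of `BLOCK_DIM`.

```python
def ceildiv(a, b):
    """Ceiling division for positive integers."""
    return -(-a // b)

def launch_shape(batch, dim, block_dim, block_seq):
    """Return (grid, block) following the documented launch configuration."""
    grid = (batch, ceildiv(dim, block_dim))
    block = (block_dim, block_seq)
    return grid, block

def channels_for_block(block_idx_y, dim, block_dim):
    """Channel range for one block, assuming grid axis 1 indexes channel tiles."""
    first = block_idx_y * block_dim
    return range(first, min(first + block_dim, dim))

grid, block = launch_shape(batch=3, dim=1000, block_dim=128, block_seq=8)
print(grid, block)                             # (3, 8) (128, 8)
print(len(channels_for_block(7, 1000, 128)))   # 104
```

With `dim = 1000` and `BLOCK_DIM = 128`, eight channel tiles are needed per sequence, and the last tile covers only 104 valid channels, so under these assumptions a bounds check on the channel index would be required in that tile.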

Note: silu_activation and the flag parameters (has_cache_indices, has_initial_state_flag, has_conv_states, has_bias) are Int8 values (0 or 1) rather than Bool, for DevicePassable compatibility on GPU.
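A caller that holds these options as booleans has to encode them as 0 or 1 before the launch. The following is a minimal Python sketch of that encoding only. The helper `as_int8_flag` is hypothetical, and how the values actually reach the kernel is not covered by this page.

```python
import numpy as np

def as_int8_flag(value):
    """Encode a truthy/falsy option as an 8-bit 0 or 1 (hypothetical helper)."""
    return np.int8(1 if value else 0)

# Flag names taken from the kernel signature; the values are illustrative.
options = {
    "silu_activation": True,
    "has_cache_indices": False,
    "has_initial_state_flag": False,
    "has_conv_states": False,
    "has_bias": True,
}
flags = {name: as_int8_flag(v) for name, v in options.items()}
print(int(flags["silu_activation"]), int(flags["has_cache_indices"]))  # 1 0
```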