Mojo function

causal_conv1d_channel_first_fwd_cpu

def causal_conv1d_channel_first_fwd_cpu[x_dtype: DType, weight_dtype: DType, output_dtype: DType, bias_dtype: DType](batch: Int, dim: Int, seqlen: Int, width: Int, x: TileTensor[x_dtype, Storage=x.Storage, address_space=x.address_space, linear_idx_type=x.linear_idx_type], weight: TileTensor[weight_dtype, Storage=weight.Storage, address_space=weight.address_space, linear_idx_type=weight.linear_idx_type], output: TileTensor[output_dtype, Storage=output.Storage, address_space=output.address_space, linear_idx_type=output.linear_idx_type], bias: TileTensor[bias_dtype, Storage=bias.Storage, address_space=bias.address_space, linear_idx_type=bias.linear_idx_type], x_batch_stride: UInt32, x_c_stride: UInt32, x_l_stride: UInt32, weight_c_stride: UInt32, weight_width_stride: UInt32, out_batch_stride: UInt32, out_c_stride: UInt32, out_l_stride: UInt32, bias_stride: UInt32, silu_activation: Bool, ctx: Optional[DeviceContext] = None)

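CPU implementation of a causal 1D convolution (conv1d) with a per-channel bias, for tensors in channel-first (B, C, L) layout. Each channel is convolved with its own length-width kernel.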
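As a reading aid, the computation can be written out as follows. This page does not spell out two details, so both are assumptions: the formula follows the convention of the widely used causal-conv1d reference kernels, where the last tap w_{c,W-1} multiplies the current time step and positions before the start of the sequence are zero-padded, and SiLU is applied after the bias is added.

```latex
y_{b,c,l} = \mathrm{bias}_c + \sum_{k=0}^{W-1} w_{c,k}\, x_{b,c,\,l-(W-1)+k},
\qquad x_{b,c,j} = 0 \ \text{for } j < 0

\mathrm{out}_{b,c,l} =
\begin{cases}
  y_{b,c,l}\,\sigma(y_{b,c,l}) & \text{if silu\_activation} \\
  y_{b,c,l} & \text{otherwise}
\end{cases}
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```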

Optimizations:

Parallelizes work across the combined batch*channel dimension using sync_parallelize, so each (batch, channel) row is an independent task.
Preloads each channel's weights into registers to reduce memory accesses.
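The following is a pure-Python model of the two optimizations listed above. It is not the Mojo kernel. A ThreadPoolExecutor stands in for sync_parallelize, a flat task index over batch*dim is split into (batch, channel) with divmod, and copying a channel's taps into a local tuple stands in for keeping them in registers. The function name causal_conv1d_channel_first_ref is hypothetical, and the tap alignment and SiLU placement are the assumptions stated for the formula.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def silu(z):
    # z * sigmoid(z), written so math.exp never overflows for large |z|
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def causal_conv1d_channel_first_ref(x, weight, bias, silu_activation=False, workers=4):
    """Hypothetical reference model: x is [B][C][L], weight is [C][W], bias is [C]."""
    batch, dim, seqlen = len(x), len(x[0]), len(x[0][0])
    width = len(weight[0])
    out = [[[0.0] * seqlen for _ in range(dim)] for _ in range(batch)]

    def task(i):
        # One task per (batch, channel) row: flat index i covers batch*dim
        b, c = divmod(i, dim)
        # Load this channel's taps and bias once (stand-in for registers)
        w = tuple(weight[c])
        bc = bias[c]
        row, dst = x[b][c], out[b][c]
        for l in range(seqlen):
            acc = bc
            for k in range(width):
                j = l - (width - 1) + k  # causal: only current and past steps
                if j >= 0:  # positions before the sequence start count as zero
                    acc += w[k] * row[j]
            dst[l] = silu(acc) if silu_activation else acc

    # Each task writes a distinct output row, so tasks never conflict
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(batch * dim)))
    return out
```

For example, with width 2 and taps [0.5, 1.0], each output is 0.5 times the previous input plus the current input, so causal_conv1d_channel_first_ref([[[1.0, 2.0, 3.0, 4.0]]], [[0.5, 1.0]], [0.0]) yields [[[1.0, 2.5, 4.0, 5.5]]].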

Args:

batch (Int): Batch size.
dim (Int): Number of channels.
seqlen (Int): Sequence length.
width (Int): Kernel width.
x (TileTensor[x_dtype, Storage=x.Storage, address_space=x.address_space, linear_idx_type=x.linear_idx_type]): Input tensor of shape (B, C, L), where B = batch, C = dim, and L = seqlen.
weight (TileTensor[weight_dtype, Storage=weight.Storage, address_space=weight.address_space, linear_idx_type=weight.linear_idx_type]): Weight tensor of shape (C, W), where W = width; each channel has its own kernel.
output (TileTensor[output_dtype, Storage=output.Storage, address_space=output.address_space, linear_idx_type=output.linear_idx_type]): Output tensor of shape (B, C, L).
bias (TileTensor[bias_dtype, Storage=bias.Storage, address_space=bias.address_space, linear_idx_type=bias.linear_idx_type]): Bias tensor of shape (C,), one value per channel.
x_batch_stride (UInt32): Stride for the batch dimension of the input tensor.
x_c_stride (UInt32): Stride for the channel dimension of the input tensor.
x_l_stride (UInt32): Stride for the sequence length dimension of the input tensor.
weight_c_stride (UInt32): Stride for the channel dimension of the weight tensor.
weight_width_stride (UInt32): Stride for the width dimension of the weight tensor.
out_batch_stride (UInt32): Stride for the batch dimension of the output tensor.
out_c_stride (UInt32): Stride for the channel dimension of the output tensor.
out_l_stride (UInt32): Stride for the sequence length dimension of the output tensor.
bias_stride (UInt32): Stride for the bias tensor.
silu_activation (Bool): Whether to apply the SiLU activation to the output.
ctx (Optional[DeviceContext]): The context to execute the work on. Defaults to None.
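The stride arguments let the kernel address tensors that are not stored contiguously in (B, C, L) order. The sketch below is a hypothetical pure-Python model, not the Mojo API: it takes flat buffers plus parameters named after the ones above and assumes strides are counted in elements, not bytes. It shows that a channel-last (B, L, C) buffer can be read by passing x_batch_stride = seqlen * dim, x_c_stride = 1, and x_l_stride = dim.

```python
import math


def strided_causal_conv1d(batch, dim, seqlen, width, x, weight, output, bias,
                          x_batch_stride, x_c_stride, x_l_stride,
                          weight_c_stride, weight_width_stride,
                          out_batch_stride, out_c_stride, out_l_stride,
                          bias_stride, silu_activation):
    """Hypothetical model: x, weight, output, bias are flat lists indexed by strides."""
    for b in range(batch):
        for c in range(dim):
            # Taps and bias for channel c are located through their own strides
            w = [weight[c * weight_c_stride + k * weight_width_stride] for k in range(width)]
            acc_bias = bias[c * bias_stride]
            x_base = b * x_batch_stride + c * x_c_stride
            o_base = b * out_batch_stride + c * out_c_stride
            for l in range(seqlen):
                acc = acc_bias
                for k in range(width):
                    j = l - (width - 1) + k  # causal: only current and past steps
                    if j >= 0:
                        acc += w[k] * x[x_base + j * x_l_stride]
                if silu_activation:
                    # Overflow-safe z * sigmoid(z)
                    acc = acc / (1.0 + math.exp(-acc)) if acc >= 0 else acc * math.exp(acc) / (1.0 + math.exp(acc))
                output[o_base + l * out_l_stride] = acc


# Same logical input (B=1, C=2, L=3) stored two ways
x_bcl = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]   # channel-first: strides (6, 3, 1)
x_blc = [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]   # channel-last:  strides (6, 1, 2)
weight = [0.5, 1.0, 1.0, -1.0]              # (C, W) contiguous: strides (2, 1)
bias = [0.0, 1.0]
out_a, out_b = [0.0] * 6, [0.0] * 6
strided_causal_conv1d(1, 2, 3, 2, x_bcl, weight, out_a, bias, 6, 3, 1, 2, 1, 6, 3, 1, 1, False)
strided_causal_conv1d(1, 2, 3, 2, x_blc, weight, out_b, bias, 6, 1, 2, 2, 1, 6, 3, 1, 1, False)
```

Both calls write [1.0, 2.5, 4.0, -9.0, -9.0, -9.0] into a channel-first output, because only the input strides differ while the logical (b, c, l) elements they select are the same.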