Mojo module
causal_conv1d
Core Causal Conv1D Kernel Implementations.
This module provides CPU and GPU kernel implementations for causal 1D convolution, supporting both channel-first and channel-last memory layouts.
Causal Convolution:
In causal convolution, the output at position i depends only on inputs at
positions [i - width + 1, i]. This ensures no information leakage from
future positions, making it suitable for autoregressive sequence modeling.
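The dependency rule above can be illustrated with a minimal single-channel reference in plain Python. This is only a sketch, not the module's kernel code: the name `causal_conv1d_ref` is hypothetical, and it assumes that positions before the start of the sequence contribute zero (one common form of boundary handling).

```python
def causal_conv1d_ref(x, w, bias=0.0):
    """Reference causal 1D convolution for one channel.

    x: list of input values, w: list of `width` weights.
    Assumption: inputs before position 0 are treated as zero.
    """
    width = len(w)
    out = []
    for i in range(len(x)):
        acc = bias
        for k in range(width):
            j = i - (width - 1) + k  # input position paired with w[k]
            if j >= 0:               # skip positions before the sequence start
                acc += x[j] * w[k]
        out.append(acc)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [0.5, 0.25, 0.125, 1.0]
y = causal_conv1d_ref(x, w)

# Changing a later input must not affect earlier outputs.
x_future_changed = [1.0, 2.0, 3.0, 4.0, 100.0]
y_future_changed = causal_conv1d_ref(x_future_changed, w)
```

Here `y[:4]` and `y_future_changed[:4]` are identical, because output `i` never reads an input beyond position `i`.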
Mathematical form for width=4:
out[i] = sum(x[i-3:i+1] * w[0:4]) + bias (with boundary handling)
Kernel Categories:
1. Forward Kernels (CPU & GPU):
- `causal_conv1d_channel_first_fwd_cpu[_no_bias]`
- `causal_conv1d_channel_last_fwd_cpu[_with_seq_idx][_no_bias]`
- `causal_conv1d_channel_first_fwd_gpu[_with_seq_idx][_no_bias]`
- `causal_conv1d_channel_last_fwd_gpu[_with_seq_idx][_no_bias]`
SIMD-vectorized implementations with compile-time width specialization.
Supported widths: 1, 2, 3, 4.
2. Update Kernels (for autoregressive decode):
- `causal_conv1d_update_cpu[_no_bias]`
- `causal_conv1d_update_gpu[_no_bias]`
Incremental update operations that maintain conv state for efficient
autoregressive token generation.
Memory Layouts:
- Channel-first (B, C, L): Standard layout, contiguous positions per channel.
- Channel-last (B, L, C): Contiguous channels per position, used by some frameworks.
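To make the two layouts concrete, the following sketch computes flat buffer offsets for each one. It assumes dense row-major storage, where the last listed dimension varies fastest; the helper names are hypothetical and are not part of this module.

```python
def offset_channel_first(b, c, l, C, L):
    """Flat offset of element (b, c, l) in a row-major (B, C, L) buffer."""
    return (b * C + c) * L + l


def offset_channel_last(b, l, c, C, L):
    """Flat offset of element (b, l, c) in a row-major (B, L, C) buffer."""
    return (b * L + l) * C + c


C, L = 3, 5  # example sizes: 3 channels, 5 sequence positions

# Distance in memory between neighbouring positions of the same channel.
pos_step_first = offset_channel_first(0, 1, 2, C, L) - offset_channel_first(0, 1, 1, C, L)
pos_step_last = offset_channel_last(0, 2, 1, C, L) - offset_channel_last(0, 1, 1, C, L)

# Distance in memory between neighbouring channels at the same position.
ch_step_first = offset_channel_first(0, 2, 1, C, L) - offset_channel_first(0, 1, 1, C, L)
ch_step_last = offset_channel_last(0, 1, 2, C, L) - offset_channel_last(0, 1, 1, C, L)
```

Under this assumption, stepping along the sequence is a unit stride in (B, C, L) and a stride of `C` in (B, L, C); stepping across channels is the reverse.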
GPU Optimization Parameters:
- kNThreads=128: Threads per block for sequence processing
- kNElts=4: Elements processed per thread for better ILP
- SIMD width 4: Vectorized weight loading for width=4 kernels
Activation Support:
- None: Direct convolution output
- SiLU: Sigmoid Linear Unit activation (x * sigmoid(x))
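As a scalar illustration of the SiLU formula, x * sigmoid(x), here is a small Python sketch. The name `silu_ref` is hypothetical and says nothing about the signature of this module's `silu`; the branch on the sign of `x` is only there to avoid overflow in `math.exp` for large negative inputs.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return e / (1.0 + e)


def silu_ref(x):
    """SiLU activation: x * sigmoid(x)."""
    return x * sigmoid(x)


samples = [silu_ref(v) for v in (-2.0, 0.0, 2.0)]
```

SiLU passes large positive values through almost unchanged, maps zero to zero, and pushes large negative values toward zero.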
Functions
- `causal_conv1d_channel_first_fwd_cpu`: CPU implementation of causal conv1d for channel-first layout with bias.
- `causal_conv1d_channel_first_fwd_cpu_no_bias`: CPU implementation of causal conv1d for channel-first layout without bias.
- `causal_conv1d_channel_first_fwd_gpu`: Optimized GPU implementation of causal conv1d for channel-first layout with bias.
- `causal_conv1d_channel_first_fwd_gpu_no_bias`: Optimized causal conv1d implementation for channel-first data layout using SIMD operations (no bias).
- `causal_conv1d_channel_first_fwd_gpu_no_bias_with_seq_idx`: Optimized causal conv1d implementation for channel-first data layout using SIMD operations (no bias) with seq_idx support.
- `causal_conv1d_channel_first_fwd_gpu_with_seq_idx`: Optimized causal conv1d implementation for channel-first data layout using SIMD operations with seq_idx support.
- `causal_conv1d_channel_last_fwd_cpu`: Optimized CPU implementation of causal conv1d for channel-last layout with bias.
- `causal_conv1d_channel_last_fwd_cpu_no_bias`: Optimized CPU implementation of causal conv1d for channel-last layout without bias.
- `causal_conv1d_channel_last_fwd_cpu_no_bias_with_seq_idx`: Optimized implementation of causal conv1d for channel-last data layout without bias but with seq_idx.
- `causal_conv1d_channel_last_fwd_cpu_with_seq_idx`: Optimized implementation of causal conv1d for channel-last data layout with seq_idx.
- `causal_conv1d_channel_last_fwd_gpu`: Optimized causal conv1d implementation for channel-last data layout using SIMD operations.
- `causal_conv1d_channel_last_fwd_gpu_no_bias`: Optimized causal conv1d implementation for channel-last data layout using SIMD operations (no bias).
- `causal_conv1d_channel_last_fwd_gpu_no_bias_with_seq_idx`: Optimized causal conv1d implementation for channel-last data layout using SIMD operations (no bias) with seq_idx support.
- `causal_conv1d_channel_last_fwd_gpu_with_seq_idx`: Optimized causal conv1d implementation for channel-last data layout using SIMD operations with seq_idx support.
- `causal_conv1d_update_cpu`: CPU implementation of causal conv1d update for incremental inference.
- `causal_conv1d_update_cpu_no_bias`: CPU implementation of causal conv1d update without bias.
- `causal_conv1d_update_gpu`: GPU kernel for causal conv1d update operation (for autoregressive decode).
- `causal_conv1d_update_gpu_no_bias`: GPU kernel for causal conv1d update operation without bias (for autoregressive decode).
- `silu`: Sigmoid Linear Unit (SiLU) activation function.
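To show what an incremental update does conceptually, here is a minimal single-channel sketch in plain Python. It is not the signature or state layout of the `causal_conv1d_update_*` functions listed above: the name `causal_conv1d_update_ref` is hypothetical, and the sketch assumes the conv state holds the previous `width - 1` inputs, initialised to zero.

```python
def causal_conv1d_update_ref(state, x_new, w, bias=0.0):
    """One decode step for a single channel.

    state: the previous `len(w) - 1` inputs, oldest first (assumed layout).
    Returns the output for `x_new` and the shifted state.
    """
    window = state + [x_new]  # the `width` most recent inputs
    out = bias
    for value, weight in zip(window, w):
        out += value * weight
    new_state = window[1:]    # drop the oldest input, keep the rest
    return out, new_state


w = [0.5, 0.25, 0.125, 1.0]
state = [0.0] * (len(w) - 1)  # empty history at the start of decoding
outputs = []
for token in [1.0, 2.0, 3.0, 4.0, 5.0]:
    y, state = causal_conv1d_update_ref(state, token, w)
    outputs.append(y)
```

Each step costs `width` multiply-adds regardless of how many tokens came before, which is why keeping the state is cheaper than re-running a full forward pass per generated token.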