
Mojo module

learnable_2d_interp_pos_emb

Learnable 2D interpolated position embedding (Kimi K2.5 MoonViT3d).

Equivalent to Learnable2DInterpPosEmbDivided_fixed.forward() from nvidia/Kimi-K2.5-NVFP4/modeling_kimi_k25.py.

For each video described by grid_thws:

  1. Bicubic-interpolates the learnable 2D weight grid from (H, W) to (h, w). When (h, w) == (H, W) the grid is used directly.
  2. If t > 1, adds a 1D sincos temporal embedding to each frame.
  3. Adds the result element-wise to x.
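
The three steps above can be sketched in plain Python as a reading aid. This is not the module's Mojo implementation. It assumes that the tokens of each video are stored frame-major, then row-major over (h, w), and that L equals the sum of t * h * w over all videos; the page does not state either point. The names `add_pos_emb` and `resize_2d` are hypothetical, and the bicubic resampler itself is left to the caller rather than implemented here.

```python
def add_pos_emb(x, weight, grid_thws, time_weight, resize_2d=None):
    """Reference sketch using nested lists.

    x:           L rows of `dim` floats
    weight:      H x W x dim nested lists
    grid_thws:   iterable of (t, h, w) tuples, one per video
    time_weight: num_frames rows of `dim` floats
    resize_2d:   hypothetical callable (weight, h, w) -> h x w x dim grid,
                 standing in for the bicubic interpolation
    """
    H, W = len(weight), len(weight[0])

    # Assumption: every video contributes exactly t * h * w rows of x.
    expected = sum(t * h * w for t, h, w in grid_thws)
    if expected != len(x):
        raise ValueError(f"x has {len(x)} rows, grid_thws implies {expected}")

    out = []
    row = 0  # running offset into x
    for t, h, w in grid_thws:
        # Step 1: use the grid directly when the shapes match, else resample.
        if (h, w) == (H, W):
            grid = weight
        elif resize_2d is not None:
            grid = resize_2d(weight, h, w)
        else:
            raise ValueError("resize_2d is required when (h, w) != (H, W)")

        # Flatten (h, w, dim) to (h * w, dim) in row-major order.
        flat = [grid[i][j] for i in range(h) for j in range(w)]

        for frame in range(t):
            for pos in flat:
                # Step 2: the temporal term is added only when t > 1.
                if t > 1:
                    emb = [p + tw for p, tw in zip(pos, time_weight[frame])]
                else:
                    emb = pos
                # Step 3: element-wise add to the matching row of x.
                out.append([a + b for a, b in zip(x[row], emb)])
                row += 1
    return out
```

Passing the resampler as an argument keeps the sketch independent of any particular bicubic convention (kernel coefficient, corner alignment), which the page does not specify.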

Tensor layout (all row-major):

  • x: (L, dim) patch embeddings
  • weight: (H, W, dim) learnable 2D grid
  • grid_thws: (N, 3) int64 per-video (t, h, w)
  • time_weight: (num_frames, dim) float32 1D sincos temporal embedding
  • output: (L, dim) x + interpolated position embedding
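
As an illustration of what row-major means for these shapes, the flat element offsets can be computed as below. The helper names are hypothetical and are not part of the module.

```python
def offset_2d(row, col, ncols):
    """Flat offset of element (row, col) in a row-major (nrows, ncols) tensor,
    e.g. x or output with ncols = dim."""
    return row * ncols + col


def offset_3d(i, j, d, W, dim):
    """Flat offset of element (i, j, d) in a row-major (H, W, dim) tensor,
    e.g. weight. The last axis (dim) varies fastest."""
    return (i * W + j) * dim + d
```

For example, with W = 4 and dim = 8, the element weight[1, 2, 3] sits at flat offset (1 * 4 + 2) * 8 + 3 = 51.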

Functions