
Mojo module

learnable_2d_interp_pos_emb

Learnable 2D interpolated position embedding (Kimi K2.5 MoonViT3d).

Equivalent to Learnable2DInterpPosEmbDivided_fixed.forward() from nvidia/Kimi-K2.5-NVFP4/modeling_kimi_k25.py.

For each video described by grid_thws:

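  1. Bicubic-interpolates the learnable 2D weight grid from (H, W) to (h, w). When (h, w) == (H, W), the grid is used directly without interpolation.
  2. If t > 1, adds a 1D sincos temporal embedding per frame. Every patch in a frame receives that frame's vector.
  3. Adds the resulting position embedding element-wise to that video's patch rows in x.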

Tensor layout (all row-major):

  • x: (L, dim) patch embeddings, where L is the total number of patches over all videos (the sum of t·h·w)
  • weight: (H, W, dim) learnable 2D grid
  • grid_thws: (N, 3) int64, one (t, h, w) row per video
  • time_weight: (num_frames, dim) float32 1D sincos temporal embedding
  • output: (L, dim) x + interpolated position embedding
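Because x and output share one (L, dim) buffer for all videos, each video's rows must be located by offset. The hypothetical Python helpers below compute each video's row range and the row of a given patch. They assume videos are packed back to back in grid_thws order and that each video is row-major in (t, h, w).

The sketch also builds a time_weight table with the MAE-style 1D sincos formula: sin in the first dim/2 columns, cos in the last dim/2, and frequencies 1/10000^(k/(dim/2)). Whether Kimi K2.5 uses exactly this formula is an assumption.

```python
import math


def video_row_ranges(grid_thws):
    """Half-open [start, end) row range of each video inside x.
    Assumption: videos are packed back to back in grid_thws order."""
    ranges, start = [], 0
    for t, h, w in grid_thws:
        end = start + t * h * w
        ranges.append((start, end))
        start = end
    return ranges  # the last end equals L


def patch_row(start, h, w, f, i, j):
    """Row of x holding patch (frame f, row i, col j) of a video that starts at `start`.
    Assumption: row-major (t, h, w) order within a video."""
    return start + (f * h + i) * w + j


def sincos_1d(num_frames, dim):
    """Assumed MAE-style 1D sincos table of shape (num_frames, dim)."""
    assert dim % 2 == 0, "dim must be even (half sin, half cos)"
    half = dim // 2
    omega = [1.0 / 10000 ** (k / half) for k in range(half)]
    return [[math.sin(p * o) for o in omega] + [math.cos(p * o) for o in omega]
            for p in range(num_frames)]
```

For example, grid_thws = [(1, 2, 3), (4, 2, 2)] gives row ranges (0, 6) and (6, 22), so L = 22.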

Functions
