IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.
Skip to main content
For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Mojo function

apply_packed_bitmask

def apply_packed_bitmask[dtype: DType, //, target: StringSlice[StaticConstantOrigin]](output: TileTensor[dtype, Storage=output.Storage, address_space=output.address_space, linear_idx_type=output.linear_idx_type, element_size=output.element_size], logits: TileTensor[dtype, Storage=logits.Storage, address_space=logits.address_space, linear_idx_type=logits.linear_idx_type, element_size=logits.element_size], packed: TileTensor[DType.int32, Storage=packed.Storage, address_space=packed.address_space, linear_idx_type=packed.linear_idx_type, element_size=packed.element_size], fill_value: Scalar[dtype], ctx: DeviceContext)

Apply a packed-int32 grammar bitmask to logits in a single fused pass.

Unpacks a packed bitmask (1 bit per token, 32 tokens per int32 word) and masks logits with it without ever materializing a bool tensor: for each (b, v), the token is kept when bit v % 32 of word packed[b, v // 32] is set, otherwise output[b, v] is set to fill_value (the masked-out sentinel, e.g. a large negative number). This replaces a CPU unpack + ops.where in constrained decoding.

Args: