Python class
LayerNorm
class max.experimental.nn.norm.LayerNorm(dim, eps=1e-05, *, keep_dtype=True, elementwise_affine=True, use_bias=True)
Bases: Module
Layer normalization over the last dimension of the input.
Takes an integer dim and always reduces over the last axis. By
default the reduction runs in the input dtype. Pass keep_dtype=False
to upcast to float32 for the reduction and cast back, which trades a
small amount of throughput for numerical stability on float16 or
bfloat16 inputs.
For example:
from max.dtype import DType
from max.experimental.nn.norm import LayerNorm
from max.experimental.realization_context import (
    GraphRealizationContext,
    realization_context,
)
from max.experimental.tensor import Tensor
from max.graph import DeviceRef, Graph, TensorType

graph = Graph(
    "ln",
    input_types=[
        TensorType(DType.float32, ("batch", "seq", 2048), DeviceRef.GPU()),
    ],
)
ctx = GraphRealizationContext(graph)
with realization_context(ctx), ctx:
    x = Tensor.from_graph_value(graph.inputs[0])
    norm = LayerNorm(2048)
    y = norm(x)
    graph.output(y)
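The keep_dtype trade-off described above can be illustrated outside MAX. The following is a minimal pure-Python sketch, not MAX's implementation. The helpers to_half, mean_half, and mean_upcast are hypothetical names. Half precision is emulated with the standard struct module's "e" format, Python's 64-bit float stands in for the float32 accumulator, and a strictly sequential sum is assumed (real reduction kernels typically accumulate in a different order and may lose less precision).

```python
import struct


def to_half(v: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", v))[0]


def mean_half(values):
    """Mean with every intermediate result rounded to half precision.

    Assumption: a sequential running sum, which is a pessimistic model
    of a low-precision reduction.
    """
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return to_half(acc / len(values))


def mean_upcast(values):
    """Mean accumulated in higher precision, rounded to half only at the end.

    Assumption: Python's 64-bit float stands in for a float32 accumulator.
    """
    total = sum(to_half(v) for v in values)
    return to_half(total / len(values))


row = [1.0] * 4096  # one row whose true mean is exactly 1.0
print("half-precision accumulation:", mean_half(row))
print("upcast accumulation:", mean_upcast(row))
```

For this row, the half-precision running sum stops growing once it reaches 2048, because 2049 is not representable in float16 and rounds back to 2048. The low-precision mean therefore comes out as 0.5 instead of 1.0, while the upcast version returns 1.0.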
Parameters:
- dim (int): The size of the last dimension of the input.
- eps (float): A small positive constant added to the variance for numerical stability. Defaults to 1e-5.
- keep_dtype (bool): Whether to run the reduction in the input dtype. Pass False to upcast to float32 for the reduction and cast back. Defaults to True.
- elementwise_affine (bool): Whether to learn a per-element scale (and optional bias). When False, no parameters are created and the normalized output is returned directly. Defaults to True.
- use_bias (bool): Whether to learn an additive bias. Only effective when elementwise_affine is True. Defaults to True.
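How eps, elementwise_affine, and use_bias interact can be sketched with a plain-Python reference for a single row. This assumes the standard layer normalization definition (biased variance over the last dimension, eps added to the variance before the square root). The function layer_norm_row is a hypothetical name, and the sketch is not how MAX computes the result. Passing weight=None corresponds to elementwise_affine=False, and bias=None corresponds to use_bias=False.

```python
import math
from typing import Optional, Sequence


def layer_norm_row(
    x: Sequence[float],
    eps: float = 1e-5,
    weight: Optional[Sequence[float]] = None,
    bias: Optional[Sequence[float]] = None,
) -> list:
    """Normalize one row (the last dimension) to zero mean and unit variance.

    Assumption: standard layer norm with biased variance; this is an
    illustrative reference, not the MAX kernel.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    # eps keeps the division finite when the row is (nearly) constant.
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(v - mean) * inv_std for v in x]
    if weight is not None:  # per-element scale of shape [dim]
        out = [o * w for o, w in zip(out, weight)]
    if bias is not None:  # per-element shift of shape [dim]
        out = [o + b for o, b in zip(out, bias)]
    return out


row = [1.0, 2.0, 3.0, 4.0]
plain = layer_norm_row(row)  # no affine parameters
affine = layer_norm_row(row, weight=[2.0] * 4, bias=[1.0] * 4)
print(plain)
print(affine)
```

With a constant row the variance is zero, so the result is all zeros instead of a division by zero, which is the role eps plays here.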
bias
The learned per-element bias of shape [dim], or None when
elementwise_affine is False or use_bias is False.
forward()
forward(x)
Returns x normalized over its last dimension.
weight
The learned per-element scale of shape [dim], or None when
elementwise_affine is False.