Metadata-Version: 2.4
Name: masked_norm
Version: 2025.7
Author: algmarques
Maintainer: algmarques
License-Expression: MIT
Project-URL: Repository, https://github.com/algmarques/masked_norm.git
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.7.1
Requires-Dist: numpy>=2.3.1
Dynamic: license-file


# Masked Norm

Self-attention-based neural network architectures pad input sequences to an
expected length prior to processing. Generally, an attention mask is generated
for each sequence in order to retain its original length and further instruct
the model on which sequence instances are relevant.

Normalization transformations suppress the internal covariate shift problem
in deep neural networks, leading to a reduction in training time. Layer
normalization - popular in self-attention-based architectures - normalizes
each instance of the input sequences. Notably, the output of this
transformation is invariant under a re-scaling operation of each sequence
instance.
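
This invariance is easy to check with stock PyTorch's `layer_norm` (a tiny
`eps` is used here so the comparison holds up to floating-point noise):

```python
import torch
from torch.nn import functional as F

x = torch.rand(2, 4, 8)             # (batch, sequence, features)
scales = torch.rand(2, 4, 1) + 0.5  # a positive scale per sequence instance

y = F.layer_norm(x, (8,), eps=1e-12)
y_scaled = F.layer_norm(x * scales, (8,), eps=1e-12)

# re-scaling each instance leaves the normalized output unchanged
print(torch.allclose(y, y_scaled, atol=1e-5))  # → True
```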

When input sequences are padded with a constant value, the variance of each
padded instance is zero. Standard layer normalization implementations
introduce a stabilizing constant $\epsilon$ to prevent division-by-zero
errors, but the resulting degenerate values and gradients can lead to
overflows further along the computational graph.
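
The effect can be observed with stock PyTorch: a constant (padded) instance
collapses to zeros in the forward pass, while the $1 / \sqrt{\epsilon}$
factor inflates its gradients, which may then overflow downstream:

```python
import torch
from torch.nn import functional as F

pad = torch.full((8,), 3.0, requires_grad=True)  # a constant padded instance
out = F.layer_norm(pad, (8,))                    # default eps = 1e-5

# zero variance: the instance collapses to all zeros
print(out.abs().max())  # tensor(0.)

# the backward pass is scaled by 1 / sqrt(eps) ≈ 316
weights = torch.arange(8.0)
(out * weights).sum().backward()
print(pad.grad.abs().max())  # ≈ 1.1e3
```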

The masked normalization transformation uses the input's attention mask such
that only the relevant instances of the sequence are normalized. Moreover, by
applying simple shape-shifting operations (such as axis permutation or
unfoldings) over the input tensor, masked normalization can reproduce batch,
group, layer, and instance normalization.
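
As an illustration of the idea with stock PyTorch (the package's own
equivalences are exercised in the `test` subdirectory): batch normalization
over a `(batch, channels, length)` tensor can be reproduced by permuting the
channel axis to the front, flattening, and applying a per-row normalization:

```python
import torch
from torch.nn import functional as F

x = torch.rand(4, 3, 5)  # (batch, channels, length)

# batch norm: statistics per channel, over batch and length
bn = F.batch_norm(x, None, None, training=True, eps=1e-12)

# the same result via axis permutation + flattening and a per-row norm
rows = x.permute(1, 0, 2).reshape(3, -1)  # one row per channel
ln = F.layer_norm(rows, (20,), eps=1e-12)
ln = ln.reshape(3, 4, 5).permute(1, 0, 2)

print(torch.allclose(bn, ln, atol=1e-5))  # → True
```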

This masked normalization implementation refrains from tracking the running
statistics during training, since this breaks re-scaling invariance. Also, the
proposed implementation employs the variance's unbiased estimator, as each
scalar tensor element is considered a true sample.
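
The distinction is the $N - 1$ (unbiased) versus $N$ (biased) divisor in the
variance estimate, e.g. in PyTorch:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])  # N = 3 samples, mean = 2

print(torch.var(x, unbiased=True))   # divides by N - 1 → tensor(1.)
print(torch.var(x, unbiased=False))  # divides by N     → tensor(0.6667)
```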

This repository contains a PyTorch implementation of masked normalization.

To use it in your project, install it via `pip`:

```bash
pip install masked_norm
```

Two user-facing transformations are provided: a plain masked normalization
(the function `masked_norm` and its class counterpart `MaskedNorm`), and a
masked normalization followed by an affine transformation
(`LazyAffineMaskedNorm`).

You can use the functional form of the masked normalization transformation
directly in your model's forward call:

```python
from torch import tensor, rand
from masked_norm import masked_norm

inpt = rand(2, 2, 3)
mask = tensor(
    [
        [True, True],
        [True, False]
    ]
)

masked_norm(inpt, mask)
```

Or use its class equivalent:

```python
from torch import tensor, rand
from masked_norm import MaskedNorm

inpt = rand(2, 2, 3)
mask = tensor(
    [
        [True, True],
        [True, False]
    ]
)

norm_layer = MaskedNorm()

norm_layer(inpt, mask)
```

You can apply the affine masked normalization transformation by instantiating
a lazy layer:

```python
from torch import tensor, rand
from masked_norm import LazyAffineMaskedNorm

inpt = rand(2, 2, 3)
mask = tensor(
    [
        [True, True],
        [True, False]
    ]
)

affine_norm_layer = LazyAffineMaskedNorm()

affine_norm_layer(inpt, mask)
```

To see how you can reproduce the batch, group, layer, or instance
normalization procedures with masked normalization take a look at the `test`
subdirectory.

If, either by chance or by design, a collection of samples is constant, the
proposed `masked_norm` and `affine_masked_norm` implementations ignore
this collection and pass its values along unaltered.

To run the test suite:

```bash
git clone https://github.com/algmarques/masked_norm.git
cd masked_norm
python -m venv .venv
.venv/bin/pip install .
.venv/bin/python -Bm unittest
```
