Normalization Layers Are All That Sharpness-Aware Minimization Needs

Published in NeurIPS, 2023

Recommended citation: Maximilian Müller, Tiffany Vlaar, David Rolnick, and Matthias Hein (2023) “Normalization Layers Are All That Sharpness-Aware Minimization Needs”, NeurIPS 2023 https://arxiv.org/abs/2306.04226

We show that perturbing only the affine normalization parameters (roughly 0.1% of all parameters) in the adversarial step of SAM typically outperforms perturbing all of the parameters. This finding generalizes to different SAM variants and both BatchNorm and LayerNorm. Alternative sparse perturbation approaches do not achieve similar performance, especially not at such extreme sparsity levels.
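As a rough illustration of the idea, the sketch below restricts the adversarial (ascent) perturbation of SAM to the normalization parameters, while the descent step still updates every parameter. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function name `sam_on_step`, the parameter names, the `rho` and `lr` values, and the toy quadratic loss standing in for a network are all hypothetical.

```python
import math


def sam_on_step(params, grad_fn, norm_keys, rho=0.05, lr=0.1):
    """One SAM-style step whose ascent perturbation touches only `norm_keys`.

    params    : dict mapping parameter name -> float (hypothetical toy model)
    grad_fn   : callable returning a dict of gradients for a params dict
    norm_keys : names of the affine normalization parameters
    """
    grads = grad_fn(params)

    # Gradient norm computed over the normalization parameters only.
    gnorm = math.sqrt(sum(grads[k] ** 2 for k in norm_keys))
    scale = rho / (gnorm + 1e-12)

    # Ascent step: perturb normalization parameters, leave the rest untouched.
    perturbed = {
        k: (v + scale * grads[k]) if k in norm_keys else v
        for k, v in params.items()
    }

    # Descent step: the gradient at the perturbed point updates ALL parameters.
    grads_adv = grad_fn(perturbed)
    return {k: v - lr * grads_adv[k] for k, v in params.items()}


# Toy quadratic loss L = 0.5 * sum(c_k * p_k^2); an assumption for illustration.
CURVATURE = {"norm.weight": 4.0, "norm.bias": 1.0, "fc.weight": 2.0}


def toy_grad(params):
    """Analytic gradient of the toy quadratic loss."""
    return {k: CURVATURE[k] * v for k, v in params.items()}


params = {"norm.weight": 1.0, "norm.bias": 0.0, "fc.weight": 1.0}
new_params = sam_on_step(
    params, toy_grad, norm_keys={"norm.weight", "norm.bias"}
)
```

In this toy setting the non-normalization parameter `fc.weight` receives a plain gradient step, because its value is not perturbed and the toy loss has no coupling between parameters. In a real network, the perturbed normalization parameters would also change the gradients of all other layers.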

