Normalization Layers Are All That Sharpness-Aware Minimization Needs
Published in NeurIPS, 2023
Recommended citation: Maximilian Müller, Tiffany Vlaar, David Rolnick, and Matthias Hein (2023) “Normalization Layers Are All That Sharpness-Aware Minimization Needs”, NeurIPS 2023 https://arxiv.org/abs/2306.04226
We show that perturbing only the affine normalization parameters (roughly 0.1% of all parameters) in the adversarial step of Sharpness-Aware Minimization (SAM) typically outperforms perturbing all of the parameters. This finding generalizes to different SAM variants and to both BatchNorm and LayerNorm. Alternative sparse perturbation approaches do not achieve similar performance, especially not at such extreme sparsity levels.
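As a rough illustration of the idea, the sketch below performs one SAM-style update in which only a chosen subset of parameters is perturbed in the adversarial (ascent) step, while all parameters are updated in the descent step. It is not the authors' implementation: the function `sam_on_step`, the toy scalar model (`weight`, `gamma`, `beta`), and the values of `rho` and `lr` are assumptions made up for this example, with `gamma` and `beta` standing in for the affine normalization parameters.

```python
import math


def sam_on_step(params, grad_fn, perturb_names, rho=0.05, lr=0.1):
    """One SAM-style update that perturbs only `perturb_names` (hypothetical helper).

    params: dict mapping parameter name -> float
    grad_fn: function returning a dict of gradients for a params dict
    perturb_names: names perturbed in the ascent step (here: the affine
        normalization parameters); all other parameters stay fixed there
    """
    grads = grad_fn(params)

    # Gradient norm restricted to the perturbed subset.
    norm = math.sqrt(sum(grads[name] ** 2 for name in perturb_names))

    # Ascent step: move only the selected parameters by rho along the
    # normalized gradient; everything else is copied unchanged.
    perturbed = dict(params)
    if norm > 0.0:
        for name in perturb_names:
            perturbed[name] = params[name] + rho * grads[name] / norm

    # Descent step: update ALL parameters using the gradient evaluated
    # at the perturbed point.
    adv_grads = grad_fn(perturbed)
    return {name: params[name] - lr * adv_grads[name] for name in params}


# Toy model (an assumption for illustration): prediction = gamma * weight * x + beta.
def toy_loss(p, x=1.0, y=2.0):
    return 0.5 * (p["gamma"] * p["weight"] * x + p["beta"] - y) ** 2


def toy_grad(p, x=1.0, y=2.0):
    residual = p["gamma"] * p["weight"] * x + p["beta"] - y
    return {
        "weight": residual * p["gamma"] * x,
        "gamma": residual * p["weight"] * x,
        "beta": residual,
    }


params = {"weight": 0.5, "gamma": 1.0, "beta": 0.0}
new_params = sam_on_step(params, toy_grad, perturb_names=("gamma", "beta"))
```

Passing every parameter name as `perturb_names` would give an ordinary SAM step in this sketch, and setting `rho=0` reduces it to plain gradient descent, so the only difference from standard SAM is which parameters the ascent step is allowed to move.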