GradMax¶
GradMax [EMU+22] and SENN [MMKM24] aim to maximize the decrease in loss after the next gradient step on the newly added weights. To achieve this, they maximize the norm of the gradient with respect to the new weights. GradMax initializes \(\boldsymbol{\Psi}= 0\), ensuring \(\delta_z = 0\), then maximizes the gradient norm with respect to \(\boldsymbol{\Psi}\) under the constraint \(\boldsymbol{\Omega}\boldsymbol{\Omega}^\top = I_{C_{\text{ext}}}\):

\[\max_{\boldsymbol{\Omega}} \left\lVert \frac{\partial \mathcal{L}}{\partial \boldsymbol{\Psi}} \right\rVert_F^2 \quad \text{s.t.} \quad \boldsymbol{\Omega}\boldsymbol{\Omega}^\top = I_{C_{\text{ext}}}\]
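As a rough illustration of this idea: with the new fan-out weights \(\boldsymbol{\Psi}\) set to zero and the activation approximated as linear around zero, the gradient with respect to \(\boldsymbol{\Psi}\) is governed by the product of the backpropagated gradients and the incoming activations, so the orthonormality-constrained maximization is solved by taking the top right singular vectors of that product as the rows of \(\boldsymbol{\Omega}\). The sketch below assumes this linearized setting; the function name `gradmax_init` and the batch shapes are illustrative, not from the paper.

```python
import numpy as np

def gradmax_init(G, H, c_ext):
    """Hypothetical GradMax-style initialization of c_ext new neurons.

    G: (c_out, N) gradients of the loss w.r.t. the next layer's pre-activations
    H: (c_in, N)  activations feeding the new neurons, over a batch of N samples
    Returns fan-in weights Omega (c_ext, c_in) with orthonormal rows and
    zero fan-out weights Psi (c_out, c_ext), so the network output is unchanged.
    """
    M = G @ H.T                              # (c_out, c_in) gradient/activation product
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    omega = Vt[:c_ext]                       # top right singular vectors as rows
    psi = np.zeros((G.shape[0], c_ext))      # Psi = 0 keeps delta_z = 0 at insertion
    return omega, psi
```

Because `Psi` is zero, adding the neurons leaves the loss unchanged at insertion time; the choice of `Omega` only shapes how large the first gradient step on `Psi` can be.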
References¶
Utku Evci, Bart van Merrienboer, Thomas Unterthiner, Fabian Pedregosa, and Max Vladymyrov. GradMax: Growing Neural Networks using Gradient Information. In ICLR. 2022. URL: https://openreview.net/forum?id=qjN4h_wwUO.
Rupert Mitchell, Robin Menzenbach, Kristian Kersting, and Martin Mundt. Self-Expanding Neural Networks. 2024. arXiv:2307.04526. URL: http://arxiv.org/abs/2307.04526, doi:10.48550/arXiv.2307.04526.