GradMax
=======

**[[GradMax]]** :cite:p:`evci_gradmax_2022` and [[SENN]] :cite:p:`mitchell_self-expanding_2024` aim to maximize the loss decrease after the next gradient step on the new weights. To achieve this, they maximize the norm of the gradient with respect to the new weights. [[GradMax]] initializes :math:`\boldsymbol{\Psi} = 0`, ensuring :math:`\delta_z = 0`, then maximizes the gradient norm with respect to :math:`\boldsymbol{\Psi}` over the fan-in weights :math:`\boldsymbol{\Omega}`, constrained to the unit Frobenius ball (the maximizer consists of mutually orthogonal directions, :math:`\boldsymbol{\Omega}\boldsymbol{\Omega}^\top \propto I_{C_{\text{ext}}}`):

.. math::

   \begin{aligned}
   \boldsymbol{\Omega}^* = \mathop{\mathrm{\arg\!\max}}_{\left|\left|\boldsymbol{\Omega}\right|\right|_F \leq 1} \left|\left|\nabla_{\boldsymbol{\Psi}} \mathcal{L}(f)\right|\right|_F^2
   \end{aligned}
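The optimization above can be sketched concretely. Under a linearization of the new neurons (as assumed in GradMax), the gradient with respect to :math:`\boldsymbol{\Psi}` at :math:`\boldsymbol{\Psi} = 0` is linear in :math:`\boldsymbol{\Omega}`, so the constrained maximization reduces to a singular value decomposition of an auxiliary matrix built from forward activations and backpropagated gradients. The following numpy sketch illustrates this under those assumptions; all variable names (``X``, ``G``, ``c_ext``) are illustrative, not taken from the paper's code, and the random data stands in for real activations and gradients.

```python
import numpy as np

# Sketch of the GradMax initialization step, assuming a linearized layer:
# with the new fan-out weights Psi = 0, the gradient w.r.t. Psi is linear
# in the fan-in weights Omega, so maximizing its squared Frobenius norm
# over the ball ||Omega||_F <= 1 reduces to an SVD of M = G^T X.

rng = np.random.default_rng(0)
n, d_in, d_out, c_ext = 128, 16, 8, 2  # batch, fan-in, fan-out, new neurons

X = rng.normal(size=(n, d_in))    # activations feeding the new neurons
G = rng.normal(size=(n, d_out))   # loss gradients at the layer's output

M = G.T @ X                       # auxiliary gradient matrix, (d_out, d_in)
U, S, Vt = np.linalg.svd(M, full_matrices=False)

# Take the top-c_ext right singular vectors as fan-in directions, scaled so
# that ||Omega||_F = 1, matching the constraint of the objective; the rows
# are mutually orthogonal, i.e. Omega @ Omega.T is proportional to I.
Omega = Vt[:c_ext] / np.sqrt(c_ext)   # shape (c_ext, d_in)
Psi = np.zeros((d_out, c_ext))        # fan-out init: network output unchanged
```

Because :math:`\boldsymbol{\Psi} = 0`, inserting the new neurons leaves the function computed by the network unchanged at insertion time, while the SVD-based :math:`\boldsymbol{\Omega}` maximizes how fast the new weights start learning.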