Tiny
TINY [VRCC24] looks for new neurons whose contribution \(\delta_z\) most directly reduces the loss \(\mathcal{L}\). A first-order Taylor expansion of the loss around \(z\) gives:
\[\begin{aligned}
\mathcal{L}(z + \delta_z) = \mathcal{L}(z) + \langle \nabla_z \mathcal{L}, \delta_z \rangle + o(\|\delta_z\|)
\end{aligned}\]
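As a short worked step, the first-order term can be bounded with the Cauchy-Schwarz inequality. The step size \(\varepsilon > 0\) is introduced here purely for illustration, and the sketch assumes the Euclidean inner product and \(\nabla_z \mathcal{L} \neq 0\):
\[\begin{aligned}
\min_{\|\delta_z\| \le \varepsilon} \langle \nabla_z \mathcal{L}, \delta_z \rangle = -\varepsilon \, \|\nabla_z \mathcal{L}\|, \qquad \text{attained at } \delta_z = -\varepsilon \, \frac{\nabla_z \mathcal{L}}{\|\nabla_z \mathcal{L}\|}
\end{aligned}\]
To first order, then, the most useful contribution is the one best aligned with the descent direction, up to whichever sign convention is used for the gradient.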
TINY aligns \(\delta_z\) with the residual gradient \(\boldsymbol{G}^\perp\), so that new neurons do not duplicate what existing neurons already provide. After linearizing the activation \(\sigma\) around \(0\), the problem becomes a low-rank matrix approximation:
\[\begin{aligned}
\boldsymbol{\Psi}^*, \boldsymbol{\Omega}^* = \mathop{\mathrm{arg\,min}}_{\boldsymbol{\Psi}, \boldsymbol{\Omega}} \left\| \boldsymbol{G}^\perp - \boldsymbol{H}^{(l-2)} \boldsymbol{\Psi}^\top \boldsymbol{\Omega}^\top \right\|_F^2
\end{aligned}\]
which is solved in closed form using two singular value decompositions (SVDs).
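The sketch below illustrates one standard way to solve a problem of this shape with two SVDs (reduced-rank regression). It is not necessarily the exact procedure of [VRCC24]. It assumes that rows of `H` and `G_perp` index samples, that `Psi` has shape `(k, d_in)` and `Omega` has shape `(d_out, k)`, and that `k` is the number of new neurons. The function name `low_rank_fit` and the `rcond` threshold are hypothetical choices made for this example.

```python
import numpy as np


def low_rank_fit(H, G, k, rcond=1e-10):
    """Minimise ||G - H @ Psi.T @ Omega.T||_F^2 over rank-k factors.

    Assumed shapes (illustrative): H is (n, d_in), G is (n, d_out).
    Returns Psi of shape (k, d_in) and Omega of shape (d_out, k).
    """
    # SVD 1: orthonormal basis U of the column space of H.
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    keep = s > rcond * s[0]  # drop numerically null directions
    U, s, Vt = U[:, keep], s[keep], Vt[keep]

    # Only the part of G inside the column space of H can be fitted.
    P = U.T @ G  # shape (r, d_out)

    # SVD 2: best rank-k approximation of P (Eckart-Young theorem).
    A, t, Bt = np.linalg.svd(P, full_matrices=False)
    k = min(k, t.size)
    A, t, Bt = A[:, :k], t[:k], Bt[:k]

    # Map back through H and split the singular values evenly
    # between the two factors.
    root = np.sqrt(t)
    Psi = (((Vt.T / s) @ A) * root).T  # (k, d_in)
    Omega = (root[:, None] * Bt).T     # (d_out, k)
    return Psi, Omega


# Toy check: a target that is exactly rank 2 in the span of H.
rng = np.random.default_rng(0)
H = rng.standard_normal((64, 10))
M_true = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 5))
G_perp = H @ M_true

Psi, Omega = low_rank_fit(H, G_perp, k=2)
residual = np.linalg.norm(G_perp - H @ Psi.T @ Omega.T)
```

On this toy input the target is exactly rank 2 and lies in the span of `H`, so `residual` should be at the level of floating-point round-off. Splitting the singular values evenly between `Psi` and `Omega` is one arbitrary choice. Any rescaling that leaves the product `Psi.T @ Omega.T` unchanged gives the same objective value.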