Tiny

TINY [VRCC24] searches for new neurons whose contribution \(\delta_z\) reduces the loss as directly as possible. It starts from a first-order Taylor expansion:

\[\begin{aligned} \mathcal{L}(z + \delta_z) = \mathcal{L}(z) + \langle \nabla_z \mathcal{L}, \delta_z \rangle + o(\|\delta_z\|) \end{aligned}\]
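The \(o(\|\delta_z\|)\) remainder is what makes the first-order term dominate for small contributions. A minimal numeric check of this, using a toy smooth loss as a stand-in for \(\mathcal{L}\) (the function and shapes here are illustrative assumptions, not TINY's actual loss):

```python
import numpy as np

# Toy smooth "loss" of a contribution vector z -- a hypothetical
# stand-in for \mathcal{L}; TINY itself works with the network loss.
def loss(z):
    return float(np.sum(np.tanh(z) ** 2))

def grad(z):
    # Analytic gradient of sum(tanh(z)^2).
    t = np.tanh(z)
    return 2.0 * t * (1.0 - t ** 2)

rng = np.random.default_rng(0)
z = rng.standard_normal(5)
direction = rng.standard_normal(5)

def first_order_gap(eps):
    """Taylor remainder, normalized by ||delta_z||.

    For a smooth loss the remainder is O(||delta_z||^2), so this ratio
    vanishes as eps -> 0: exactly the o(||delta_z||) term above.
    """
    d = eps * direction
    remainder = loss(z + d) - (loss(z) + grad(z) @ d)
    return abs(remainder) / np.linalg.norm(d)

gaps = [first_order_gap(e) for e in (1e-1, 1e-2, 1e-3)]
```

Shrinking `gaps` as `eps` decreases is what licenses optimizing only the inner-product term \(\langle \nabla_z \mathcal{L}, \delta_z \rangle\).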

To avoid redundancy with existing neurons, TINY aligns \(\delta_z\) with the residual gradient \(\boldsymbol{G}^\perp\), the component of the gradient orthogonal to what the current neurons can already express. Linearizing the activation \(\sigma\) around \(0\) (new neurons start with near-zero weights), finding the best fan-in weights \(\boldsymbol{\Psi}\) and fan-out weights \(\boldsymbol{\Omega}\) becomes a low-rank matrix approximation problem:

\[\begin{aligned} \boldsymbol{\Psi}^*, \boldsymbol{\Omega}^* = \mathop{\mathrm{\arg\!\min}}_{\boldsymbol{\Psi}, \boldsymbol{\Omega}} \left|\left|\boldsymbol{G}^\perp- \boldsymbol{H}^{(l-2)} \boldsymbol{\Psi}^\top \boldsymbol{\Omega}^\top\right|\right|_F^2 \end{aligned}\]

solved in closed form using two SVDs.
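The minimization above is a reduced-rank regression: the reachable part of \(\boldsymbol{G}^\perp\) is its projection onto the column space of \(\boldsymbol{H}^{(l-2)}\), and the rank constraint is handled by a truncated SVD (Eckart–Young). A simplified sketch under assumed shapes; the paper's actual closed form differs in details (e.g. how the activations are normalized), so treat the names and factorization here as illustrative:

```python
import numpy as np

def tiny_closed_form(G_perp, H, k):
    """Sketch: minimize ||G_perp - H @ Psi_T @ Omega_T||_F^2 over
    rank-k factors, via two SVDs.

    Assumed shapes (not from the paper):
    G_perp : (n, p) residual-gradient matrix
    H      : (n, d) post-activations of layer l-2
    k      : number of neurons to add
    """
    # SVD 1: orthonormal basis U for the column space of H.
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    # Only the part of G_perp inside range(H) is reachable.
    G_proj = U @ (U.T @ G_perp)
    # SVD 2: best rank-k approximation of the projected target
    # (Eckart-Young theorem).
    Uk, Sk, Vkt = np.linalg.svd(G_proj, full_matrices=False)
    # Fan-in weights: pull the top-k left factors back through H.
    Psi_T = np.linalg.pinv(H) @ (Uk[:, :k] * Sk[:k])
    # Fan-out weights: top-k right singular vectors.
    Omega_T = Vkt[:k]
    return Psi_T, Omega_T
```

At full rank (`k = d`) this recovers the orthogonal projection of \(\boldsymbol{G}^\perp\) onto the column space of \(\boldsymbol{H}^{(l-2)}\), the best achievable fit.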

References

[VRCC24]

Manon Verbockhaven, Théo Rudkiewicz, Sylvain Chevallier, and Guillaume Charpiat. Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally. TMLR, July 2024.