Net2Net¶
How can we transfer knowledge from one network to a new, larger network? A network morphism \(A', \theta' = \mathcal{T}(A, \theta)\) — mapping an architecture \(A\) with parameters \(\theta\) to a new architecture \(A'\) with parameters \(\theta'\) — is called function-preserving if the two networks compute the same function, i.e. \(f_{A', \theta'}(x) = f_{A, \theta}(x)\) for all inputs \(x\).
Net2Net. In early work, Net2Net [CGS16] proposed the Net2WiderNet operation: increasing the width of a layer by splitting an existing neuron into two new neurons, each with the same incoming weights as the original but with its outgoing weights halved, so that their combined contribution to the next layer is unchanged. In practice, a small amount of random noise is added to the new neurons to break the symmetry between them.
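The weight-splitting step above can be sketched for a two-layer MLP. This is a minimal NumPy illustration, not the reference implementation; the function and variable names (`net2wider`, `W1`, `W2`) are our own:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, h, d_out = 4, 3, 2
W1 = rng.normal(size=(d_in, h))    # input -> hidden weights
W2 = rng.normal(size=(h, d_out))   # hidden -> output weights
relu = lambda z: np.maximum(z, 0)

def net2wider(W1, W2, j, noise=1e-5):
    # Duplicate hidden neuron j: copy its incoming weights unchanged,
    # and halve its outgoing weights so the two twins' contributions
    # sum to the original neuron's contribution.
    W1_new = np.concatenate([W1, W1[:, j:j + 1]], axis=1)
    W2_new = W2.copy()
    W2_new[j, :] *= 0.5
    W2_new = np.concatenate([W2_new, W2_new[j:j + 1, :]], axis=0)
    # A little noise on the new neuron breaks the symmetry between twins.
    W1_new[:, -1] += noise * rng.normal(size=W1.shape[0])
    return W1_new, W2_new

x = rng.normal(size=(1, d_in))
y_old = relu(x @ W1) @ W2
W1w, W2w = net2wider(W1, W2, j=1, noise=0.0)  # noise=0: exactly preserving
y_new = relu(x @ W1w) @ W2w
print(np.allclose(y_old, y_new))
```

With `noise=0.0` the widened network computes exactly the same function, since the halved outgoing weights of the duplicated neuron sum back to the original.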
Similarly to the Net2WiderNet operation for layer-widening, one can grow a network in depth using an (approximately) function-preserving morphism known as Net2DeeperNet [CGS16]: insert a new layer initialized to the identity mapping. In general, this is only exactly function-preserving for activation functions \(\sigma\) that are idempotent, \(\sigma \circ \sigma = \sigma\), such as ReLU, although it can be generalised to a wider class of activation functions [WWRC16].
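The idempotence condition can be checked directly: inserting an identity-initialized layer after a ReLU layer leaves the output unchanged, because \(\mathrm{relu}(I \, \mathrm{relu}(z)) = \mathrm{relu}(z)\). A minimal NumPy sketch (names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, h = 4, 3
W1 = rng.normal(size=(d_in, h))
relu = lambda z: np.maximum(z, 0)

x = rng.normal(size=(1, d_in))
y_shallow = relu(x @ W1)

# Net2DeeperNet-style insertion: the new layer's weights are the identity,
# so with an idempotent activation the deeper network computes the same function.
W_id = np.eye(h)
y_deep = relu(relu(x @ W1) @ W_id)
print(np.allclose(y_shallow, y_deep))
```

The argument fails for non-idempotent activations such as sigmoid, since \(\sigma(\sigma(z)) \neq \sigma(z)\); that is the case handled by the generalisation in [WWRC16].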
References¶
Tianqi Chen, Ian Goodfellow, and Jonathon Shlens. Net2Net: Accelerating Learning via Knowledge Transfer. In ICLR. 2016. arXiv:1511.05641. URL: http://arxiv.org/abs/1511.05641, doi:10.48550/arXiv.1511.05641.
Tao Wei, Changhu Wang, Yong Rui, and Chang Wen Chen. Network Morphism. In ICML. March 2016. arXiv:1603.01670. URL: http://arxiv.org/abs/1603.01670, doi:10.48550/arXiv.1603.01670.