Net2Net
=======

How can we transfer knowledge from one network to a new, larger network?
A network morphism :math:`A', \theta' = \mathcal{T}(A, \theta)` is called
*function-preserving* if

.. math::

   \begin{aligned}
       \forall x \in \mathcal{D}, \quad f_{A', \theta'}(x) = f_{A, \theta}(x)
   \end{aligned}

**Net2Net.** In early work, Net2Net :cite:p:`chen_net2net_2016` proposed the
*Net2WiderNet* operation, which increases the width of a layer by splitting an
existing neuron into two new neurons: each copy keeps the same input weights as
the original, but the outgoing weights are halved to compensate for the
duplication. In practice, a small amount of random noise is added to the new
neurons to break symmetry.

Analogously to the *Net2WiderNet* operation for widening a layer, one can grow
a network in depth using an (approximately) function-preserving morphism:
insert a layer initialized to represent the identity mapping, an operation
known as *Net2DeeperNet* :cite:p:`chen_net2net_2016`. In general, it is only
applicable to activation functions :math:`\sigma` that are idempotent,
i.e. :math:`\sigma \circ \sigma = \sigma`, such as ReLU, although this can be
generalised to a wider class of activation functions :cite:p:`wei_network_2016`.
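The two morphisms can be illustrated with a minimal NumPy sketch (all weight
shapes and variable names here are illustrative, not from the original
papers). A hidden unit is duplicated with its outgoing weights halved
(*Net2WiderNet*), and an identity layer is inserted after a ReLU layer
(*Net2DeeperNet*); both leave the network's function unchanged. The
symmetry-breaking noise mentioned above is omitted so that the checks are
exact.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)

   # Small MLP: x -> ReLU(W1 @ x + b1) -> W2 @ h + b2
   W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
   W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

   def relu(z):
       return np.maximum(z, 0.0)

   def forward(W1, b1, W2, b2, x):
       return W2 @ relu(W1 @ x + b1) + b2

   # Net2WiderNet: duplicate hidden unit 0, halving its outgoing weights.
   W1w = np.vstack([W1, W1[0:1]])     # new unit copies the input weights
   b1w = np.append(b1, b1[0])
   W2w = np.hstack([W2, W2[:, 0:1]])  # duplicate the outgoing column ...
   W2w[:, 0] /= 2.0                   # ... and halve both copies
   W2w[:, -1] /= 2.0

   # Net2DeeperNet: insert an identity layer after the ReLU hidden layer.
   def forward_deeper(W1, b1, W2, b2, x):
       h = relu(W1 @ x + b1)
       h = relu(np.eye(len(h)) @ h)   # exact because ReLU is idempotent
       return W2 @ h + b2

   x = rng.standard_normal(3)
   y = forward(W1, b1, W2, b2, x)
   assert np.allclose(forward(W1w, b1w, W2w, b2, x), y)    # wider: f preserved
   assert np.allclose(forward_deeper(W1, b1, W2, b2, x), y)  # deeper: f preserved

Note that the deeper network is *exactly* function-preserving here only
because :math:`\mathrm{ReLU}(\mathrm{ReLU}(h)) = \mathrm{ReLU}(h)`; for
non-idempotent activations the identity-layer insertion is only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small MLP: x -> ReLU(W1 @ x + b1) -> W2 @ h + b2
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

def relu(z):
    return np.maximum(z, 0.0)

def forward(W1, b1, W2, b2, x):
    return W2 @ relu(W1 @ x + b1) + b2

# Net2WiderNet: duplicate hidden unit 0, halving its outgoing weights.
W1w = np.vstack([W1, W1[0:1]])     # new unit copies the input weights
b1w = np.append(b1, b1[0])
W2w = np.hstack([W2, W2[:, 0:1]])  # duplicate the outgoing column ...
W2w[:, 0] /= 2.0                   # ... and halve both copies
W2w[:, -1] /= 2.0

# Net2DeeperNet: insert an identity layer after the ReLU hidden layer.
def forward_deeper(W1, b1, W2, b2, x):
    h = relu(W1 @ x + b1)
    h = relu(np.eye(len(h)) @ h)   # exact because ReLU is idempotent
    return W2 @ h + b2

x = rng.standard_normal(3)
y = forward(W1, b1, W2, b2, x)
```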