Sparse growth and grow-prune methods

One of the challenges in growing neural networks is that we have to predict how our objective function behaves within the neighbourhood \(\mathcal{N}(f_{A_t, \theta_t})\) of our existing model \((A_t, \theta_t)\), a prediction typically made using first-order information. Combining growing with pruning allows one to explore the architectural neighbourhood directly, using posterior information that is otherwise hard to predict, and then prune the enlarged model back to the desired size.

Sparse masks are frequently employed as a way to select important neurons and prune the rest. However, as discussed in [DYJ22], in order for a model to withstand multiple reduction and growth steps, the performance hit it takes during pruning should be completely recoverable, if not surpassable, by the growth process. The objective of sparse growth is not to reduce the computational cost of inference but rather to increase the network’s capabilities while avoiding over-parameterization.

To perform incremental learning, [DYJ22] use gradient-based growth: the gradient of every masked connection is averaged over an epoch, and connections whose average exceeds a chosen percentile are re-activated. Symmetrically, connections are pruned (deactivated) when their weight magnitude falls below a chosen percentile. This two-step process aims to support long-term learning and outperforms simply training from scratch when new data arrive, in both error rate and model size.
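Under a few stated assumptions (flat weight and gradient arrays, a boolean connection mask, and illustrative percentile thresholds rather than the paper's tuned values), the two steps can be sketched as follows; in practice, training epochs run between the grow and prune passes, so newly grown connections receive nonzero weights before being judged for pruning:

```python
import numpy as np

def grow_connections(mask, grad_accum, pct=90):
    """Re-activate inactive connections whose gradient magnitude, averaged
    over the last epoch, exceeds the given percentile of the inactive set.
    Function names and percentile values are illustrative, not [DYJ22]'s."""
    inactive = ~mask
    thresh = np.percentile(grad_accum[inactive], pct)
    return mask | (inactive & (grad_accum > thresh))

def prune_connections(mask, weights, pct=10):
    """Deactivate active connections whose weight magnitude falls below the
    given percentile of active-weight magnitudes."""
    mags = np.abs(weights)
    thresh = np.percentile(mags[mask], pct)
    return mask & (mags >= thresh)
```

Keeping growth keyed to gradients and pruning keyed to magnitudes mirrors the asymmetry in the paper: a masked connection has no weight to judge, but its gradient still signals how useful re-activating it would be.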

MorphNet [GEN+18] uses a sparsity regularizer to penalize over-parameterization while training and then uniformly expands all layers by scaling their width up to a budget. The sparse training maintains good performance and even improves over simple uniform growth under the same FLOPs, showing the benefit of reduction.
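The expansion step can be sketched as a one-dimensional search for the uniform width multiplier; the `flop_cost` cost model, the bisection bounds, and the assumption that the budget covers at least the current cost are ours, not the released MorphNet implementation:

```python
def uniform_expand(widths, flop_cost, budget):
    """MorphNet-style expansion: after sparsity-regularized training has
    shrunk each layer to `widths` surviving channels, scale all layers by
    the largest uniform multiplier omega keeping the cost within `budget`.
    Assumes flop_cost(widths) <= budget and that cost grows with width."""
    lo, hi = 1.0, 100.0
    for _ in range(50):  # bisection on the multiplier omega
        mid = (lo + hi) / 2
        scaled = [max(1, round(w * mid)) for w in widths]
        if flop_cost(scaled) <= budget:
            lo = mid  # mid is feasible; search higher
        else:
            hi = mid
    return [max(1, round(w * lo)) for w in widths]
```

Because the multiplier is uniform, the relative layer proportions discovered by the sparsity phase are preserved; only the overall scale changes to spend the budget.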

In a similar sparse-growth manner, [YSM21] use masking to start from a very sparse seed architecture and a budget-driven sparsity regularizer that progressively relaxes the masks, growing the network toward the budget. The method achieves higher accuracy than AutoGrow with smaller models and sparse channels.
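The core mechanism, a continuous relaxation of binary channel masks combined with a budget term, can be sketched as below; the function and parameter names are our own and the exact gate parameterization and penalty form in [YSM21] differ:

```python
import numpy as np

def soft_gate(s, temperature):
    """Continuous relaxation of a binary channel mask: each architecture
    parameter s maps through a sigmoid whose temperature is annealed
    upward during training, hardening the gate toward 0/1."""
    return 1.0 / (1.0 + np.exp(-temperature * np.asarray(s, dtype=float)))

def budget_penalty(gates, budget_fraction):
    """Penalty that is zero while the expected fraction of open gates stays
    under the budget, so the sparse seed only grows channels the task
    actually pays for."""
    return max(0.0, float(np.mean(gates)) - budget_fraction)
```

Because the gates are differentiable, which channels to grow is learned jointly with the weights by ordinary gradient descent, rather than by a separate search phase.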

CompNet [LMF18] trains new neurons separately under an independently interpretable lasso regularizer while optimizing for function preservation. The sparsity penalty can be applied to either the input or the output neurons of the newly inserted layer.
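The two ingredients, function-preserving insertion and a neuron-level lasso on either side of the new layer, can be sketched as follows; the identity initialization and the exact penalty form are our illustrative choices, not CompNet's precise morphism:

```python
import numpy as np

def insert_identity_layer(width):
    """Function-preserving insertion: initializing the new square layer at
    the identity leaves the network's output unchanged (assuming a linear
    path through the insertion point)."""
    return np.eye(width)

def neuron_lasso(W, side="input"):
    """Lasso-style structured penalty on the inserted layer's weight matrix
    W (inputs x outputs): the sum of L2 norms over rows ("input") or
    columns ("output"), driving whole neurons on the chosen side to zero."""
    axis = 1 if side == "input" else 0
    return float(np.sum(np.linalg.norm(W, axis=axis)))
```

Penalizing whole rows or columns rather than individual entries is what makes the result interpretable: a zeroed group corresponds to an entire neuron that can be removed from the inserted layer.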

References

[DYJ22]

Xiaoliang Dai, Hongxu Yin, and Niraj K. Jha. Incremental learning using a grow-and-prune paradigm with efficient neural networks. IEEE Transactions on Emerging Topics in Computing, 10(2):752–762, 2022. doi:10.1109/TETC.2020.3037052.

[GEN+18]

Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, and Edward Choi. Morphnet: fast & simple resource-constrained structure learning of deep networks. In CVPR, 1586–1595. 2018. doi:10.1109/CVPR.2018.00171.

[LMF18]

Jun Lu, Wei Ma, and Boi Faltings. Compnet: neural networks growing via the compact network morphism. ArXiv, 2018. URL: https://api.semanticscholar.org/CorpusID:21127949.

[YSM21]

Xin Yuan, Pedro Savarese, and Michael Maire. Growing efficient deep networks by structured continuous sparsification. In ICLR. 2021. arXiv:2007.15353. doi:10.48550/arXiv.2007.15353.