Non-stationary data distributions¶
A natural application of neuron addition is to non-stationary data distributions, such as Continual Learning. Instead of a single task, represented by its dataset \(\mathcal{D}\), we have a sequence of tasks \(\mathcal{D}_1, \dots, \mathcal{D}_T\), and want a model that performs well on all tasks \(\{\mathcal{D}_1, \dots, \mathcal{D}_T\}\). In addition to the objective of maximising performance on the current task \(\mathcal{D}_i\), the goal is to prevent performance from degrading on previous tasks, a failure mode known as catastrophic forgetting [LAM+21].
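The two objectives above are commonly summarised by two metrics: the average accuracy over all tasks after training on the last one, and the per-task forgetting, i.e. the drop from the best accuracy ever reached on a task to its final accuracy. A minimal sketch, assuming a hypothetical accuracy matrix `acc[t][i]` holding the accuracy on task \(\mathcal{D}_i\) after training on task \(\mathcal{D}_t\) (illustrative numbers, not from any paper):

```python
def average_accuracy(acc):
    """Mean accuracy over all tasks after training on the final task."""
    T = len(acc)
    return sum(acc[T - 1]) / T

def forgetting(acc):
    """For each earlier task, the drop from its best accuracy to its final one."""
    T = len(acc)
    return [max(acc[t][i] for t in range(i, T)) - acc[T - 1][i]
            for i in range(T - 1)]

# acc[t][i]: accuracy on task i after training on task t (lower-triangular)
acc = [
    [0.95],
    [0.80, 0.93],
    [0.70, 0.85, 0.94],
]
print(average_accuracy(acc))  # mean of the last row
print(forgetting(acc))        # drop for tasks 0 and 1
```

A large forgetting value on early tasks is exactly the "relapse" that growing-based continual learning methods try to avoid.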
Continual Learning. By aligning the architecture and weights \((A_t, \theta_t)\) with the current task \(\mathcal{D}_t\), growing allows one to reuse existing weights and add parameters for the next task \(\mathcal{D}_{t+1}\) only when necessary. Some methods enforce sparsity in the weights [YLZF21, YYLH18] or rely on temporary pruning [HTW+19, WLSL20] — freezing the weights used by previous tasks so that the next task can be trained without introducing semantic drift. This results in supernets where each task uses a subset of the weights. Ablation studies [YYLH18] suggest that growth avoids catastrophic forgetting and adds flexibility to the parameters while maintaining smaller models. The Learn-to-Grow framework [LZW+19] uses NAS to balance accuracy against model size by reusing existing weights as much as possible and adapting or adding new ones only when necessary.
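The freeze-then-grow idea can be sketched in a few lines. The snippet below is a toy illustration in the spirit of these methods, not an implementation of any particular paper: units trained on earlier tasks keep their weights and are masked out of gradient updates, while newly grown units remain trainable for the next task. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(4, 8))           # hidden layer trained on task t
frozen = np.ones(W.shape[0], bool)    # units reserved by earlier tasks

def grow(W, frozen, n_new):
    """Append n_new trainable units; existing units stay frozen."""
    W_new = rng.normal(scale=0.01, size=(n_new, W.shape[1]))
    return (np.vstack([W, W_new]),
            np.concatenate([frozen, np.zeros(n_new, bool)]))

def masked_update(W, frozen, grad, lr=0.1):
    """Gradient step that leaves frozen units untouched."""
    W = W.copy()
    W[~frozen] -= lr * grad[~frozen]
    return W

W2, frozen2 = grow(W, frozen, n_new=2)  # add capacity for task t+1
grad = np.ones_like(W2)                  # placeholder gradient
W3 = masked_update(W2, frozen2, grad)
assert np.allclose(W3[:4], W2[:4])       # earlier tasks' weights unchanged
```

Because the frozen rows never change, predictions for earlier tasks are preserved by construction; the price is that capacity only ever increases, which is why these methods pair growth with sparsity or pruning.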
Reinforcement Learning. Loss of plasticity is also a significant problem in Reinforcement Learning: the policy quickly overfits to early observations and fails to adapt to new data as training progresses. Proposed solutions include periodically resetting the neural network [NSDOro+22] or using various forms of regularization [SDDA25]. Most recently, dynamic growth methods have been explored, using sparse grow-and-prune schedules to maintain plasticity over the course of training [LOCCP25].
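A grow-and-prune cycle of this kind can be illustrated on a single weight matrix. The sketch below is a toy version of the general idea — prune the smallest-magnitude active weights, regrow the same number at random inactive positions with a fresh, small initialisation — with fractions and criteria chosen for illustration, not taken from any specific method:

```python
import numpy as np

rng = np.random.default_rng(1)

def grow_prune_step(W, mask, frac=0.25):
    """Prune the weakest active weights, regrow as many fresh ones,
    keeping the overall sparsity level constant."""
    W, mask = W.copy(), mask.copy()
    active = np.flatnonzero(mask)
    k = max(1, int(frac * active.size))
    # prune: drop the k active weights with the smallest magnitude
    drop = active[np.argsort(np.abs(W.ravel()[active]))[:k]]
    mask.ravel()[drop] = False
    W.ravel()[drop] = 0.0
    # grow: re-enable k random inactive positions with fresh small weights
    born = rng.choice(np.flatnonzero(~mask), size=k, replace=False)
    mask.ravel()[born] = True
    W.ravel()[born] = rng.normal(scale=0.01, size=k)
    return W, mask

mask0 = rng.random((6, 6)) < 0.5        # ~50% sparse connectivity
W0 = rng.normal(size=(6, 6)) * mask0
W1, mask1 = grow_prune_step(W0, mask0)  # sparsity level is preserved
```

The freshly initialised weights behave like newly added neurons at a fixed parameter budget, which is how such schedules can restore plasticity without letting the model grow unboundedly.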
References¶
Steven C. Y. Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen. Compacting, picking and growing for unforgetting continual learning. In NeurIPS. 2019.
Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE TPAMI, 2021. arXiv:1909.08383. URL: http://arxiv.org/abs/1909.08383, doi:10.1109/TPAMI.2021.3057446.
Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, and Caiming Xiong. Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting. In ICML. 2019. arXiv:1904.00310. URL: http://arxiv.org/abs/1904.00310.
Jiashun Liu, Johan Obando-Ceron, Aaron Courville, and Ling Pan. Neuroplastic Expansion in Deep Reinforcement Learning. June 2025. arXiv:2410.07994. URL: http://arxiv.org/abs/2410.07994, doi:10.48550/arXiv.2410.07994.
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, and Aaron Courville. The Primacy Bias in Deep Reinforcement Learning. May 2022. arXiv:2205.07802. URL: http://arxiv.org/abs/2205.07802, doi:10.48550/arXiv.2205.07802.
Alena Shilova, Alex Davey, Brahim Driss, and Riad Akrour. Staq It! Growing Neural Networks for Policy Mirror Descent. 2025. arXiv:2506.13862. URL: http://arxiv.org/abs/2506.13862.
Lemeng Wu, Bo Liu, Peter Stone, and Qiang Liu. Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks. In NeurIPS. 2020. URL: https://proceedings.neurips.cc/paper_files/paper/2020/hash/fdbe012e2e11314b96402b32c0df26b7-Abstract.html.
Li Yang, Sen Lin, Junshan Zhang, and Deliang Fan. GROWN: GRow Only When Necessary for Continual Learning. 2021. arXiv:2110.00908. URL: http://arxiv.org/abs/2110.00908, doi:10.48550/arXiv.2110.00908.
Jaehong Yoon, Eunho Yang, Jeongtae Lee, and Sung Ju Hwang. Lifelong Learning with Dynamically Expandable Networks. In ICLR. 2018. arXiv:1708.01547. URL: http://arxiv.org/abs/1708.01547.