Non-stationary data distributions

A natural application of neuron addition is to non-stationary data distributions, as in Continual Learning. Instead of a single task, represented by its dataset \(\mathcal{D}\), we have a sequence of tasks \(\mathcal{D}_1, \dots, \mathcal{D}_T\), and want a model that performs well on all tasks \(\{\mathcal{D}_1, \dots, \mathcal{D}_T\}\). In addition to maximising performance on the current task \(\mathcal{D}_i\), the goal is to prevent performance degradation on previously learned tasks, a failure mode known as catastrophic forgetting [LAM+21].
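In symbols (a standard formulation; \(\ell\) denotes a per-example loss and \(f_\theta\) the model), the goal after the final task is

\[
\min_{\theta} \; \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}_{(x, y) \sim \mathcal{D}_t} \left[ \ell\big(f_\theta(x), y\big) \right],
\]

under the constraint that while training on \(\mathcal{D}_t\) only the data of the current task is accessible, so the terms for earlier tasks \(t' < t\) can only be controlled indirectly.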

Continual Learning. By aligning the architecture and weights \((A_t, \theta_t)\) with the current task \(\mathcal{D}_t\), growing allows one to reuse existing weights and add parameters for the next task \(\mathcal{D}_{t+1}\) only when necessary. Some methods enforce sparsity in the weights [YLZF21, YYLH18] or apply temporary pruning [HTW+19, WLSL20], disabling weights so they can be adapted while training the next task without introducing semantic drift. The result is a supernet in which each task uses its own subset of the weights. Ablation studies [YYLH18] suggest that growth avoids catastrophic forgetting and adds flexibility to the parameters while keeping models small. The Learn-to-Grow framework [LZW+19] uses NAS to balance performance against parameter count by reusing existing weights as much as possible and adding new ones only when necessary.
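As a concrete illustration of the growing mechanism (a minimal NumPy sketch, not any of the cited methods; the function names and initialisation scale are our own), one can widen a trained layer for a new task and mask the gradients of the frozen rows so that only the newly added units adapt:

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_layer(W: np.ndarray, extra_out: int) -> np.ndarray:
    """Widen a weight matrix by `extra_out` output units.

    Rows trained on earlier tasks are copied verbatim; the new rows
    get a small fresh initialisation for the next task.
    """
    new_rows = rng.normal(scale=0.01, size=(extra_out, W.shape[1]))
    return np.vstack([W, new_rows])

def mask_frozen_grad(grad: np.ndarray, n_frozen: int) -> np.ndarray:
    """Zero the gradient of the first `n_frozen` (old-task) rows,
    so an optimiser step leaves the old weights untouched."""
    g = grad.copy()
    g[:n_frozen] = 0.0
    return g

W1 = rng.normal(size=(4, 3))           # layer after training on task 1
W2 = grow_layer(W1, extra_out=2)       # grown before training task 2
grad = mask_frozen_grad(rng.normal(size=W2.shape), n_frozen=4)
```

The same masking idea underlies the supernet view above: freezing old rows fixes the sub-network used by earlier tasks, so training the new units cannot introduce semantic drift.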

Reinforcement Learning. Loss of plasticity is also a significant problem in Reinforcement Learning: the policy quickly overfits to initial observations and fails to adapt to new data as training progresses. Proposed solutions include periodically resetting the neural network [NSDOro+22] or various forms of regularization [SDDA25]. Most recently, dynamic growth has been explored via sparse grow-prune schemes that maintain plasticity over the course of training [LOCCP25].
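The grow-prune idea can be sketched as follows (a toy NumPy version under our own assumptions about the pruning fraction and initialisation; the cited work [LOCCP25] applies this inside full RL training loops): each step prunes the weakest active connections and reactivates an equal number of dormant ones with fresh weights, keeping sparsity constant while refreshing capacity:

```python
import numpy as np

rng = np.random.default_rng(1)

def prune_and_grow(W: np.ndarray, mask: np.ndarray, frac: float = 0.1):
    """One grow-prune step: prune the `frac` weakest active connections,
    then reactivate the same number of dormant ones with fresh weights.
    Overall sparsity stays constant; representational capacity is refreshed."""
    active = np.flatnonzero(mask)
    k = max(1, int(frac * active.size))
    # prune: the k active connections with smallest magnitude
    weakest = active[np.argsort(np.abs(W.flat[active]))[:k]]
    mask.flat[weakest] = False
    W.flat[weakest] = 0.0
    # grow: k randomly chosen dormant connections, freshly initialised
    revived = rng.choice(np.flatnonzero(~mask), size=k, replace=False)
    mask.flat[revived] = True
    W.flat[revived] = rng.normal(scale=0.01, size=k)
    return W, mask

mask = rng.random((8, 8)) < 0.5        # ~50% sparse connectivity
W = rng.normal(size=(8, 8)) * mask     # dormant connections held at zero
n_active = int(mask.sum())
W, mask = prune_and_grow(W, mask, frac=0.2)
```

Because pruning and growth exchange the same number of connections, the parameter budget is fixed; what changes over training is which connections carry it, which is what preserves plasticity.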

References

[HTW+19]

Steven C. Y. Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen. Compacting, picking and growing for unforgetting continual learning. In NeurIPS. 2019.

[LAM+21]

Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE TPAMI, 2021. arXiv:1909.08383. URL: http://arxiv.org/abs/1909.08383, doi:10.1109/TPAMI.2021.3057446.

[LZW+19]

Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, and Caiming Xiong. Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting. In ICML. 2019. arXiv:1904.00310. URL: http://arxiv.org/abs/1904.00310.

[LOCCP25]

Jiashun Liu, Johan Obando-Ceron, Aaron Courville, and Ling Pan. Neuroplastic Expansion in Deep Reinforcement Learning. 2025. arXiv:2410.07994. URL: http://arxiv.org/abs/2410.07994, doi:10.48550/arXiv.2410.07994.

[NSDOro+22]

Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, and Aaron Courville. The Primacy Bias in Deep Reinforcement Learning. 2022. arXiv:2205.07802. URL: http://arxiv.org/abs/2205.07802, doi:10.48550/arXiv.2205.07802.

[SDDA25]

Alena Shilova, Alex Davey, Brahim Driss, and Riad Akrour. StaQ it! Growing Neural Networks for Policy Mirror Descent. 2025. arXiv:2506.13862. URL: http://arxiv.org/abs/2506.13862.

[WLSL20]

Lemeng Wu, Bo Liu, Peter Stone, and Qiang Liu. Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks. In NeurIPS. 2020. URL: https://proceedings.neurips.cc/paper_files/paper/2020/hash/fdbe012e2e11314b96402b32c0df26b7-Abstract.html.

[YLZF21]

Li Yang, Sen Lin, Junshan Zhang, and Deliang Fan. GROWN: GRow Only When Necessary for Continual Learning. 2021. arXiv:2110.00908. URL: http://arxiv.org/abs/2110.00908, doi:10.48550/arXiv.2110.00908.

[YYLH18]

Jaehong Yoon, Eunho Yang, Jeongtae Lee, and Sung Ju Hwang. Lifelong Learning with Dynamically Expandable Networks. In ICLR. 2018. arXiv:1708.01547. URL: http://arxiv.org/abs/1708.01547.