Non-stationary data distributions
=================================

A natural application of neuron addition is to non-stationary data distributions, as in Continual Learning. Instead of a single task, represented by its dataset :math:`\mathcal{D}`, we have a sequence of tasks :math:`\mathcal{D}_1, \dots, \mathcal{D}_T` and want a model that performs well on all of them. In addition to maximising performance on the current task :math:`\mathcal{D}_i`, the goal is to prevent regression on previous tasks, a failure mode known as *catastrophic forgetting* :cite:p:`lange_continual_2021`.

**Continual Learning.** By aligning the architecture and weights :math:`(A_t, \theta_t)` with the current task :math:`\mathcal{D}_t`, growing allows one to reuse existing weights and add parameters for the next task :math:`\mathcal{D}_{t+1}` only when necessary. Some methods enforce sparsity in the weights :cite:p:`yoon_lifelong_2018,yang_grown_2021` or apply temporary pruning :cite:p:`hung_CPG_2019,wu_firefly_2020`, disabling weights so that they can be adapted when training the next task without introducing semantic drift. The result is a supernet in which each task uses a subset of the weights. Ablation studies :cite:p:`yoon_lifelong_2018` suggest that growth avoids catastrophic forgetting and adds flexibility to the parameters while keeping models small. The **Learn-to-Grow** framework :cite:p:`li_learn_2019` uses NAS to balance model size and accuracy, reusing existing weights as much as possible and adding new ones only when necessary.

**Reinforcement Learning.** Loss of plasticity is also a significant problem in Reinforcement Learning: the policy quickly overfits to initial observations and fails to adapt to new data as training progresses. Proposed solutions include periodically resetting the neural network :cite:p:`nikishin_primacy_2022` or using various forms of regularization :cite:p:`staq`.
Most recently, dynamic growth methods have been explored using sparse grow-prune methods to maintain plasticity over the course of training :cite:p:`liu_neuroplastic_2025`.
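The mask-based continual-learning methods above share a common mechanism: each task claims a subset of a shared weight tensor, trains only those weights, and freezes them afterwards so later tasks cannot overwrite them. A minimal NumPy sketch of that mechanism follows; the class and method names (``MaskedLinear``, ``claim``, ``finish_task``) are invented for illustration and do not correspond to any cited method's API.

.. code-block:: python

    import numpy as np

    class MaskedLinear:
        """Sketch of a 'supernet' linear layer shared across tasks.

        Each task claims a subset of the weight matrix; weights claimed
        by earlier tasks are frozen, so training a later task cannot
        overwrite them (avoiding semantic drift on old tasks).
        """

        def __init__(self, n_in, n_out, seed=0):
            self.rng = np.random.default_rng(seed)
            self.W = self.rng.standard_normal((n_out, n_in)) * 0.01
            self.frozen = np.zeros((n_out, n_in), dtype=bool)  # claimed by past tasks
            self.task_masks = {}  # task id -> boolean mask over W

        def claim(self, task_id, fraction=0.5):
            """Assign `task_id` a random subset of the still-free weights.

            For the forward pass, a task uses its newly claimed weights
            plus all frozen weights (read-only reuse of past knowledge).
            """
            free = ~self.frozen
            take = free & (self.rng.random(self.W.shape) < fraction)
            self.task_masks[task_id] = take | self.frozen
            return take

        def train_step(self, task_id, grad_W, lr=0.1):
            """Apply a gradient update only to weights this task may change."""
            trainable = self.task_masks[task_id] & ~self.frozen
            self.W -= lr * grad_W * trainable

        def finish_task(self, task_id):
            """Freeze the weights this task trained, protecting them afterwards."""
            self.frozen |= self.task_masks[task_id]

        def forward(self, task_id, x):
            """Forward pass through this task's subnetwork only."""
            return (self.W * self.task_masks[task_id]) @ x

Actual methods differ in how weights are allocated (learned sparsity rather than random subsets) and in whether frozen weights may still be reused and fine-tuned under a drift penalty, but the freeze-then-grow bookkeeping is the part that prevents catastrophic forgetting.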