Open Questions and Future Directions
====================================

Despite a significant body of work on *how to grow* and *when to grow*, several obstacles prevent growing methods from being used as a practical tool for frugal architecture discovery.

A first open question is when growth is preferable to alternatives such as pure pruning methods, especially on modern GPU hardware: empirical evidence either way is limited :cite:p:`boumendil_grow_2023`. Benchmarks often report reductions in parameter count or FLOPs, but these do not translate reliably into walltime reductions. Future benchmarks should prioritise walltime comparisons, which are currently only well studied in the context of transformer growth.

Existing methods optimise a local proxy objective, such as selecting the steepest immediate loss decrease, following the natural gradient, or applying heuristics intended to preserve trainability. The size of the gap between the proxy objective optimised by repeated growing operations of Eq. `[eqn:grow_decomposition] <#eqn:grow_decomposition>`__ and the idealised objective of Eq. `[eqn:ideal_obj] <#eqn:ideal_obj>`__ is poorly understood. Treating growth as a sequential decision-making problem, one could decide where, when, and how to grow so as to directly optimise Eq. `[eqn:ideal_obj] <#eqn:ideal_obj>`__. This is similar to the Reinforcement Learning approaches used in neural architecture search :cite:p:`zoph_neural_2017`.

Optimisation in the presence of growth is underexplored: it is common practice to simply reset the optimiser state after growth. Variance Transfer :cite:p:`yuan_accelerated_2023` adapts the learning rate to the growth stage, but there are no strategies for transferring the rest of the optimiser state (*e.g.* momentum) when growing.

Finally, the majority of growing methods are currently unsuitable for use as frugal architecture search, because they lack (i) a notion of where to grow and (ii) an efficiency-aware stopping rule.
Existing benchmarks are not designed to test these issues. One could test growth on “unseen” problems where the optimal architecture may lie far from the seed, rather than being reachable by small incremental adjustments. Similar benchmarks exist for NAS :cite:p:`geada_unseennaschallenge_2024`. However, there is no principled reason why the existing methods cannot be adapted for frugal growth.

We hope that the ideas in this survey will spur the community to grow in promising new directions.
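
To make the open question about optimiser state concrete, the sketch below widens the hidden layer of a two-layer ReLU network and carries the existing momentum buffers across the growth step instead of resetting them. It is an illustration under stated assumptions, not a method from the cited literature: the names ``grow_hidden`` and ``forward`` are hypothetical, new outgoing weights are set to zero so that the network function is preserved, and the momentum of new parameters is zero-initialised. Whether zero is a good initial value for the new momentum entries is exactly what remains unstudied.

```python
import numpy as np


def forward(params, x):
    """Two-layer ReLU network: W2 @ relu(W1 @ x + b1)."""
    hidden = np.maximum(params["W1"] @ x + params["b1"], 0.0)
    return params["W2"] @ hidden


def grow_hidden(params, momentum, k, rng, scale=0.01):
    """Add k hidden units; keep old momentum, zero-pad the new entries.

    Assumption: zero outgoing weights for the new units (function-preserving)
    and zero momentum for every new parameter.
    """
    d = params["W1"].shape[1]  # input dimension
    o = params["W2"].shape[0]  # output dimension
    new_params = {
        # New incoming weights are small random values.
        "W1": np.vstack([params["W1"], scale * rng.standard_normal((k, d))]),
        "b1": np.concatenate([params["b1"], np.zeros(k)]),
        # New outgoing weights are zero, so the output is unchanged.
        "W2": np.hstack([params["W2"], np.zeros((o, k))]),
    }
    new_momentum = {
        # Existing momentum is copied; new rows/columns start at zero.
        "W1": np.vstack([momentum["W1"], np.zeros((k, d))]),
        "b1": np.concatenate([momentum["b1"], np.zeros(k)]),
        "W2": np.hstack([momentum["W2"], np.zeros((o, k))]),
    }
    return new_params, new_momentum


rng = np.random.default_rng(0)
params = {
    "W1": rng.standard_normal((4, 3)),
    "b1": np.zeros(4),
    "W2": rng.standard_normal((2, 4)),
}
# Stand-in for momentum accumulated during earlier training steps.
momentum = {name: np.ones_like(value) for name, value in params.items()}
x = rng.standard_normal(3)

grown, grown_momentum = grow_hidden(params, momentum, k=2, rng=rng)
```

After the call, ``forward(grown, x)`` equals ``forward(params, x)``, and the momentum of the original parameters is untouched. The same padding pattern would extend to the second-moment buffers of adaptive optimisers, although the appropriate initial value there is likewise an open choice.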