Firefly
=======

**Firefly** :cite:p:`wu_firefly_2020` splits existing layers and adds new neurons. It then optimizes the new parameters with standard gradient descent. Finally, it keeps only the architecture changes that reduce the loss the most.
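
The three steps (propose, optimize, select) can be illustrated with a toy NumPy sketch. It is not the implementation from the paper: it only adds new neurons to a single hidden layer and does not split existing ones, it scores each candidate by the exact loss decrease it yields on its own, and every name in it (``forward``, ``mse``, ``grow_step``, the toy data and the hyperparameters) is hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical, for illustration only).
X = rng.normal(size=(256, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2


def forward(X, W, a):
    """One hidden tanh layer: W has shape (hidden, inputs), a has shape (hidden,)."""
    return np.tanh(X @ W.T) @ a


def mse(pred, y):
    return float(np.mean((pred - y) ** 2))


def grow_step(X, y, W, a, n_candidates=8, n_keep=2, lr=0.05, steps=200):
    n = X.shape[0]
    base_pred = forward(X, W, a)  # existing network, kept frozen
    base_loss = mse(base_pred, y)

    # 1. Propose candidate neurons. Zero output weights leave the
    #    network function unchanged at the start.
    W_new = 0.1 * rng.normal(size=(n_candidates, X.shape[1]))
    a_new = np.zeros(n_candidates)

    # 2. Optimize only the new parameters with plain gradient descent.
    for _ in range(steps):
        H = np.tanh(X @ W_new.T)                # (n, candidates)
        r = base_pred + H @ a_new - y           # residual, (n,)
        grad_a = 2.0 / n * (H.T @ r)
        grad_W = 2.0 / n * ((r[:, None] * (1.0 - H ** 2)) * a_new).T @ X
        a_new -= lr * grad_a
        W_new -= lr * grad_W

    # 3. Score each candidate by the loss decrease it gives on its own
    #    (a simplification for this sketch) and keep the best ones.
    H = np.tanh(X @ W_new.T)
    gains = np.array([
        base_loss - mse(base_pred + H[:, j] * a_new[j], y)
        for j in range(n_candidates)
    ])
    keep = np.argsort(gains)[::-1][:n_keep]
    return np.vstack([W, W_new[keep]]), np.concatenate([a, a_new[keep]]), gains


# Usage: grow a 3-neuron hidden layer by the 2 best of 8 candidates.
W0 = 0.5 * rng.normal(size=(3, 4))
a0 = 0.5 * rng.normal(size=3)
W1, a1, gains = grow_step(X, y, W0, a0)
print(W0.shape, "->", W1.shape)
```

Only the candidate parameters are updated in step 2, so the existing weights are carried over unchanged and the selection in step 3 decides which of the trained candidates are appended to the layer.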