gromo.utils.tools.compute_optimal_added_parameters

gromo.utils.tools.compute_optimal_added_parameters(matrix_s: Tensor | None, matrix_n: Tensor, numerical_threshold: float = 1e-06, statistical_threshold: float = 0.001, maximum_added_neurons: int | None = None, alpha_zero: bool = False, omega_zero: bool = False, ignore_singular_values: bool = False, matrix_covariance_loss_gradient: Tensor | None = None) → tuple[Tensor, Tensor, Tensor]

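Compute the optimal parameters of the neurons added to a given layer: their incoming weights alpha and their outgoing weights omega.

This function takes primitive options (thresholds and boolean flags) directly, not the names of growth methods.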
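For orientation, the core computation can be written as follows. This is a sketch under one stated assumption: each singular value is split as a square root between alpha and omega, as in the usual TINY-style construction; this page only confirms that the SVD target is S^{-1/2} N. Here u_i ∈ R^s and v_i ∈ R^t are the left and right singular vectors; stacking the k kept alpha_i as rows gives shape (k, s), and stacking the omega_i as columns gives shape (t, k).

```latex
\begin{aligned}
S^{-1/2} N &= U \Sigma V^\top, \qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad \sigma_1 \ge \dots \ge \sigma_r \ge 0,\\
\alpha_i &= \sqrt{\sigma_i}\, u_i^\top S^{-1/2} \in \mathbb{R}^{1 \times s}, \qquad \omega_i = \sqrt{\sigma_i}\, v_i \in \mathbb{R}^{t}, \qquad i = 1, \dots, k,\\
\sum_{i=1}^{r} \omega_i \alpha_i &= V \Sigma U^\top S^{-1/2} = N^\top S^{-1} \quad \text{(all components kept, } S \text{ invertible)}.
\end{aligned}
```

The last line is a sanity check. Keeping every component reproduces the least-squares solution N^T S^{-1}, and truncating to the k largest σ_i gives its best rank-k approximation in the S-weighted norm.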

Parameters:

matrix_s (torch.Tensor | None) – Square matrix S of shape (s, s). If None, the identity matrix is used.
matrix_n (torch.Tensor) – Matrix N (correlation matrix) of shape (s, t).
numerical_threshold (float) – Threshold used to treat an eigenvalue of S as zero when computing the inverse square root S^{-1/2}.
statistical_threshold (float) – Threshold used to treat a singular value as zero in the SVD.
maximum_added_neurons (int | None) – Maximum number of added neurons. If None, all neurons whose singular values pass statistical_threshold are kept.
alpha_zero (bool) – If True, set alpha (incoming weights) to zero; otherwise compute it from the SVD.
omega_zero (bool) – If True, set omega (outgoing weights) to zero; otherwise compute it from the SVD.
ignore_singular_values (bool) – If True, ignore the actual singular values and treat them as 1 for computing alpha and omega, effectively only using the singular vectors for the update direction.
matrix_covariance_loss_gradient (torch.Tensor | None) – Square matrix E_s of shape (t, t). If provided, the SVD target becomes S^{-1/2} N E_s^{-1/2} and omega is left-multiplied by E_s^{-1/2}, which applies the empirical-Fisher preconditioning to the rank-k extension. Note that this silently uses the independence hypothesis described in first_order_optimization.typ (@hyp:independence).
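The options above can be illustrated with a minimal PyTorch sketch. It is not the gromo implementation. The names `inv_sqrt_psd` and `sketch_optimal_added_parameters` are hypothetical, and several details are assumptions: each singular value is split as a square root between alpha and omega, E_s^{-1/2} reuses numerical_threshold, the zero flags are applied after the SVD, and the returned singular values are the actual ones even when ignore_singular_values is True.

```python
import torch
from torch import Tensor


def inv_sqrt_psd(matrix: Tensor, threshold: float) -> Tensor:
    """Pseudo-inverse square root of a symmetric PSD matrix (hypothetical helper).

    Eigenvalues at or below `threshold` are treated as zero and dropped.
    """
    eigvals, eigvecs = torch.linalg.eigh(matrix)
    keep = eigvals > threshold
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    # Q diag(lambda^{-1/2}) Q^T, restricted to the kept eigenvectors
    return (eigvecs * eigvals.rsqrt()) @ eigvecs.T


def sketch_optimal_added_parameters(
    matrix_s, matrix_n, numerical_threshold=1e-6, statistical_threshold=1e-3,
    maximum_added_neurons=None, alpha_zero=False, omega_zero=False,
    ignore_singular_values=False, matrix_covariance_loss_gradient=None,
):
    s_dim, _ = matrix_n.shape
    if matrix_s is None:
        s_inv_sqrt = torch.eye(s_dim, dtype=matrix_n.dtype, device=matrix_n.device)
    else:
        s_inv_sqrt = inv_sqrt_psd(matrix_s, numerical_threshold)

    # SVD target: S^{-1/2} N, or S^{-1/2} N E_s^{-1/2} when E_s is given
    target = s_inv_sqrt @ matrix_n
    e_inv_sqrt = None
    if matrix_covariance_loss_gradient is not None:
        # Assumption: the same numerical threshold is reused for E_s
        e_inv_sqrt = inv_sqrt_psd(matrix_covariance_loss_gradient, numerical_threshold)
        target = target @ e_inv_sqrt

    u, sigma, vh = torch.linalg.svd(target, full_matrices=False)
    # Singular values come sorted in descending order, so masking keeps a prefix
    keep = sigma > statistical_threshold
    u, sigma, v = u[:, keep], sigma[keep], vh[keep].T
    if maximum_added_neurons is not None:
        k = maximum_added_neurons
        u, sigma, v = u[:, :k], sigma[:k], v[:, :k]

    # Assumption: sqrt(sigma) split between both factors; 1 if singular values are ignored
    scale = torch.ones_like(sigma) if ignore_singular_values else sigma.sqrt()
    alpha = scale[:, None] * (s_inv_sqrt @ u).T  # (k, s)
    omega = v * scale[None, :]                   # (t, k)
    if e_inv_sqrt is not None:
        omega = e_inv_sqrt @ omega               # left-multiply by E_s^{-1/2}

    if alpha_zero:
        alpha = torch.zeros_like(alpha)
    if omega_zero:
        omega = torch.zeros_like(omega)
    return alpha, omega, sigma
```

With matrix_s=None and every component kept, `omega @ alpha` reconstructs N^T. Lowering maximum_added_neurons keeps the largest singular directions first.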

Returns:

torch.Tensor – Optimal incoming weights alpha of the k added neurons, shape (k, s).
torch.Tensor – Optimal outgoing weights omega of the k added neurons, shape (t, k).
torch.Tensor – Singular values associated with the k added neurons, shape (k,).
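To make the shape conventions concrete, here is a hedged sketch of one way a caller could splice the returned tensors into a bias-free pair of linear layers (s → h → t). This splice is an assumption, not gromo's API. alpha adds k rows to the first weight matrix, and omega adds k columns to the second. The helpers `grow_linear_pair` and `two_layer` are hypothetical, and ReLU stands in for whatever activation the layer actually uses.

```python
import torch
from torch import Tensor


def grow_linear_pair(w1: Tensor, w2: Tensor, alpha: Tensor, omega: Tensor) -> tuple[Tensor, Tensor]:
    """Append k hidden neurons to a bias-free Linear(s->h) / Linear(h->t) pair.

    Shapes: w1 (h, s), w2 (t, h), alpha (k, s), omega (t, k).
    """
    new_w1 = torch.cat([w1, alpha], dim=0)  # (h + k, s): k new rows of incoming weights
    new_w2 = torch.cat([w2, omega], dim=1)  # (t, h + k): k new columns of outgoing weights
    return new_w1, new_w2


def two_layer(x: Tensor, w1: Tensor, w2: Tensor) -> Tensor:
    """Forward pass x -> relu(x W1^T) W2^T for a batch x of shape (batch, s)."""
    return torch.relu(x @ w1.T) @ w2.T
```

The grown network's output equals the original output plus `relu(x @ alpha.T) @ omega.T`. In other words, the k new neurons contribute an additive term that the old weights do not affect.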

Raises:

torch.linalg.LinAlgError – If the SVD of S^{-1/2} N (or of S^{-1/2} N E_s^{-1/2} when matrix_covariance_loss_gradient is given) fails.
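Since the SVD can fail, a caller may want to catch the exception instead of aborting training. The sketch below shows one such policy: skip growth for that step. The wrapper `try_compute` and the skip policy are hypothetical. Only `torch.linalg.LinAlgError` itself comes from PyTorch.

```python
from typing import Any, Callable, Optional

import torch


def try_compute(compute_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Optional[Any]:
    """Call compute_fn and return its result, or None if a linear-algebra routine fails.

    Intended usage (hypothetical):
        result = try_compute(compute_optimal_added_parameters, matrix_s, matrix_n)
    """
    try:
        return compute_fn(*args, **kwargs)
    except torch.linalg.LinAlgError:
        # Hypothetical policy: skip growth for this layer at this step
        return None
```

Returning None lets the caller decide whether to retry later (for example, after collecting more statistics) or to leave the layer unchanged.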