gromo.modules.growing_module.GrowingModule#
- class gromo.modules.growing_module.GrowingModule(layer: Module, tensor_s_shape: tuple[int, int] | None = None, tensor_m_shape: tuple[int, int] | None = None, post_layer_function: Module = Identity(), extended_post_layer_function: Module | None = None, allow_growing: bool = True, previous_module: Module | None = None, next_module: Module | None = None, device: device | None = None, name: str | None = None, target_in_neurons: int | None = None, initial_in_neurons: int | None = None)[source]#
Abstract class for a Module of dynamic size
- Parameters:
layer (torch.nn.Module) – layer of the module
tensor_s_shape (tuple[int, int] | None) – shape of the tensor S
tensor_m_shape (tuple[int, int] | None) – shape of the tensor M
post_layer_function (torch.nn.Module, optional) – function to apply after the layer, by default torch.nn.Identity()
extended_post_layer_function (torch.nn.Module | None, optional) – extended function to apply after the layer, by default None
allow_growing (bool) – if True, the module can grow (require a previous GrowingModule)
previous_module (torch.nn.Module | None) – previous module
next_module (torch.nn.Module | None) – next module
device (torch.device | None) – device to use
name (str | None) – name of the module
target_in_neurons (int | None, optional) – target fan-in size, by default None
initial_in_neurons (int | None, optional) – initial fan-in size, by default None
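A minimal construction sketch is shown below. It assumes a concrete subclass such as LinearGrowingModule with a compatible constructor (the import path and exact keyword names are assumptions, not guaranteed by this page); the keywords mirror the GrowingModule parameters documented above.

```python
import torch

# Hedged sketch: wiring two growing layers. LinearGrowingModule and its exact
# constructor signature are assumptions; the keywords mirror GrowingModule above.
from gromo.modules.linear_growing_module import LinearGrowingModule  # assumed path

first = LinearGrowingModule(
    in_features=8,
    out_features=4,
    post_layer_function=torch.nn.ReLU(),
    allow_growing=False,   # no previous GrowingModule to grow from
    name="first",
)
second = LinearGrowingModule(
    in_features=4,
    out_features=2,
    allow_growing=True,    # growing requires a previous GrowingModule
    previous_module=first,
    name="second",
)
```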
- property activation_gradient: Tensor#
Return the derivative of the activation function before this layer at 0+.
/!\ A caching mechanism is used to avoid recomputing the value multiple times. Therefore, if the previous module changes its post layer function, the cache must be cleared manually by setting _activation_gradient_previous_module to None.
- Returns:
derivative of the activation function before this layer at 0+
- Return type:
torch.Tensor
- Raises:
NotImplementedError – abstract method
- add_parameters(**kwargs: Any) None[source]#
Grow the module by adding new parameters to the layer.
- Parameters:
**kwargs (Any) – typically include the values of the new parameters to add to the layer
- Raises:
NotImplementedError – abstract method
- apply_change(scaling_factor: float | Tensor | None = None, apply_previous: bool = True, apply_delta: bool = True, apply_extension: bool = True, extension_size: int | None = None, optimal_delta_scaling: float | Tensor | None = None, input_extension_scaling: float | Tensor | None = None, output_extension_scaling: float | Tensor | None = None) None[source]#
Apply the optimal delta and extend the layer with the current layer extension, using the current scaling factors. This means that the layer input is extended with this layer's input extension and the previous layer's output is extended with the previous layer's output extension, both scaled by the relevant extension scaling factors. This also means that the layer output itself is not extended.
- Parameters:
scaling_factor (float | torch.Tensor | None) – legacy aggregated scaling factor; sets optimal_delta_scaling = value**2 and both extension scalings to value (mirrors the historical coupling).
apply_previous (bool) – if True apply the change to the previous layer, by default True
apply_delta (bool) – if True apply the optimal delta to the layer, by default True
apply_extension (bool) – if True apply the extension to the layer, by default True
extension_size (int | None) – size of the extension to apply, by default None, in which case it is determined automatically from self.eigenvalues_extension.shape[0]
optimal_delta_scaling (float | torch.Tensor | None) – override for self.optimal_delta_scaling for this call (and persisted on the module). When None, the current attribute is used.
input_extension_scaling (float | torch.Tensor | None) – override for self.input_extension_scaling.
output_extension_scaling (float | torch.Tensor | None) – override for self.previous_module.output_extension_scaling.
- Raises:
ValueError – if the layer has no extension but an extension_size above zero was requested
NotImplementedError – if the previous module is not of type GrowingModule
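The sketch below shows a typical call, assuming the optimal delta and the layer extensions have already been prepared (for example via compute_optimal_updates); the numeric scalings are illustrative only.

```python
# Hedged usage sketch: apply a prepared update with per-component scalings.
# Assumes `module` is a concrete GrowingModule whose optimal delta and layer
# extensions have already been computed; the values below are illustrative.
module.apply_change(
    apply_previous=True,            # also extend the previous layer's output
    apply_delta=True,
    apply_extension=True,
    optimal_delta_scaling=1.0,      # use the optimal delta as-is
    input_extension_scaling=0.1,    # scale this layer's new fan-in weights
    output_extension_scaling=0.1,   # scale the previous layer's new fan-out weights
)
# Legacy form: apply_change(scaling_factor=g) sets the delta scaling to g**2
# and both extension scalings to g.
```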
- apply_neuron_pairing(neuron_pairing: Literal['vv_z_negz'] | None = None, noise_ratio: float = 0.001) None[source]#
Fill the second half of the extensions according to the pairing rule.
The extension layers are expected to already have their final (even) size. Only the first half ([: dh // 2] rows / columns) must be initialised; the second half is overwritten in place:
- Output extension (previous layer): V -> (V, V). The first dh // 2 rows are left untouched; the second dh // 2 rows are copies of the first half.
- Input extension (current layer): Z -> (Z, -Z). The first dh_in // 2 columns are left untouched; the second dh_in // 2 columns are negated copies of the first half.
At initialisation this ensures the net contribution of new neurons is zero, preserving the function represented by the network. A small amount of noise is then added to the full input extension weight to break the symmetry between paired neurons, allowing them to learn different features during training.
Must be called after extensions are created and initialised.
- Parameters:
neuron_pairing (_KNOWN_NEURON_PAIRINGS_TYPE | None) – Pairing strategy. One of "none", "vv_z_negz".
noise_ratio (float) – Fraction of the standard deviation of the input extension weights used as the noise level for symmetry breaking. Set to 0 to disable noise (exact function preservation). Default 0.001.
- Raises:
ValueError – If neuron_pairing is not a recognised strategy, or if one of the extensions has an odd leading dimension.
RuntimeError – If the required extension layers do not exist.
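The snippet below is a standalone numerical check of the pairing rule described above, using plain tensors with hypothetical shapes rather than the actual extension layers: duplicated output-extension rows feed paired hidden units with identical activations, and the negated input-extension columns cancel their contributions exactly.

```python
import torch

# Hedged numerical check of the (V, V) / (Z, -Z) pairing described above.
torch.manual_seed(0)
d_in, dh, d_out, n = 5, 4, 3, 7            # dh is the (even) final extension size
v = torch.randn(dh // 2, d_in)             # first half of the output extension (previous layer)
z = torch.randn(d_out, dh // 2)            # first half of the input extension (current layer)

v_full = torch.cat([v, v], dim=0)          # rows duplicated: V -> (V, V)
z_full = torch.cat([z, -z], dim=1)         # columns negated: Z -> (Z, -Z)

x = torch.randn(n, d_in)
h = torch.relu(x @ v_full.t())             # paired hidden units have identical activity
contribution = h @ z_full.t()              # their contributions therefore cancel exactly
assert torch.allclose(contribution, torch.zeros_like(contribution), atol=1e-6)
```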
- apply_rescaling(rescaling: Literal['default_vt', 'vt_constraint_old_shape', 'vt_constraint_new_shape'] | None = None, neuron_pairing: Literal['vv_z_negz'] | None = None, extension_size: int | None = None) None[source]#
Rescale existing weights in-place before extension concatenation.
Implements three variance-transfer strategies from [1]:
- "default_vt" (Strategy A): beta = sqrt(fan_in_old / fan_in_new), alpha = 1 (the previous layer input is not extended).
- "vt_constraint_old_shape" (Strategy B): alpha and beta chosen so that V[W] = 1 / fan_in_old after rescaling.
- "vt_constraint_new_shape" (Strategy C): alpha and beta chosen so that V[W] = 1 / fan_in_new after rescaling.
"none" is a no-op.
The current layer (self) is the one whose fan_in grows (Conv2 in a block context). The previous layer has its fan_out grow (Conv1).
- Parameters:
rescaling (_KNOWN_RESCALING_STRATEGIES_TYPE | None) – Rescaling strategy. One of "default_vt", "vt_constraint_old_shape", "vt_constraint_new_shape".
neuron_pairing (_KNOWN_NEURON_PAIRINGS_TYPE | None) – Neuron-pairing strategy that will be applied after rescaling. Validated for unknown values but no longer influences the fan-in computation; extension_size is the final size. One of "none", "vv_z_negz".
extension_size (int | None) – Final number of neurons in the extension (pairing included). If None, the size is read from the existing extended_input_layer.
- Raises:
ValueError – If rescaling or neuron_pairing is not a recognised strategy.
References
[1] Yuan et al., “Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation”, 2024.
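As a concrete illustration of Strategy A, the sketch below rescales a hypothetical weight tensor by beta = sqrt(fan_in_old / fan_in_new) before the extension is concatenated; the tensors and sizes are made up for the example.

```python
import math
import torch

# Hedged illustration of Strategy A ("default_vt"): existing weights of the layer
# whose fan-in grows are scaled in place by beta = sqrt(fan_in_old / fan_in_new);
# the previous layer keeps alpha = 1.
fan_in_old, extension_size = 64, 16
fan_in_new = fan_in_old + extension_size
beta = math.sqrt(fan_in_old / fan_in_new)

weight = torch.randn(32, fan_in_old) / math.sqrt(fan_in_old)  # hypothetical existing weight
weight.mul_(beta)   # in-place rescaling before the extension is concatenated
```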
- property bias: Tensor#
Get the bias of the layer
- Returns:
bias tensor
- Return type:
torch.Tensor
- complete_growth(extension_kwargs: Any) None[source]#
Complete the growth to the target size.
- Parameters:
extension_kwargs (Any) – Additional arguments for creating layer extensions.
- compute_covariance_loss_gradient_update() tuple[Tensor, int][source]#
Compute the update of the empirical Fisher / gradient covariance \(E_s = dA^T dA\) summed over the batch (and over spatial positions for convolutional layers), i.e. the sum of per-sample outer products. Should be implemented by each layer type.
- Returns:
torch.Tensor – update of the gradient covariance, shape (cp, cp)
int – number of samples used to compute the update
- Raises:
NotImplementedError – abstract method
- compute_cross_covariance_update() tuple[Tensor, int][source]#
Compute the update of the tensor C := B[-1] B[-2]^T.
- Returns:
torch.Tensor – update of the tensor C
int – number of samples used to compute the update
- Raises:
NotImplementedError – abstract method
- compute_m_prev_update(desired_activation: Tensor | None = None) tuple[Tensor, int][source]#
Compute the update of the tensor M_{-2} := dA B[-2]^T.
- Parameters:
desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer
- Returns:
torch.Tensor – update of the tensor M_{-2}
int – number of samples used to compute the update
- Raises:
NotImplementedError – abstract method
- compute_m_update(desired_activation: Tensor | None = None) tuple[Tensor, int][source]#
Compute the update of the tensor M. Should be implemented by each layer type.
- Parameters:
desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer
- Returns:
torch.Tensor – update of the tensor M
int – number of samples used to compute the update
- Raises:
NotImplementedError – abstract method
- compute_n_update() tuple[Tensor, int][source]#
Compute the update of the tensor N. Should be implemented by each layer type.
- Returns:
torch.Tensor – update of the tensor N
int – number of samples used to compute the update
- Raises:
NotImplementedError – abstract method
- compute_optimal_delta(update: bool = True, dtype: dtype = torch.float32, force_pseudo_inverse: bool = False, use_fisher: bool = False) tuple[Tensor, Tensor | None, Tensor | float][source]#
Compute the optimal delta for the layer using current S and M tensors.
With tensor_m shaped (in_features (+bias), out_features), the raw optimal update returned by optimal_delta corresponds to \((S^{-1} M)^T\), using the pseudo-inverse of S when needed. When use_fisher is True, the empirical Fisher / gradient covariance \(E_s = \mathbb{E}[dA dA^T]\) is used as an output-feature left preconditioner, so the update is correspondingly preconditioned on the output side.
Compute dW* (and dBias* if needed) and update the optimal_delta_layer attribute. L(A + gamma * B * dW) = L(A) - gamma * d + o(gamma) where d is the first order decrease and gamma the scaling factor.
- Parameters:
update (bool) – if True update the optimal delta layer attribute and the first order decrease
dtype (torch.dtype) – dtype for S and M during the computation
force_pseudo_inverse (bool) – if True, use the pseudo-inverse to compute the optimal delta even if the matrix is invertible
use_fisher (bool) – if True, use the empirical Fisher / gradient covariance as a left preconditioner. Relies on the independence hypothesis from the math notes (@hyp:independence).
- Returns:
optimal delta for the weights, the biases if needed and the first order decrease
- Return type:
tuple[torch.Tensor, torch.Tensor | None, torch.Tensor | float]
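A self-contained numerical sketch of the formula above (synthetic tensors, no bias term): S and M are built from the layer input B and the pre-activity gradient dA, and the raw update is \((S^{-1} M)^T\).

```python
import torch

# Hedged numerical sketch of the formula above; this is not the module's own code path.
in_features, out_features, n = 6, 3, 128
b = torch.randn(n, in_features)              # layer input B
da = torch.randn(n, out_features)            # gradient of the loss w.r.t. the pre-activity
s = b.t() @ b / n                            # tensor S, shape (in_features, in_features)
m = b.t() @ da / n                           # tensor M, shape (in_features, out_features)
dw_star = (torch.linalg.pinv(s) @ m).t()     # raw optimal update, shape (out_features, in_features)
```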
- compute_optimal_updates(numerical_threshold: float = 1e-06, statistical_threshold: float = 0.001, maximum_added_neurons: int | None = None, update_previous: bool = True, dtype: dtype = torch.float32, compute_delta: bool = True, use_covariance: bool = True, alpha_zero: bool = False, omega_zero: bool = False, use_projection: bool = True, ignore_singular_values: bool = False, use_fisher: bool = False) tuple[Tensor | None, Tensor | None][source]#
Compute the optimal update and additional neurons.
This method computes optimal weight updates for growing neural networks by analyzing gradient statistics and covariance information.
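The sketch below shows how this method is typically chained with the other calls documented on this page; it assumes the required S and M statistics have already been accumulated for `module` and its previous module, and the numeric values are illustrative only.

```python
# Hedged workflow sketch: prepare, optionally prune, apply, then clean up.
candidate = module.compute_optimal_updates(
    maximum_added_neurons=8,    # cap on the number of candidate neurons
    update_previous=True,       # also prepare the previous layer's output extension
)
module.sub_select_optimal_added_parameters(keep_neurons=4)   # keep the 4 best neurons
module.apply_change(
    optimal_delta_scaling=1.0,
    input_extension_scaling=0.1,
    output_extension_scaling=0.1,
)
module.delete_update(include_previous=True)   # drop the now-applied update buffers
```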
- compute_s_update() tuple[Tensor, int][source]#
Compute the update of the tensor S. Should be implemented by each layer type.
- Returns:
torch.Tensor – update of the tensor S
int – number of samples used to compute the update
- Raises:
NotImplementedError – abstract method
- copy_uniform_initialization(tensor: Tensor, reference_tensor: Tensor | None, fan_in: int) None[source]#
Initialize tensor with uniform law aligned on reference
Initialize the tensor with a uniform law with bounds -sqrt(std(W)), sqrt(std(W)) where std(W) is the empirical standard deviation of the reference_tensor if the reference_tensor has a non-zero variance. Otherwise, use bounds -sqrt(6 / fan_in), sqrt(6 / fan_in) where fan_in is the number of input features of the reference tensor + extension.
- Parameters:
tensor (torch.Tensor) – tensor to initialize
reference_tensor (torch.Tensor | None) – tensor to get the standard deviation from or None to use Kaiming init
fan_in (int) – number of input features of the base tensor + extension
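The sketch below transcribes the rule described above on made-up tensors: uniform in [-sqrt(std(reference)), sqrt(std(reference))] when the reference has non-zero variance, otherwise uniform in [-sqrt(6 / fan_in), sqrt(6 / fan_in)].

```python
import math
import torch

# Hedged transcription of the initialization rule above; shapes are hypothetical.
reference = torch.randn(16, 32)            # hypothetical existing weight
new = torch.empty(16, 8)                   # hypothetical extension to initialize
fan_in = 32 + 8                            # base fan-in + extension

std = reference.std().item()
bound = math.sqrt(std) if std > 0 else math.sqrt(6 / fan_in)
new.uniform_(-bound, bound)
```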
- create_layer_extensions(extension_size: int, output_extension_size: int | None = None, input_extension_size: int | None = None, output_extension_init: str = 'copy_uniform', input_extension_init: str = 'copy_uniform', neuron_pairing: Literal['vv_z_negz'] | None = None, rescaling: Literal['default_vt', 'vt_constraint_old_shape', 'vt_constraint_new_shape'] | None = None, noise_ratio: float = 0.001) None[source]#
Create extension for layer input and output.
Create the layer input and output extensions of given sizes, optionally rescaling existing weights and applying neuron pairing.
Allow to have different sizes for input and output extensions, this is useful for example if you connect a convolutional layer to a linear layer.
The execution order is:
1. Rescaling: existing weights are rescaled in-place (before extensions are created, so that copy_uniform init reads the rescaled weights as reference).
2. Extension creation: physical extension layers are allocated at their final (post-pairing) size.
3. Initialisation: the first half of each extension is initialised when neuron_pairing is active, the full extension otherwise.
4. Neuron pairing: the already-allocated second half of each extension is filled in place via (V, V) / (Z, -Z).
- Parameters:
extension_size (int) – Size of the extension to create.
output_extension_size (int | None) – Size of the output extension to create; if None, use extension_size.
input_extension_size (int | None) – Size of the input extension to create; if None, use extension_size.
output_extension_init (str) – Initialisation method for the output extension. Must be one of the keys in known_inits ("copy_uniform", "kaiming", "zeros"), default "copy_uniform".
input_extension_init (str) – Initialisation method for the input extension. Must be one of the keys in known_inits ("copy_uniform", "kaiming", "zeros"), default "copy_uniform".
neuron_pairing (_KNOWN_NEURON_PAIRINGS_TYPE | None) – Neuron-pairing strategy applied after initialisation. "none" (default) or "vv_z_negz". /!\ When neuron_pairing is active, extension_size (and output_extension_size / input_extension_size) is the final size, pairing included, and must be even. A ValueError is raised otherwise.
rescaling (_KNOWN_RESCALING_STRATEGIES_TYPE | None) – Variance-transfer rescaling strategy applied before extension creation. "none" (default), "default_vt", "vt_constraint_old_shape", or "vt_constraint_new_shape".
noise_ratio (float) – Fraction of the standard deviation of the input extension weights used as the noise level for symmetry breaking after neuron pairing. Set to 0 for exact function preservation. Default 0.001.
Notes
Additional initialization methods can be added by registering them in the local known_inits dictionary of this method. Each initialization callable is applied to the extension weight tensor and to the extension bias tensor, if the layer has a bias. The callable must accept the following arguments:
- tensor: torch.Tensor – Tensor of the weight/bias extension, to initialize.
- reference_tensor: torch.Tensor | None – Weight/bias tensor from the layer before extension.
- fan_in: int – The fan_in of the layer, after including the extension.
An initialization callable may also modify the existing weights/biases, by mutating reference_tensor.
- Raises:
ValueError – If unknown initialization method, rescaling strategy, or neuron pairing.
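A usage sketch with the documented options; `module` is assumed to be a concrete GrowingModule with a previous module attached, and the sizes are illustrative.

```python
# Hedged usage sketch: grow this layer's fan-in (and the previous layer's fan-out)
# by 8 neurons with pairing and variance-transfer rescaling.
module.create_layer_extensions(
    extension_size=8,                   # final size, pairing included, must be even
    output_extension_init="copy_uniform",
    input_extension_init="copy_uniform",
    neuron_pairing="vv_z_negz",         # (V, V) / (Z, -Z) pairing after initialisation
    rescaling="default_vt",             # rescale existing weights before the extension
    noise_ratio=0.0,                    # exact function preservation
)
```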
- create_layer_in_extension(extension_size: int) None[source]#
Create the layer input extension of given size.
- Parameters:
extension_size (int) – size of the extension to create
- Raises:
NotImplementedError – abstract method
- create_layer_out_extension(extension_size: int) None[source]#
Create the layer output extension of given size.
- Parameters:
extension_size (int) – size of the extension to create
- Raises:
NotImplementedError – abstract method
- delete_update(include_previous: bool = True, delete_delta: bool = True, delete_input: bool = True, delete_output: bool = False) None[source]#
Delete the updates of the layer:
- optimal_delta_layer
- extended_input_layer and associated extensions
By default, we do not delete the extended_output_layer of this layer because it could be required by the next layer.
- Parameters:
include_previous (bool, optional) – delete the extended_output_layer of the previous layer, by default True
delete_delta (bool, optional) – delete the optimal_delta_layer of the module, by default True
delete_input (bool, optional) – delete the extended_input_layer of this module, by default True
delete_output (bool, optional) – delete the extended_output_layer of this layer, by default False. Warning: this does not delete the extended_input_layer of the next layer.
- Raises:
NotImplementedError – if include_previous is True and the previous module is of type MergeGrowingModule
TypeError – if previous module is not of type GrowingModule or MergeGrowingModule
- extended_forward(x: Tensor, x_ext: Tensor | None = None, use_optimal_delta: bool = True, use_extended_input: bool = True, use_extended_output: bool = True) tuple[Tensor, Tensor | None][source]#
Forward pass of the module with the layer extension and layer update scaled according to the scaling factors:
- optimal_delta_layer is scaled by optimal_delta_scaling
- extended_input_layer is scaled by input_extension_scaling
- extended_output_layer is scaled by output_extension_scaling
WARNING: does not store the input and pre-activity tensors.
WARNING: the scaling factor is squared for the optimal delta and linear for the extension (instead of linear for the optimal delta and square-rooted for the extension as in the theory).
- Parameters:
x (torch.Tensor) – input tensor
x_ext (torch.Tensor | None) – extension tensor
use_optimal_delta (bool, optional) – if True, use the optimal delta layer, default True
use_extended_input (bool, optional) – if True, use the extended input layer, default True
use_extended_output (bool, optional) – if True, use the extended output layer, default True
- Returns:
output tensor and extension tensor
- Return type:
tuple[torch.Tensor, torch.Tensor | None]
- Raises:
ValueError – if the input is extended and x_ext is not provided
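A short evaluation sketch: `first` and `second` are assumed to be chained GrowingModules with a prepared update on `second` (so `first` holds the matching extended_output_layer), and the candidate growth is evaluated without modifying the layers.

```python
import torch

# Hedged sketch: propagate both the regular and the extended activity.
x = torch.randn(32, first.in_features)
with torch.no_grad():
    y1, y1_ext = first.extended_forward(x)         # y1_ext carries the new neurons' activity
    y2, _ = second.extended_forward(y1, y1_ext)    # an extended input requires x_ext
```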
- property first_order_improvement: Tensor#
Get the first order improvement of the block.
- Returns:
first order improvement
- Return type:
torch.Tensor
- forward(x: Tensor) Tensor[source]#
Forward pass of the module. If needed, store the activity and pre-activity tensors.
- Parameters:
x (torch.Tensor) – input tensor
- Returns:
output tensor
- Return type:
torch.Tensor
- get_fan_in_from_layer(layer: Module | None = None, num_neurons: int | None = None) int[source]#
Get the fan_in (number of input features) from a given layer or from a given number of neurons.
- Parameters:
layer (torch.nn.Module | None) – layer to get the fan_in from
num_neurons (int | None) – number of neurons in the layer
- Returns:
fan_in of the layer
- Return type:
int
- Raises:
NotImplementedError – abstract method
- property in_features: int#
Fan-in size
- Returns:
fan-in size
- Return type:
int
- Raises:
NotImplementedError – abstract method
- property in_neurons: int#
Number of input neurons
- Returns:
number of input neurons
- Return type:
int
- Raises:
NotImplementedError – abstract method
- property input: Tensor#
Get the input of the layer
- Returns:
input tensor
- Return type:
torch.Tensor
- Raises:
ValueError – if the input is not stored
- property input_extended: Tensor#
Return the input extended ones if the bias is used.
- Returns:
input extended
- Return type:
torch.Tensor
- Raises:
NotImplementedError – abstract method if bias is used
- property input_size: tuple[int, ...]#
Get the expected shape of the input excluding batch size and channels
- Returns:
input shape
- Return type:
tuple[int, ...]
- Raises:
ValueError – if the input size is not given and cannot be calculated
- property input_volume: int#
Expected input volume
- Returns:
input volume
- Return type:
int
- Raises:
NotImplementedError – abstract method
- kaiming_initialization(tensor: Tensor, reference_tensor: Tensor | None, fan_in: int) None[source]#
Initialize tensor with Kaiming.
- Parameters:
tensor (torch.Tensor) – tensor to initialize
reference_tensor (torch.Tensor | None) – Unused
fan_in (int) – number of input features of the base tensor + extension
- layer_in_extension(weight: Tensor) None[source]#
Extend the layer with the given extension weight, assuming that the input of the layer is extended but not the output.
- Parameters:
weight (torch.Tensor) – weight of the extension
- Raises:
NotImplementedError – abstract method
- layer_of_tensor(weight: Tensor, bias: Tensor | None = None, force_bias: bool = True) Module[source]#
Create a layer with the same characteristics (except the shape), with weight as parameter and bias as bias.
- Parameters:
weight (torch.Tensor) – weight of the layer
bias (torch.Tensor | None) – bias of the layer
force_bias (bool) – if True, the created layer requires a bias if self.use_bias is True
- Returns:
layer with the same characteristics
- Return type:
torch.nn.Module
- Raises:
NotImplementedError – abstract method
- layer_out_extension(weight: Tensor, bias: Tensor | None = None) None[source]#
Extend the layer with the given extension weight (and bias if needed), assuming that the output of the layer is extended but not the input.
- Parameters:
weight (torch.Tensor) – weight of the extension
bias (torch.Tensor | None) – bias of the extension if needed
- Raises:
NotImplementedError – abstract method
- missing_neurons() int[source]#
Get the number of missing neurons to reach the target hidden features.
- Returns:
number of missing neurons
- Return type:
int
- Raises:
ValueError – if target_in_neurons is not set
- normalize_optimal_updates(std_target: float | None = None, normalization_type: str = 'legacy_normalization', gradmax_scale: float = 1.0) None[source]#
Normalize optimal update to target standard deviation
Normalize the optimal updates so that the standard deviation of the weights of the updates is equal to std_target. If std_target is None, we automatically determine it. We use the standard deviation of the weights of the layer if it has weights. If the layer has no weights, we aim to have a std of 1 / sqrt(in_features).
If normalization_type is "equalize_second_layer": let s be the target standard deviation, then:
- optimal_delta_layer is scaled to have a std of s (so by s / std(optimal_delta_layer))
- extended_input_layer is scaled to have a std of s (so by s / std(extended_input_layer))
- extended_output_layer is scaled to match the scaling of the extended_input_layer and the optimal_delta_layer (so by std(extended_input_layer) / std(optimal_delta_layer))
If normalization_type is "equalize_extensions": let s be the target standard deviation, then:
- extended_input_layer is scaled to have a std of s (so by s / std(extended_input_layer))
- extended_output_layer is scaled to have a std of s (so by s / std(extended_output_layer))
- optimal_delta_layer is scaled to match the scaling of the extended_input_layer and the extended_output_layer (so by s ** 2 / (std(extended_input_layer) * std(extended_output_layer)))
Note that the goal here is to give both extensions the same scale; for the output extension this is not the std of the layer it extends (that layer belongs to self.previous_module), so the result is not scale-matched to the target layer. Use "match_extending_layer" for that behaviour.
If normalization_type is "match_extending_layer": each update component is scaled to match the std of the layer it modifies:
- extended_input_layer is scaled to std(self.layer.weight)
- previous_module.extended_output_layer is scaled to std(previous_module.layer.weight)
- optimal_delta_layer is scaled to std(self.layer.weight)
In this mode std_target is ignored; each component uses the std of its target layer. A previous GrowingModule with a .layer exposing weights is required.
If normalization_type is "gradmax_normalization": let gradmax_scale be \(s\) (default 1). Let \(c = s \cdot \text{mean}_i \|W_i\|\) where \(\|W_i\|\) are the L2 (or Frobenius per channel) norms of the existing slices of self.layer.weight along the input / fan-in axis (axis 1 for nn.Linear). Each new column (added neuron) w of extended_input_layer.weight is rescaled as w <- w / ||w|| * c when ||w|| > 0. For each such column j, let r_j be the factor applied (target norm divided by the column norm before scaling). If eigenvalues_extension is present and has one entry per added neuron, it is updated as eigenvalues_extension[j] *= sqrt(r_j), matching the one-sided analogue of scale_layer_extension (which uses *= sqrt(scale_output * scale_input)) when only the input extension is rescaled.
- Parameters:
std_target (float | None) – target standard deviation for the weights of the updates. Ignored when normalization_type == "match_extending_layer".
normalization_type (str) – type of normalization to use, one of 'equalize_second_layer', 'equalize_extensions', 'match_extending_layer', 'weird_normalization', 'legacy_normalization', 'gradmax_normalization'
gradmax_scale (float) – For gradmax_normalization only: scalar \(s\) in \(c = s \cdot \text{mean}(\|W_i\|)\). Must be positive. Default 1.0.
- Raises:
ValueError – if there is no previous module or the normalization_type is invalid
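For the gradmax_normalization rule, the sketch below transcribes the column-wise rescaling described above on synthetic nn.Linear-shaped tensors; it uses made-up shapes and is not the module's internal code path.

```python
import torch

# Hedged transcription of the gradmax_normalization rule above (nn.Linear case).
s = 1.0                                   # gradmax_scale
w = torch.randn(16, 32)                   # existing layer weight, shape (out, in)
w_ext = torch.randn(16, 4)                # extended_input_layer weight, shape (out, new)

c = s * w.norm(dim=0).mean()              # c = s * mean_i ||W_i|| over fan-in slices
col_norms = w_ext.norm(dim=0)             # norm of each new column (added neuron)
r = torch.where(col_norms > 0, c / col_norms, torch.ones_like(col_norms))
w_ext = w_ext * r                         # w <- w / ||w|| * c when ||w|| > 0
```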
- number_of_neurons_to_add(method: str = 'fixed_proportional', number_of_growth_steps: int = 1) int[source]#
Get the number of neurons to add in the next growth step.
- fixed_proportional: add a fixed proportion of the total number of neurons to add at each growth step. The amount to add is computed with an integer division; as a consequence a few neurons may remain to be added after all growth steps have been performed.
- Parameters:
method (str) – method used to compute the number of neurons to add, by default "fixed_proportional"
number_of_growth_steps (int) – total number of growth steps over which the target number of neurons is reached, by default 1
- Returns:
Number of neurons to add.
- Return type:
int
- Raises:
ValueError – if target_in_neurons or initial_in_neurons are not set or the method is unknown
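A worked example of the fixed_proportional rule, assuming it divides the remaining target_in_neurons - initial_in_neurons evenly across the growth steps: with initial_in_neurons = 16, target_in_neurons = 50 and 3 growth steps, each step adds (50 - 16) // 3 = 11 neurons, leaving 1 neuron unadded because of the integer division.

```python
# Hedged worked example of the "fixed_proportional" rule described above
# (assumed formula: neurons added per step = (target - initial) // steps).
initial_in_neurons, target_in_neurons, number_of_growth_steps = 16, 50, 3
per_step = (target_in_neurons - initial_in_neurons) // number_of_growth_steps            # 11 neurons per step
leftover = (target_in_neurons - initial_in_neurons) - per_step * number_of_growth_steps  # 1 neuron remains
```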
- number_of_parameters() int[source]#
Return the number of parameters of the layer.
- Returns:
number of parameters
- Return type:
int
- property out_features: int#
Fan-out size
- Returns:
fan-out size
- Return type:
int
- Raises:
NotImplementedError – abstract method
- property output_volume: int#
Expected output volume
- Returns:
output volume
- Return type:
int
- Raises:
NotImplementedError – abstract method
- parameter_step(delta_weights: Tensor, delta_biases: Tensor | None = None) None[source]#
Update the parameters of the layer with the given deltas.
- Parameters:
delta_weights (torch.Tensor) – delta values for the weights
delta_biases (torch.Tensor | None) – delta values for the biases, if None, the biases are not updated
- parameters(recurse: bool = True) Iterator[Parameter][source]#
Return the parameters of the layer.
- Parameters:
recurse (bool) – if True, return the parameters of the submodules
- Returns:
iterator over the parameters of the layer
- Return type:
Iterator[torch.nn.Parameter]
- property pre_activity: Tensor#
Get the pre activity of the layer
- Returns:
pre activity tensor
- Return type:
torch.Tensor
- Raises:
ValueError – if the pre activity is not stored
- projected_v_goal(input_vector: Tensor) Tensor[source]#
Compute the projected gradient of the goal with respect to the activity of the layer.
dLoss/dA_proj := dLoss/dA - dW B[-1] where A is the pre-activation vector of the layer, and dW the optimal delta for the layer
- Parameters:
input_vector (torch.Tensor) – input vector B[-1] of shape (n_samples, in_features)
- Returns:
projected gradient of the goal with respect to the activity of the next layer dLoss/dA - dW B[-1]
- Return type:
torch.Tensor
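The projection can be written as a single tensor operation; the sketch below uses synthetic shapes (n samples, a linear layer) and transcribes the formula dLoss/dA_proj = dLoss/dA - dW B[-1] with batched tensors.

```python
import torch

# Hedged sketch of the projection above, with hypothetical shapes.
n, in_features, out_features = 32, 8, 4
grad_pre_activity = torch.randn(n, out_features)   # dLoss/dA, one row per sample
b = torch.randn(n, in_features)                    # layer input B[-1]
dw = torch.randn(out_features, in_features)        # optimal delta dW*
projected = grad_pre_activity - b @ dw.t()         # dLoss/dA - dW B[-1], per sample
```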
- static scale_layer(layer: Module, scale: float) Module[source]#
Scale the weights and biases of a given layer by a specified factor.
- Parameters:
layer (torch.nn.Module) – The layer whose parameters are to be scaled.
scale (float) – The factor by which to scale the layer’s parameters.
- Returns:
The layer with scaled parameters.
- Return type:
torch.nn.Module
- scale_layer_extension(scale: float | None, scale_output: float | None, scale_input: float | None) None[source]#
Scale the layer extension by a given factor. This means scaling the extended_input_layer, the extended_output_layer and the eigenvalues_extension. However, as the eigenvalues_extension will be squared, they are scaled by sqrt(scale_input * scale_output).
- Parameters:
scale (float | None) – The factor by which to scale the layer extension. If not None, replace both scale_input and scale_output if they are not None.
scale_output (float | None) – The factor by which to scale the layer output extension.
scale_input (float | None) – The factor by which to scale the layer input extension. If not None, scale must be None.
- Raises:
ValueError – Cannot scale layer extension if one of the extensions is None
- scale_parameter_update(scale: float) None[source]#
Scale the parameter update by a given factor. This means scaling the optimal delta and the parameter_update_decrease.
- Parameters:
scale (float) – The factor by which to scale the parameter update.
- set_scaling_factor(factor: float) None[source]#
Assign scaling factor to all growing layers
- Parameters:
factor (float) – scaling factor
- sub_select_optimal_added_parameters(keep_neurons: int | None = None, threshold: float | None = None, sub_select_previous: bool = True, zeros_if_not_enough: bool = False, zeros_fan_in: bool = True, zeros_fan_out: bool = False) None[source]#
Select the first keep_neurons neurons of the optimal added parameters linked to this layer.
- Parameters:
keep_neurons (int | None) – number of neurons to keep, if None, the number of neurons is determined by the threshold
threshold (float | None) – threshold to determine the number of neurons to keep, if None, keep_neurons must be provided
sub_select_previous (bool) – if True, sub-select the previous layer added parameters as well
zeros_if_not_enough (bool) – if True, will keep all the neurons and set the non selected ones to zero (either first or last depending on zeros_fan_in and zeros_fan_out)
zeros_fan_in (bool) – if True and zeros_if_not_enough is True, will set the non selected fan-in parameters to zero
zeros_fan_out (bool) – if True and zeros_if_not_enough is True, will set the non selected fan-out parameters to zero
- Raises:
ValueError – if there is no previous module
NotImplementedError – if the previous module is not the same class
- property tensor_n: Tensor#
Compute the tensor N for the layer with the current M_{-2}, C and optimal delta.
- Returns:
N
- Return type:
torch.Tensor
- Raises:
NotImplementedError – abstract method
- property tensor_s: TensorStatistic#
Return the tensor S of the layer. Either the tensor S computed locally or the tensor S of the previous merge layer.
- Returns:
tensor S
- Return type:
TensorStatistic
- property tensor_s_growth#
Redirect to the tensor S of the previous module.
- update_input_size(input_size: tuple[int, ...] | None = None, compute_from_previous: bool = False, force_update: bool = True) tuple[int, ...] | None[source]#
Update the input size of the layer. Either according to the parameter or the input currently stored.
- Parameters:
- Returns:
updated input size if it could be computed, None otherwise
- Return type:
tuple[int, ...] | None
- Raises:
NotImplementedError – abstract method
- property weight: Tensor#
Get the weight of the layer
- Returns:
weight tensor
- Return type:
torch.Tensor