gromo.modules.growing_module.GrowingModule#

class gromo.modules.growing_module.GrowingModule(layer: Module, tensor_s_shape: tuple[int, int] | None = None, tensor_m_shape: tuple[int, int] | None = None, post_layer_function: Module = Identity(), allow_growing: bool = True, previous_module: Module | None = None, next_module: Module | None = None, device: device | None = None, name: str | None = None, s_growth_is_needed: bool = True)[source]#

property activation_gradient: Tensor#

Return the derivative of the activation function before this layer at 0+.

Returns:: derivative of the activation function before this layer at 0+
Return type:: torch.Tensor

add_parameters(**kwargs) → None[source]#

Grow the module by adding new parameters to the layer.

Parameters:: kwargs (dict) – typically include the values of the new parameters to add to the layer

apply_change(scaling_factor: float | Tensor | None = None, apply_previous: bool = True, apply_delta: bool = True, apply_extension: bool = True) → None[source]#

Apply the optimal delta and extend the layer with current optimal delta and layer extension with the current scaling factor. This means that the layer input is extended with the current layer output extension and the previous layer output is extended with the previous layer output extension both scaled by the current scaling factor. This also means that the layer output is not extended.

Parameters:

scaling_factor (float | torch.Tensor | None) –

scaling factor to apply to the optimal delta,
if None use the current scaling factor
apply_previous (bool) – if True apply the change to the previous layer, by default True
apply_delta (bool) – if True apply the optimal delta to the layer, by default True
apply_extension (bool) – if True apply the extension to the layer, by default True

compute_cross_covariance_update() → tuple[Tensor, int][source]#

Compute the update of the tensor C := B[-1] B[-2]^T.

Returns:

torch.Tensor – update of the tensor C
int – number of samples used to compute the update

compute_m_prev_update(desired_activation: Tensor | None = None) → tuple[Tensor, int][source]#

Compute the update of the tensor M_{-2} := dA B[-2]^T.

Parameters:

desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer

Returns:

torch.Tensor – update of the tensor M_{-2}
int – number of samples used to compute the update

compute_m_update(desired_activation: Tensor | None = None) → tuple[Tensor, int][source]#

Compute the update of the tensor M. Should be added to the type of layer.

Parameters:

desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer

Returns:

torch.Tensor – update of the tensor M
int – number of samples used to compute the update

compute_n_update()[source]#

Compute the update of the tensor N. Should be added to the type of layer.

Returns:: update of the tensor N
Return type:: torch.Tensor

compute_optimal_added_parameters(numerical_threshold: float = 1e-15, statistical_threshold: float = 0.001, maximum_added_neurons: int | None = None, update_previous: bool = True, dtype: dtype = torch.float32) → tuple[Tensor, Tensor | None, Tensor, Tensor][source]#

Compute the optimal added parameters to extend the input layer. Update the extended_input_layer and the eigenvalues_extension.

Parameters:

numerical_threshold (float) – threshold to consider an eigenvalue as zero in the square root of the inverse of S
statistical_threshold (float) – threshold to consider an eigenvalue as zero in the SVD of S{-1/2} N
maximum_added_neurons (int | None) – maximum number of added neurons, if None all significant neurons are kept
update_previous (bool) – whether to change the previous layer extended_output_layer
dtype (torch.dtype) – dtype for S and N during the computation

Returns:

optimal added weights alpha weights, alpha bias, omega and eigenvalues lambda

Return type:

tuple[torch.Tensor, torch.Tensor | None, torch.Tensor, torch.Tensor]

compute_optimal_delta(update: bool = True, dtype: dtype = torch.float32, force_pseudo_inverse: bool = False) → tuple[Tensor, Tensor | None, Tensor | float][source]#

Compute the optimal delta for the layer using current S and M tensors.

dW* = M S[-1]^-1 (if needed we use the pseudo-inverse)

Compute dW* (and dBias* if needed) and update the optimal_delta_layer attribute. L(A + gamma * B * dW) = L(A) - gamma * d + o(gamma) where d is the first order decrease and gamma the scaling factor.

Parameters:

update (bool) – if True update the optimal delta layer attribute and the first order decrease
dtype (torch.dtype) – dtype for S and M during the computation
force_pseudo_inverse (bool) – if True, use the pseudo-inverse to compute the optimal delta even if the matrix is invertible

Returns:

optimal delta for the weights, the biases if needed and the first order decrease

Return type:

tuple[torch.Tensor, torch.Tensor | None, torch.Tensor | float]

compute_optimal_updates(numerical_threshold: float = 1e-10, statistical_threshold: float = 1e-05, maximum_added_neurons: int | None = None, update_previous: bool = True, zero_delta: bool = False, dtype: dtype = torch.float32) → tuple[Tensor, Tensor | None][source]#

Compute the optimal update and additional neurons.

Parameters:

numerical_threshold (float) – threshold to consider an eigenvalue as zero in the square root of the inverse of S
statistical_threshold (float) – threshold to consider an eigenvalue as zero in the SVD of S{-1/2} N
maximum_added_neurons (int | None) – maximum number of added neurons, if None all significant neurons are kept
update_previous (bool) – whether to change the previous layer extended_output_layer
zero_delta (bool) – if True, compute the optimal added neurons without performing the natural gradient step.
dtype (torch.dtype) – dtype for the computation of the optimal delta and added parameters

Returns:

optimal extension for the previous layer (weights and biases)

Return type:

tuple[torch.Tensor, torch.Tensor | None]

compute_s_growth_update() → tuple[Tensor, int][source]#

Compute the update of the tensor S_growth.

Returns:

torch.Tensor – update of the tensor S_growth
int – number of samples used to compute the update

compute_s_update() → tuple[Tensor, int][source]#

Compute the update of the tensor S. Should be added to the type of layer.

Returns:

torch.Tensor – update of the tensor S
int – number of samples used to compute the update

delete_update(include_previous: bool = True, include_output: bool = False) → None[source]#

Delete the updates of the layer: - optimal_delta_layer - extended_input_layer and associated extensions

By default, we do not delete the extended_output_layer of this layer because it could be required by the next layer.

Parameters:

include_previous (bool) – if True delete the extended_output_layer of the previous layer
include_output (bool) – if True delete the extended_output_layer of this layer, warning: this does not delete the extended_input_layer of the next layer

extended_forward(x: Tensor, x_ext: Tensor | None = None) → tuple[Tensor, Tensor | None][source]#

Forward pass of the module with layer extension and layer update scaled according to the scaling factor. WARNING: does not store the input and pre-activity tensors. WARNING: the scaling factor is squared for the optimal delta and linear for the extension. (Instead of linear for the optimal delta and squared for the extension as in the theory).

Parameters:

x (torch.Tensor) – input tensor
x_ext (torch.Tensor | None) – extension tensor

Returns:

output tensor and extension tensor

Return type:

tuple[torch.Tensor, torch.Tensor]

property first_order_improvement: Tensor#

Get the first order improvement of the block.

Returns:: first order improvement
Return type:: torch.Tensor

forward(x)[source]#

Forward pass of the module. If needed, store the activity and pre-activity tensors.

Parameters:: x (torch.Tensor) – input tensor
Returns:: output tensor
Return type:: torch.Tensor

init_computation() → None[source]#: Initialize the computation of the optimal added parameters.

property input_extended: Tensor#

Return the input extended ones if the bias is used.

Returns:: input extended
Return type:: torch.Tensor

layer_in_extension(weight: Tensor) → None[source]#

Extend the layer with the parameters of layer assuming that the input of the layer is extended but not the output.

Parameters:: weight (torch.Tensor) – weight of the extension

layer_of_tensor(weight: Tensor, bias: Tensor | None = None) → Linear[source]#

Create a layer with the same characteristics (excepted the shape): with weight as parameter and bias as bias.

Parameters:

weight (torch.Tensor) – weight of the layer
bias (torch.Tensor | None) – bias of the layer

Returns:

layer with the same characteristics

Return type:

torch.nn.Linear

layer_out_extension(weight: Tensor, bias: Tensor | None = None) → None[source]#

Extend the layer with the parameters of layer assuming that the output of the layer is extended but not the input.

Parameters:

weight (torch.Tensor) – weight of the extension
bias (torch.Tensor | None) – bias of the extension if needed

number_of_parameters() → int[source]#

Return the number of parameters of the layer.

Returns:: number of parameters
Return type:: int

parameter_step(delta_weights: Tensor, delta_biases: Tensor | None = None) → None[source]#

Update the parameters of the layer with the given deltas.

Parameters:

delta_weights (torch.Tensor) – delta values for the weights
delta_biases (torch.Tensor | None) – delta values for the biases, if None, the biases are not updated

parameters(recurse: bool = True) → Iterator[Parameter][source]#

Return the parameters of the layer.

Parameters:: recurse (bool) – if True, return the parameters of the submodules
Returns:: iterator over the parameters of the layer
Return type:: Iterator[Parameter]

projected_v_goal(input_vector: Tensor) → Tensor[source]#

Compute the projected gradient of the goal with respect to the activity of the layer.

dLoss/dA_proj := dLoss/dA - dW B[-1] where A is the pre-activation vector of the layer, and dW the optimal delta for the layer

Parameters:: input_vector (torch.Tensor of shape (n_samples, in_features)) – input vector B[-1]
Returns:: projected gradient of the goal with respect to the activity of the next layer dLoss/dA - dW B[-1]
Return type:: torch.Tensor

reset_computation() → None[source]#: Reset the computation of the optimal added parameters.

sub_select_optimal_added_parameters(keep_neurons: int, sub_select_previous: bool = True) → None[source]#

Select the first keep_neurons neurons of the optimal added parameters linked to this layer.

Parameters:

keep_neurons (int) – number of neurons to keep
sub_select_previous (bool) – if True, sub-select the previous layer added parameters as well

property tensor_n: Tensor#

Compute the tensor N for the layer with the current M_{-2}, C and optimal delta.