gromo.modules.growing_module.GrowingModule#
- class gromo.modules.growing_module.GrowingModule(layer: Module, tensor_s_shape: tuple[int, int] | None = None, tensor_m_shape: tuple[int, int] | None = None, post_layer_function: Module = Identity(), allow_growing: bool = True, previous_module: Module | None = None, next_module: Module | None = None, device: device | None = None, name: str | None = None, s_growth_is_needed: bool = True)[source]#
- property activation_gradient: Tensor#
Return the derivative of the activation function before this layer at 0+.
- Returns:
derivative of the activation function before this layer at 0+
- Return type:
torch.Tensor
- add_parameters(**kwargs) None [source]#
Grow the module by adding new parameters to the layer.
- Parameters:
kwargs (dict) – typically include the values of the new parameters to add to the layer
- apply_change(scaling_factor: float | Tensor | None = None, apply_previous: bool = True, apply_delta: bool = True, apply_extension: bool = True) None [source]#
Apply the optimal delta and extend the layer with the current layer extension, using the current scaling factor. The layer input is extended with the current layer output extension, and the previous layer output is extended with the previous layer output extension, both scaled by the current scaling factor. The layer output itself is not extended.
- Parameters:
scaling_factor (float | torch.Tensor | None) – scaling factor to apply to the optimal delta, if None use the current scaling factor
apply_previous (bool) – if True apply the change to the previous layer, by default True
apply_delta (bool) – if True apply the optimal delta to the layer, by default True
apply_extension (bool) – if True apply the extension to the layer, by default True
- compute_cross_covariance_update() tuple[Tensor, int] [source]#
Compute the update of the tensor C := B[-1] B[-2]^T.
- Returns:
torch.Tensor – update of the tensor C
int – number of samples used to compute the update
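As a rough illustration of the formula C := B[-1] B[-2]^T, the sketch below accumulates the sum of outer products over a batch and returns it with the sample count. The helper name `cross_covariance_update` and the row-per-sample tensor layout are assumptions made for this example, not part of the gromo API.

```python
import torch


def cross_covariance_update(
    b_prev: torch.Tensor, b_prev2: torch.Tensor
) -> tuple[torch.Tensor, int]:
    """Hypothetical helper: unnormalized sum over samples of b[-1] b[-2]^T."""
    # b_prev:  (n_samples, features_1), playing the role of B[-1]
    # b_prev2: (n_samples, features_2), playing the role of B[-2]
    update = torch.einsum("ni,nj->ij", b_prev, b_prev2)
    return update, b_prev.shape[0]


torch.manual_seed(0)
b1 = torch.randn(8, 3, dtype=torch.float64)
b2 = torch.randn(8, 5, dtype=torch.float64)
c_update, n = cross_covariance_update(b1, b2)
```

Returning the unnormalized sum together with the number of samples lets a caller combine several batches before dividing.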
- compute_m_prev_update(desired_activation: Tensor | None = None) tuple[Tensor, int] [source]#
Compute the update of the tensor M_{-2} := dA B[-2]^T.
- Parameters:
desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer
- Returns:
torch.Tensor – update of the tensor M_{-2}
int – number of samples used to compute the update
- compute_m_update(desired_activation: Tensor | None = None) tuple[Tensor, int] [source]#
Compute the update of the tensor M. Should be implemented for each type of layer.
- Parameters:
desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer
- Returns:
torch.Tensor – update of the tensor M
int – number of samples used to compute the update
- compute_n_update()[source]#
Compute the update of the tensor N. Should be implemented for each type of layer.
- Returns:
update of the tensor N
- Return type:
torch.Tensor
- compute_optimal_added_parameters(numerical_threshold: float = 1e-15, statistical_threshold: float = 0.001, maximum_added_neurons: int | None = None, update_previous: bool = True, dtype: dtype = torch.float32) tuple[Tensor, Tensor | None, Tensor, Tensor] [source]#
Compute the optimal added parameters to extend the input layer. Update the extended_input_layer and the eigenvalues_extension.
- Parameters:
numerical_threshold (float) – threshold to consider an eigenvalue as zero in the square root of the inverse of S
statistical_threshold (float) – threshold to consider an eigenvalue as zero in the SVD of S^{-1/2} N
maximum_added_neurons (int | None) – maximum number of added neurons, if None all significant neurons are kept
update_previous (bool) – whether to change the previous layer extended_output_layer
dtype (torch.dtype) – dtype for S and N during the computation
- Returns:
optimal added parameters: alpha weights, alpha bias, omega, and the eigenvalues lambda
- Return type:
tuple[torch.Tensor, torch.Tensor | None, torch.Tensor, torch.Tensor]
- compute_optimal_delta(update: bool = True, dtype: dtype = torch.float32, force_pseudo_inverse: bool = False) tuple[Tensor, Tensor | None, Tensor | float] [source]#
Compute the optimal delta for the layer using current S and M tensors.
dW* = M S[-1]^-1 (if needed we use the pseudo-inverse)
Compute dW* (and dBias* if needed) and update the optimal_delta_layer attribute. L(A + gamma * B * dW) = L(A) - gamma * d + o(gamma) where d is the first order decrease and gamma the scaling factor.
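A minimal sketch of dW* = M S^{-1} with a pseudo-inverse fallback follows. It assumes M has shape (out_features, in_features) and S has shape (in_features, in_features); the helper `optimal_delta` is hypothetical and the tensor layout used inside gromo may differ.

```python
import torch


def optimal_delta(
    m: torch.Tensor, s: torch.Tensor, force_pseudo_inverse: bool = False
) -> torch.Tensor:
    """Hypothetical helper computing dW* = M S^{-1}, or M S^+ if S is singular."""
    if not force_pseudo_inverse:
        try:
            # Solve dW S = M, i.e. S^T dW^T = M^T, without forming S^{-1}.
            return torch.linalg.solve(s.T, m.T).T
        except RuntimeError:
            pass  # singular S: fall through to the pseudo-inverse
    return m @ torch.linalg.pinv(s)


torch.manual_seed(0)
x = torch.randn(20, 4, dtype=torch.float64)      # inputs, one row per sample
w_true = torch.randn(3, 4, dtype=torch.float64)  # direction we hope to recover
desired = x @ w_true.T                           # desired change of the output

m = desired.T @ x  # (out_features, in_features)
s = x.T @ x        # (in_features, in_features)
delta = optimal_delta(m, s)
delta_pinv = optimal_delta(m, s, force_pseudo_inverse=True)
```

When the desired output change is exactly linear in the input, both paths recover the generating matrix `w_true`, which is a convenient sanity check.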
- compute_optimal_updates(numerical_threshold: float = 1e-10, statistical_threshold: float = 1e-05, maximum_added_neurons: int | None = None, update_previous: bool = True, zero_delta: bool = False, dtype: dtype = torch.float32) tuple[Tensor, Tensor | None] [source]#
Compute the optimal update and additional neurons.
- Parameters:
numerical_threshold (float) – threshold to consider an eigenvalue as zero in the square root of the inverse of S
statistical_threshold (float) – threshold to consider an eigenvalue as zero in the SVD of S^{-1/2} N
maximum_added_neurons (int | None) – maximum number of added neurons, if None all significant neurons are kept
update_previous (bool) – whether to change the previous layer extended_output_layer
zero_delta (bool) – if True, compute the optimal added neurons without performing the natural gradient step.
dtype (torch.dtype) – dtype for the computation of the optimal delta and added parameters
- Returns:
optimal extension for the previous layer (weights and biases)
- Return type:
tuple[torch.Tensor, torch.Tensor | None]
- compute_s_growth_update() tuple[Tensor, int] [source]#
Compute the update of the tensor S_growth.
- Returns:
torch.Tensor – update of the tensor S_growth
int – number of samples used to compute the update
- compute_s_update() tuple[Tensor, int] [source]#
Compute the update of the tensor S. Should be implemented for each type of layer.
- Returns:
torch.Tensor – update of the tensor S
int – number of samples used to compute the update
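The (update, number of samples) return pattern suggests that statistics are accumulated across batches and normalized afterwards. The sketch below illustrates that idea for a second-moment matrix of the inputs. The helper `s_update` and the accumulation loop are assumptions for illustration, not the gromo implementation.

```python
import torch


def s_update(x: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Hypothetical helper: unnormalized second moment of a batch of inputs."""
    # x: (n_samples, in_features); returns the sum over samples of x x^T
    return x.T @ x, x.shape[0]


torch.manual_seed(0)
data = torch.randn(12, 4, dtype=torch.float64)

total = torch.zeros(4, 4, dtype=torch.float64)
samples = 0
for batch in data.split(5):  # batches of 5, 5 and 2 samples
    update, n = s_update(batch)
    total += update
    samples += n

s_estimate = total / samples  # average over all samples seen so far
```

Dividing once at the end gives the same result as computing the average over the full dataset, regardless of how the batches were split.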
- delete_update(include_previous: bool = True, include_output: bool = False) None [source]#
Delete the updates of the layer:
- optimal_delta_layer
- extended_input_layer and associated extensions
By default, we do not delete the extended_output_layer of this layer because it could be required by the next layer.
- extended_forward(x: Tensor, x_ext: Tensor | None = None) tuple[Tensor, Tensor | None] [source]#
Forward pass of the module with layer extension and layer update scaled according to the scaling factor.
WARNING: does not store the input and pre-activity tensors.
WARNING: the scaling factor is squared for the optimal delta and linear for the extension (instead of linear for the optimal delta and squared for the extension as in the theory).
- Parameters:
x (torch.Tensor) – input tensor
x_ext (torch.Tensor | None) – extension tensor
- Returns:
output tensor and extension tensor
- Return type:
tuple[torch.Tensor, torch.Tensor | None]
- property first_order_improvement: Tensor#
Get the first order improvement of the block.
- Returns:
first order improvement
- Return type:
torch.Tensor
- forward(x)[source]#
Forward pass of the module. If needed, store the activity and pre-activity tensors.
- Parameters:
x (torch.Tensor) – input tensor
- Returns:
output tensor
- Return type:
torch.Tensor
- property input_extended: Tensor#
Return the input extended with ones if the bias is used.
- Returns:
input extended
- Return type:
torch.Tensor
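Extending the input with ones is a common way to fold the bias into the weight matrix. The sketch below shows the idea for a linear layer; `extend_with_ones` is a hypothetical helper, and the assumption that the ones are appended as the last feature is made only for this example.

```python
import torch
from torch import nn


def extend_with_ones(x: torch.Tensor, use_bias: bool) -> torch.Tensor:
    """Hypothetical helper: append a constant feature equal to 1 when a bias is used."""
    if not use_bias:
        return x
    ones = torch.ones(*x.shape[:-1], 1, dtype=x.dtype, device=x.device)
    return torch.cat([x, ones], dim=-1)


torch.manual_seed(0)
layer = nn.Linear(3, 2)
x = torch.randn(5, 3)

with torch.no_grad():
    x_extended = extend_with_ones(x, use_bias=True)  # (5, 4)
    # Stack the bias as an extra column of the weight matrix: [W | b]
    weight_and_bias = torch.cat([layer.weight, layer.bias[:, None]], dim=1)
    output_from_extended = x_extended @ weight_and_bias.T
    output_from_layer = layer(x)
```

With this layout, a single matrix product on the extended input reproduces the affine map of the layer.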
- layer_in_extension(weight: Tensor) None [source]#
Extend the layer with the given parameters, assuming that the input of the layer is extended but not the output.
- Parameters:
weight (torch.Tensor) – weight of the extension
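For a linear layer, extending the input amounts to concatenating new columns to the weight matrix. The following sketch builds a new `torch.nn.Linear` rather than modifying one in place; `extend_input` is a hypothetical helper and does not reflect how gromo performs the extension internally.

```python
import torch
from torch import nn


def extend_input(layer: nn.Linear, weight_ext: torch.Tensor) -> nn.Linear:
    """Hypothetical helper: return a Linear layer accepting extra input features."""
    # weight_ext: (out_features, n_new_inputs)
    new_layer = nn.Linear(
        layer.in_features + weight_ext.shape[1],
        layer.out_features,
        bias=layer.bias is not None,
    )
    with torch.no_grad():
        new_layer.weight.copy_(torch.cat([layer.weight, weight_ext], dim=1))
        if layer.bias is not None:
            new_layer.bias.copy_(layer.bias)
    return new_layer


torch.manual_seed(0)
layer = nn.Linear(3, 4)
weight_ext = torch.randn(4, 2)
x, x_ext = torch.randn(5, 3), torch.randn(5, 2)

with torch.no_grad():
    wide_layer = extend_input(layer, weight_ext)
    out_wide = wide_layer(torch.cat([x, x_ext], dim=1))
    out_expected = layer(x) + x_ext @ weight_ext.T
```

The output size is unchanged; the new inputs simply add their contribution to the existing pre-activations.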
- layer_of_tensor(weight: Tensor, bias: Tensor | None = None) Linear [source]#
Create a layer with the same characteristics (except the shape), with weight as parameter and bias as bias.
- Parameters:
weight (torch.Tensor) – weight of the layer
bias (torch.Tensor | None) – bias of the layer
- Returns:
layer with the same characteristics
- Return type:
torch.nn.Linear
- layer_out_extension(weight: Tensor, bias: Tensor | None = None) None [source]#
Extend the layer with the given parameters, assuming that the output of the layer is extended but not the input.
- Parameters:
weight (torch.Tensor) – weight of the extension
bias (torch.Tensor | None) – bias of the extension if needed
- number_of_parameters() int [source]#
Return the number of parameters of the layer.
- Returns:
number of parameters
- Return type:
int
- parameter_step(delta_weights: Tensor, delta_biases: Tensor | None = None) None [source]#
Update the parameters of the layer with the given deltas.
- Parameters:
delta_weights (torch.Tensor) – delta values for the weights
delta_biases (torch.Tensor | None) – delta values for the biases, if None, the biases are not updated
- parameters(recurse: bool = True) Iterator[Parameter] [source]#
Return the parameters of the layer.
- Parameters:
recurse (bool) – if True, return the parameters of the submodules
- Returns:
iterator over the parameters of the layer
- Return type:
Iterator[Parameter]
- projected_v_goal(input_vector: Tensor) Tensor [source]#
Compute the projected gradient of the goal with respect to the activity of the layer.
dLoss/dA_proj := dLoss/dA - dW B[-1] where A is the pre-activation vector of the layer, and dW the optimal delta for the layer
- Parameters:
input_vector (torch.Tensor of shape (n_samples, in_features)) – input vector B[-1]
- Returns:
projected gradient of the goal with respect to the activity of the next layer dLoss/dA - dW B[-1]
- Return type:
torch.Tensor
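The projection dLoss/dA - dW B[-1] can be illustrated with row-per-sample tensors, where it becomes `grad - input @ dW.T`. In the sketch below, `projected_gradient` is a hypothetical helper and dW is taken to be a least-squares fit of the gradient on the input. That choice is an assumption made to show that the remainder is orthogonal to the input.

```python
import torch


def projected_gradient(
    grad_pre_activation: torch.Tensor,
    input_vector: torch.Tensor,
    delta_weight: torch.Tensor,
) -> torch.Tensor:
    """Hypothetical helper: remove the part of the gradient explained by dW."""
    # grad_pre_activation: (n_samples, out_features)
    # input_vector:        (n_samples, in_features)
    # delta_weight:        (out_features, in_features)
    return grad_pre_activation - input_vector @ delta_weight.T


torch.manual_seed(0)
x = torch.randn(20, 4, dtype=torch.float64)
grad = torch.randn(20, 3, dtype=torch.float64)

# Least-squares delta: dW = M S^{-1} with M = grad^T x and S = x^T x
delta_weight = (grad.T @ x) @ torch.linalg.inv(x.T @ x)
residual = projected_gradient(grad, x, delta_weight)
```

What remains after the projection cannot be reduced further by changing the existing weights, which is why it is the natural target for newly added neurons.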
- sub_select_optimal_added_parameters(keep_neurons: int, sub_select_previous: bool = True) None [source]#
Select the first keep_neurons neurons of the optimal added parameters linked to this layer.
- property tensor_n: Tensor#
Compute the tensor N for the layer with the current M_{-2}, C and optimal delta.
- Returns:
N
- Return type:
torch.Tensor
- property tensor_s: TensorStatistic#
Return the tensor S of the layer. Either the tensor S computed locally or the tensor S of the previous merge layer.
- Returns:
tensor S
- Return type:
TensorStatistic