gromo.growing_module.GrowingModule#
- class gromo.growing_module.GrowingModule(layer: Module, tensor_s_shape: tuple[int, int], tensor_m_shape: tuple[int, int], post_layer_function: Module = Identity(), allow_growing: bool = True, previous_module: Module | None = None, next_module: Module | None = None, device: device | None = None, name: str | None = None)[source]#
- property activation_gradient: Tensor#
Return the derivative of the activation function before this layer at 0+.
- Returns:
derivative of the activation function before this layer at 0+
- Return type:
torch.Tensor
- add_parameters(**kwargs) None [source]#
Grow the module by adding new parameters to the layer.
- Parameters:
kwargs (dict) – typically include the values of the new parameters to add to the layer
- apply_change(scaling_factor: float | Tensor | None = None, apply_previous: bool = True) None [source]#
Apply the current optimal delta to the layer and extend it with the current layer extension, using the current scaling factor.
- compute_cross_covariance_update() tuple[Tensor, int] [source]#
Compute the update of the tensor C := B[-1] B[-2]^T.
- Returns:
torch.Tensor – update of the tensor C
int – number of samples used to compute the update
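As a rough illustration of the returned pair (update tensor, number of samples), the sketch below accumulates a cross-covariance between two batches of activities in plain PyTorch. The helper name `cross_covariance_update` and the sample-major layout `(n_samples, features)` are assumptions made for this example and are not part of gromo; with that layout, B[-1] B[-2]^T is written as `b_last.T @ b_prev`.

```python
import torch


def cross_covariance_update(
    b_last: torch.Tensor, b_prev: torch.Tensor
) -> tuple[torch.Tensor, int]:
    """Hypothetical helper: un-normalized cross-covariance of two batches.

    Assumed layout: b_last is (n_samples, f_last), b_prev is (n_samples, f_prev).
    """
    assert b_last.shape[0] == b_prev.shape[0], "batches must share n_samples"
    # Sum over samples of the outer products b_last[i] b_prev[i]^T.
    update = b_last.T @ b_prev
    return update, b_last.shape[0]


torch.manual_seed(0)
b_last = torch.randn(8, 3)
b_prev = torch.randn(8, 5)
update, n_samples = cross_covariance_update(b_last, b_prev)
```

Returning the sample count alongside the un-normalized sum lets a caller average over several batches by dividing the accumulated sum by the accumulated count.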
- compute_m_prev_update(desired_activation: Tensor | None = None) tuple[Tensor, int] [source]#
Compute the update of the tensor M_{-2} := dA B[-2]^T.
- Parameters:
desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer
- Returns:
torch.Tensor – update of the tensor M_{-2}
int – number of samples used to compute the update
- compute_m_update(desired_activation: Tensor | None = None) tuple[Tensor, int] [source]#
Compute the update of the tensor M. Should be implemented for each type of layer.
- Parameters:
desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer
- Returns:
torch.Tensor – update of the tensor M
int – number of samples used to compute the update
- compute_n_update()[source]#
Compute the update of the tensor N. Should be implemented for each type of layer.
- Returns:
update of the tensor N
- Return type:
torch.Tensor
- compute_optimal_added_parameters(numerical_threshold: float = 1e-15, statistical_threshold: float = 0.001, maximum_added_neurons: int | None = None, update_previous: bool = True, dtype: dtype = torch.float32) tuple[Tensor, Tensor | None, Tensor, Tensor] [source]#
Compute the optimal added parameters to extend the input layer. Update the extended_input_layer and the eigenvalues_extension.
- Parameters:
numerical_threshold (float) – threshold to consider an eigenvalue as zero in the square root of the inverse of S
statistical_threshold (float) – threshold to consider an eigenvalue as zero in the SVD of S^{-1/2} N
maximum_added_neurons (int | None) – maximum number of added neurons, if None all significant neurons are kept
update_previous (bool) – whether to change the previous layer extended_output_layer
dtype (torch.dtype) – dtype for S and N during the computation
- Returns:
optimal added weights (alpha weights, alpha bias and omega) and eigenvalues lambda
- Return type:
tuple[torch.Tensor, torch.Tensor | None, torch.Tensor, torch.Tensor]
- compute_optimal_delta(update: bool = True, dtype: dtype = torch.float32) tuple[Tensor, Tensor | None, Tensor | float] [source]#
Compute the optimal delta for the layer using current S and M tensors.
dW* = M S[-1]^{-1} (the pseudo-inverse is used if S[-1] is not invertible)
Compute dW* (and dBias* if needed) and update the optimal_delta_layer attribute.
L(A + gamma * B * dW) = L(A) - gamma * d + o(gamma), where d is the first order decrease and gamma is the scaling factor.
- Parameters:
update (bool) – if True update the optimal delta layer attribute and the first order decrease
dtype (torch.dtype) – dtype for S and M during the computation
- Returns:
optimal delta for the weights, the biases if needed and the first order decrease
- Return type:
tuple[torch.Tensor, torch.Tensor | None, torch.Tensor | float]
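The formula dW* = M S^{-1} can be checked numerically with a minimal sketch. The definitions S = B^T B and M = V^T B used below, with B the layer input in `(n_samples, in_features)` layout and V a stand-in for the desired change of the pre-activations, are assumptions for this example; gromo's own tensors may be normalized or laid out differently. The bias is left out.

```python
import torch

torch.manual_seed(0)
n_samples, in_features, out_features = 32, 4, 3
b = torch.randn(n_samples, in_features, dtype=torch.float64)   # layer input B
v = torch.randn(n_samples, out_features, dtype=torch.float64)  # desired change of A (stand-in)

s = b.T @ b  # (in_features, in_features), assumed definition of S
m = v.T @ b  # (out_features, in_features), assumed definition of M

# dW* = M S^{-1}; pinv equals the inverse when S is invertible and
# still gives a least-squares solution when it is not.
delta_w = m @ torch.linalg.pinv(s)

# Under these assumptions the first order decrease is the Frobenius
# inner product <dW*, M> = trace(M S^+ M^T), which is non-negative.
first_order_decrease = torch.sum(delta_w * m)
```

With these definitions, dW* is the least-squares solution of B dW^T ≈ V, which is why the same result can be obtained from `torch.linalg.lstsq(b, v)`.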
- compute_optimal_updates(numerical_threshold: float = 1e-10, statistical_threshold: float = 1e-05, maximum_added_neurons: int | None = None, update_previous: bool = True, zero_delta: bool = False, dtype: dtype = torch.float32) tuple[Tensor, Tensor | None] [source]#
Compute the optimal update and additional neurons.
- Parameters:
numerical_threshold (float) – threshold to consider an eigenvalue as zero in the square root of the inverse of S
statistical_threshold (float) – threshold to consider an eigenvalue as zero in the SVD of S^{-1/2} N
maximum_added_neurons (int | None) – maximum number of added neurons, if None all significant neurons are kept
update_previous (bool) – whether to change the previous layer extended_output_layer
zero_delta (bool) – if True, set the optimal delta to zero
dtype (torch.dtype) – dtype for the computation of the optimal delta and added parameters
- Returns:
optimal extension for the previous layer (weights and biases)
- Return type:
tuple[torch.Tensor, torch.Tensor | None]
- compute_s_update() tuple[Tensor, int] [source]#
Compute the update of the tensor S. Should be implemented for each type of layer.
- Returns:
torch.Tensor – update of the tensor S
int – number of samples used to compute the update
- delete_update(include_previous: bool = True, include_output: bool = False) None [source]#
Delete the updates of the layer:
- optimal_delta_layer
- extended_input_layer and associated extensions
By default, we do not delete the extended_output_layer of this layer because it could be required by the next layer.
- extended_forward(x: Tensor, x_ext: Tensor | None = None) tuple[Tensor, Tensor | None] [source]#
Forward pass of the module with layer extension and layer update.
WARNING: does not store the input and pre-activity tensors.
WARNING: the scaling factor is squared for the optimal delta and linear for the extension (instead of linear for the optimal delta and squared for the extension, as in the theory).
- Parameters:
x (torch.Tensor) – input tensor
x_ext (torch.Tensor | None) – extension tensor
- Returns:
output tensor and extension tensor
- Return type:
tuple[torch.Tensor, torch.Tensor | None]
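The scaling convention in the second warning can be made concrete with a small sketch built from plain `torch.nn.Linear` layers. Everything here is a stand-in: the names `optimal_delta`, `input_extension` and `output_extension`, the additive sign of the delta (taken from the formula L(A + gamma * B * dW)), and the omission of the post-layer function are assumptions, not gromo's actual code.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 3)
optimal_delta = torch.nn.Linear(4, 3)                # stand-in for the optimal delta
input_extension = torch.nn.Linear(2, 3, bias=False)  # acts on the extra inputs x_ext
output_extension = torch.nn.Linear(4, 5)             # produces the extra outputs


def extended_forward_sketch(x, x_ext, gamma):
    # Scaling factor squared on the optimal delta ...
    y = layer(x) + gamma**2 * optimal_delta(x)
    # ... and linear on both extensions.
    if x_ext is not None:
        y = y + gamma * input_extension(x_ext)
    y_ext = gamma * output_extension(x)
    return y, y_ext


x = torch.randn(6, 4)
x_ext = torch.randn(6, 2)
with torch.no_grad():
    y, y_ext = extended_forward_sketch(x, x_ext, gamma=0.5)
    y_zero, y_ext_zero = extended_forward_sketch(x, x_ext, gamma=0.0)
```

With a scaling factor of zero the sketch reduces to the plain forward pass of `layer`, and the extension output vanishes.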
- property first_order_improvement: Tensor#
Get the first order improvement of the block.
- Returns:
first order improvement
- Return type:
torch.Tensor
- forward(x)[source]#
Forward pass of the module. If needed, store the activity and pre-activity tensors.
- Parameters:
x (torch.Tensor) – input tensor
- Returns:
output tensor
- Return type:
torch.Tensor
- property input_extended: Tensor#
Return the input, extended with ones if the bias is used.
- Returns:
input extended
- Return type:
torch.Tensor
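A minimal sketch of why extending the input with ones is useful, assuming a `torch.nn.Linear` layer and inputs in `(n_samples, in_features)` layout: the bias can then be treated as one more column of the weight matrix, so a single matrix product covers both. The variable names are illustrative only.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(2, 3)
x = torch.randn(4, 2)

# Append a column of ones to the input.
ones = torch.ones(x.shape[0], 1, dtype=x.dtype)
x_extended = torch.cat([x, ones], dim=1)  # (4, 3)

# Append the bias as an extra column of the weight matrix.
w_extended = torch.cat([layer.weight, layer.bias.unsqueeze(1)], dim=1)  # (3, 3)

with torch.no_grad():
    y = x_extended @ w_extended.T  # same as layer(x)
```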
- layer_in_extension(weight: Tensor) None [source]#
Extend the layer with the parameters of layer assuming that the input of the layer is extended but not the output.
- Parameters:
weight (torch.Tensor) – weight of the extension
- layer_of_tensor(weight: Tensor, bias: Tensor | None = None) Linear [source]#
Create a layer with the same characteristics (except the shape), with weight as parameter and bias as bias.
- Parameters:
weight (torch.Tensor) – weight of the layer
bias (torch.Tensor | None) – bias of the layer
- Returns:
layer with the same characteristics
- Return type:
torch.nn.Linear
- layer_out_extension(weight: Tensor, bias: Tensor | None = None) None [source]#
Extend the layer with the parameters of layer assuming that the output of the layer is extended but not the input.
- Parameters:
weight (torch.Tensor) – weight of the extension
bias (torch.Tensor | None) – bias of the extension if needed
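The difference between extending the output and extending the input of a layer can be sketched for the `torch.nn.Linear` case, where the weight has shape `(out_features, in_features)`. The code below builds new layers by concatenation instead of calling gromo; the variable names and the choice of a linear layer are assumptions for this example.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 3)
x = torch.randn(2, 4)

# Output extension: 2 new output neurons mean 2 new weight rows and 2 new bias entries.
w_out = torch.randn(2, 4)
b_out = torch.randn(2)
out_extended = torch.nn.Linear(4, 5)
with torch.no_grad():
    out_extended.weight.copy_(torch.cat([layer.weight, w_out], dim=0))
    out_extended.bias.copy_(torch.cat([layer.bias, b_out], dim=0))

# Input extension: 2 new input features mean 2 new weight columns; the bias is unchanged.
w_in = torch.randn(3, 2)
in_extended = torch.nn.Linear(6, 3)
with torch.no_grad():
    in_extended.weight.copy_(torch.cat([layer.weight, w_in], dim=1))
    in_extended.bias.copy_(layer.bias)
```

This also shows why only the output extension takes a bias argument: new input features add columns to the weight but no new bias entries.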
- number_of_parameters() int [source]#
Return the number of parameters of the layer.
- Returns:
number of parameters
- Return type:
int
- parameter_step(delta_weights: Tensor, delta_biases: Tensor | None = None) None [source]#
Update the parameters of the layer with the given deltas.
- Parameters:
delta_weights (torch.Tensor) – delta values for the weights
delta_biases (torch.Tensor | None) – delta values for the biases, if None, the biases are not updated
- parameters(recurse: bool = True) Iterator[Parameter] [source]#
Return the parameters of the layer.
- Parameters:
recurse (bool) – if True, return the parameters of the submodules
- Returns:
iterator over the parameters of the layer
- Return type:
Iterator[Parameter]
- projected_v_goal(input_vector: Tensor) Tensor [source]#
Compute the projected gradient of the goal with respect to the activity of the layer.
dLoss/dA_proj := dLoss/dA - dW B[-1] where A is the pre-activation vector of the layer, and dW the optimal delta for the layer
- Parameters:
input_vector (torch.Tensor of shape (n_samples, in_features)) – input vector B[-1]
- Returns:
projected gradient of the goal with respect to the activity of the next layer dLoss/dA - dW B[-1]
- Return type:
torch.Tensor
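The expression dLoss/dA - dW B[-1] can be illustrated with a short sketch in sample-major layout, where it becomes `grad_a - b @ delta_w.T`. Here `grad_a` is a random stand-in for the gradient term and `delta_w` is taken as the least-squares fit of that term from the input; both choices, and the sign conventions, are assumptions made for this example.

```python
import torch

torch.manual_seed(0)
n_samples, in_features, out_features = 16, 4, 3
b = torch.randn(n_samples, in_features, dtype=torch.float64)        # input_vector B[-1]
grad_a = torch.randn(n_samples, out_features, dtype=torch.float64)  # stand-in for dLoss/dA

# Stand-in for the optimal delta: least-squares fit of grad_a from b,
# stored as (out_features, in_features) like a linear layer weight.
delta_w = (torch.linalg.pinv(b) @ grad_a).T

# Remove the part of the gradient that the weight update already explains.
projected = grad_a - b @ delta_w.T

# What remains is orthogonal to the input directions.
residual_correlation = b.T @ projected
```

In this setting the projected term is the part of the gradient that no change of the existing weights can capture, which is what makes it a natural target for new neurons.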
- sub_select_optimal_added_parameters(keep_neurons: int, sub_select_previous: bool = True) None [source]#
Select the first keep_neurons neurons of the optimal added parameters.
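Keeping the first keep_neurons neurons amounts to slicing along the axis that indexes the added neurons. The sketch below uses stand-in tensors; the names `alpha`, `omega` and `eigenvalues` follow the return description of compute_optimal_added_parameters, but their layouts (`alpha` as `(added, in_features)`, `omega` as `(out_features, added)`) are assumptions for this example.

```python
import torch

torch.manual_seed(0)
alpha = torch.randn(5, 4)  # 5 candidate neurons, 4 input features (assumed layout)
omega = torch.randn(3, 5)  # 3 output features, 5 candidate neurons (assumed layout)
eigenvalues = torch.tensor([2.0, 1.5, 0.9, 0.2, 0.05])  # sorted, largest first

keep_neurons = 2
alpha_kept = alpha[:keep_neurons]              # first rows
omega_kept = omega[:, :keep_neurons]           # first columns
eigenvalues_kept = eigenvalues[:keep_neurons]  # matching eigenvalues
```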
- property tensor_n: Tensor#
Compute the tensor N for the layer with the current M_{-2}, C and optimal delta.
- Returns:
N
- Return type:
torch.Tensor
- property tensor_s: TensorStatistic#
Return the tensor S of the layer. Either the tensor S computed locally or the tensor S of the previous addition layer.
- Returns:
tensor S
- Return type:
TensorStatistic