gromo.modules.growing_module.GrowingModule#

class gromo.modules.growing_module.GrowingModule(layer: Module, tensor_s_shape: tuple[int, int] | None = None, tensor_m_shape: tuple[int, int] | None = None, post_layer_function: Module = Identity(), extended_post_layer_function: Module | None = None, allow_growing: bool = True, previous_module: Module | None = None, next_module: Module | None = None, device: device | None = None, name: str | None = None, target_in_neurons: int | None = None, initial_in_neurons: int | None = None)[source]#

Abstract class for a Module of dynamic size

Parameters:
  • layer (torch.nn.Module) – layer of the module

  • tensor_s_shape (tuple[int, int] | None) – shape of the tensor S

  • tensor_m_shape (tuple[int, int] | None) – shape of the tensor M

  • post_layer_function (torch.nn.Module, optional) – function to apply after the layer, by default torch.nn.Identity()

  • extended_post_layer_function (torch.nn.Module | None, optional) – extended function to apply after the layer, by default None

  • allow_growing (bool) – if True, the module can grow (requires a previous GrowingModule)

  • previous_module (torch.nn.Module | None) – previous module

  • next_module (torch.nn.Module | None) – next module

  • device (torch.device | None) – device to use

  • name (str | None) – name of the module

  • target_in_neurons (int | None, optional) – target fan-in size, by default None

  • initial_in_neurons (int | None, optional) – initial fan-in size, by default None

property activation_gradient: Tensor#

Return the derivative of the activation function before this layer at 0+.

Warning: the value is cached to avoid recomputing it multiple times. Therefore, if the previous module changes its post layer function, the cache must be cleared manually by setting _activation_gradient_previous_module to None.

Returns:

derivative of the activation function before this layer at 0+

Return type:

torch.Tensor

Raises:

NotImplementedError – abstract method
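
The caching caveat described for this property can be illustrated with a toy class. This is a minimal sketch, not GroMo's implementation: ToyModule, right_derivative_at_zero, relu, and half_linear are hypothetical names, and the finite-difference estimate is an assumption made for illustration. Only the attribute name _activation_gradient_previous_module comes from the description above.

```python
from typing import Callable, Optional


def right_derivative_at_zero(fn: Callable[[float], float], eps: float = 1e-6) -> float:
    """Finite-difference estimate of the derivative of fn at 0+ (illustrative only)."""
    return (fn(eps) - fn(0.0)) / eps


def relu(x: float) -> float:
    return max(x, 0.0)


def half_linear(x: float) -> float:
    return 0.5 * x


class ToyModule:
    """Hypothetical stand-in that caches the activation derivative."""

    def __init__(self, previous_activation: Callable[[float], float]) -> None:
        self.previous_activation = previous_activation
        self._activation_gradient_previous_module: Optional[float] = None

    @property
    def activation_gradient(self) -> float:
        # Compute once, then reuse the cached value on later accesses.
        if self._activation_gradient_previous_module is None:
            self._activation_gradient_previous_module = right_derivative_at_zero(
                self.previous_activation
            )
        return self._activation_gradient_previous_module


module = ToyModule(relu)
first = module.activation_gradient  # computed and cached
module.previous_activation = half_linear  # the activation changes...
stale = module.activation_gradient  # ...but the cached value is still returned
module._activation_gradient_previous_module = None  # clear the cache manually
fresh = module.activation_gradient  # recomputed for the new activation
```

In this sketch, stale equals first even though the activation has changed. The value only reflects the new activation once the cache attribute is reset to None.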

add_parameters(**kwargs: Any) None[source]#

Grow the module by adding new parameters to the layer.

Parameters:

**kwargs (Any) – typically include the values of the new parameters to add to the layer

Raises:

NotImplementedError – abstract method

apply_change(scaling_factor: float | Tensor | None = None, apply_previous: bool = True, apply_delta: bool = True, apply_extension: bool = True, extension_size: int | None = None) None[source]#

Apply the optimal delta and the layer extension, using the current scaling factor. The layer input is extended with the current layer output extension, and the previous layer output is extended with the previous layer output extension, both scaled by the current scaling factor. The output of this layer is not extended.

Parameters:
  • scaling_factor (float | torch.Tensor | None) –

    scaling factor to apply to the optimal delta,

    if None use the current scaling factor

  • apply_previous (bool) – if True apply the change to the previous layer, by default True

  • apply_delta (bool) – if True apply the optimal delta to the layer, by default True

  • apply_extension (bool) – if True apply the extension to the layer, by default True

  • extension_size (int | None) – size of the extension to apply, by default None, in which case it is determined automatically from self.eigenvalues_extension.shape[0]

Raises:
  • ValueError – if the layer has no extension but an extension_size above zero was requested

  • NotImplementedError – if the previous module is not of type GrowingModule

property bias: Tensor#

Get the bias of the layer

Returns:

bias tensor

Return type:

torch.Tensor

complete_growth(extension_kwargs: Any) None[source]#

Complete the growth to the target size.

Parameters:

extension_kwargs (Any) – Additional arguments for creating layer extensions.

compute_cross_covariance_update() tuple[Tensor, int][source]#

Compute the update of the tensor C := B[-1] B[-2]^T.

Returns:

  • torch.Tensor – update of the tensor C

  • int – number of samples used to compute the update

Raises:

NotImplementedError – abstract method

compute_m_prev_update(desired_activation: Tensor | None = None) tuple[Tensor, int][source]#

Compute the update of the tensor M_{-2} := dA B[-2]^T.

Parameters:

desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer

Returns:

  • torch.Tensor – update of the tensor M_{-2}

  • int – number of samples used to compute the update

Raises:

NotImplementedError – abstract method

compute_m_update(desired_activation: Tensor | None = None) tuple[Tensor, int][source]#

Compute the update of the tensor M. Must be implemented for each type of layer.

Parameters:

desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer

Returns:

  • torch.Tensor – update of the tensor M

  • int – number of samples used to compute the update

Raises:

NotImplementedError – abstract method

compute_n_update() tuple[Tensor, int][source]#

Compute the update of the tensor N. Must be implemented for each type of layer.

Returns:

  • torch.Tensor – update of the tensor N

  • int – number of samples used to compute the update

Raises:

NotImplementedError – abstract method

compute_optimal_delta(update: bool = True, dtype: dtype = torch.float32, force_pseudo_inverse: bool = False) tuple[Tensor, Tensor | None, Tensor | float][source]#

Compute the optimal delta for the layer using current S and M tensors.

dW* = M S[-1]^-1 (the pseudo-inverse is used if needed)

Compute dW* (and dBias* if needed) and update the optimal_delta_layer attribute. To first order, L(A + gamma * B * dW) = L(A) - gamma * d + o(gamma), where d is the first-order decrease and gamma is the scaling factor.

Parameters:
  • update (bool) – if True update the optimal delta layer attribute and the first order decrease

  • dtype (torch.dtype) – dtype for S and M during the computation

  • force_pseudo_inverse (bool) – if True, use the pseudo-inverse to compute the optimal delta even if the matrix is invertible

Returns:

optimal delta for the weights, the biases if needed and the first order decrease

Return type:

tuple[torch.Tensor, torch.Tensor | None, torch.Tensor | float]
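
The formula dW* = M S[-1]^-1 can be sketched for a plain linear layer without bias. This is an illustration under stated assumptions, not GroMo's code: the function optimal_delta_sketch is hypothetical, S is taken to be the input second-moment matrix B^T B / n, M is taken to be B^T dA / n, and the first-order decrease is taken to be trace(dW M). The actual shapes and normalization used by the library may differ.

```python
import torch


def optimal_delta_sketch(
    inputs: torch.Tensor,
    desired: torch.Tensor,
    force_pseudo_inverse: bool = False,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Least-squares weight update for a bias-free linear layer (illustrative).

    inputs:  B, shape (n_samples, in_features)
    desired: dA, shape (n_samples, out_features)
    """
    n_samples = inputs.shape[0]
    tensor_s = inputs.T @ inputs / n_samples   # assumed definition of S
    tensor_m = inputs.T @ desired / n_samples  # assumed definition of M
    if force_pseudo_inverse:
        solution = torch.linalg.pinv(tensor_s) @ tensor_m
    else:
        try:
            solution = torch.linalg.solve(tensor_s, tensor_m)
        except RuntimeError:
            # S is singular: fall back to the pseudo-inverse.
            solution = torch.linalg.pinv(tensor_s) @ tensor_m
    delta_w = solution.T  # shape (out_features, in_features)
    first_order_decrease = torch.trace(delta_w @ tensor_m)
    return delta_w, first_order_decrease


torch.manual_seed(0)
b = torch.randn(50, 3, dtype=torch.float64)
w_true = torch.randn(2, 3, dtype=torch.float64)
delta_w, decrease = optimal_delta_sketch(b, b @ w_true.T)
```

When the desired variation is exactly reachable by a linear map of the inputs, as in the usage lines above, the recovered delta_w matches that map up to numerical precision.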

compute_optimal_updates(numerical_threshold: float = 1e-06, statistical_threshold: float = 0.001, maximum_added_neurons: int | None = None, update_previous: bool = True, dtype: dtype = torch.float32, compute_delta: bool = True, use_covariance: bool = True, alpha_zero: bool = False, omega_zero: bool = False, use_projection: bool = True, ignore_singular_values: bool = False) tuple[Tensor | None, Tensor | None][source]#

Compute the optimal update and additional neurons.

This method computes optimal weight updates for growing neural networks by analyzing gradient statistics and covariance information.

compute_s_update() tuple[Tensor, int][source]#

Compute the update of the tensor S. Must be implemented for each type of layer.

Returns:

  • torch.Tensor – update of the tensor S

  • int – number of samples used to compute the update

Raises:

NotImplementedError – abstract method

copy_uniform_initialization(tensor: Tensor, reference_tensor: Tensor | None, fan_in: int) None[source]#

Initialize tensor with a uniform distribution aligned on the reference.

Initialize the tensor from a uniform distribution with bounds -sqrt(std(W)), sqrt(std(W)), where std(W) is the empirical standard deviation of the reference_tensor, if the reference_tensor has a non-zero variance. Otherwise, use bounds -sqrt(6 / fan_in), sqrt(6 / fan_in), where fan_in is the number of input features of the reference tensor plus the extension.

Parameters:
  • tensor (torch.Tensor) – tensor to initialize

  • reference_tensor (torch.Tensor | None) – tensor to get the standard deviation from or None to use Kaiming init

  • fan_in (int) – number of input features of the base tensor + extension

create_layer_extensions(extension_size: int, output_extension_size: int | None = None, input_extension_size: int | None = None, output_extension_init: str = 'copy_uniform', input_extension_init: str = 'copy_uniform') None[source]#

Create extension for layer input and output.

Create the layer input and output extensions of given sizes. The input and output extensions may have different sizes, which is useful, for example, when connecting a convolutional layer to a linear layer.

Parameters:
  • extension_size (int) – size of the extension to create

  • output_extension_size (int | None) – size of the output extension to create, if None use extension_size

  • input_extension_size (int | None) – size of the input extension to create, if None use extension_size

  • output_extension_init (str) – Initialization method for the output extension. Must be one of the keys in known_inits (“copy_uniform”, “kaiming”, “zeros”), default “copy_uniform”.

  • input_extension_init (str) – Initialization method for the input extension. Must be one of the keys in known_inits (“copy_uniform”, “kaiming”, “zeros”), default “copy_uniform”.

Notes

Additional initialization methods can be added by registering them in the local known_inits dictionary of this method. Each initialization callable is applied to the extension weight tensor and to the extension bias tensor, if the layer has a bias.

The callable must accept the following arguments:

tensor: torch.Tensor

Tensor of the weight/bias extension, to initialize.

reference_tensor: torch.Tensor | None

Weight/bias tensor from the layer before extension.

fan_in: int

The fan_in of the layer, after including the extension.

An initialization callable may also modify the existing weights/biases, by mutating reference_tensor.

Raises:

ValueError – if unknown initialization method
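
A callable matching the three-argument contract described in the Notes could look like the sketch below. The function scaled_normal_initialization is hypothetical and not part of GroMo; it only demonstrates the expected signature (tensor, reference_tensor, fan_in) and in-place initialization. How such a callable would be registered is not shown, since known_inits is described as local to the method.

```python
from typing import Optional

import torch


def scaled_normal_initialization(
    tensor: torch.Tensor,
    reference_tensor: Optional[torch.Tensor],
    fan_in: int,
) -> None:
    """Fill `tensor` in place with N(0, 1 / fan_in) values (illustrative).

    `reference_tensor` is accepted to match the contract but is not used here.
    """
    with torch.no_grad():
        tensor.normal_(mean=0.0, std=fan_in ** -0.5)


torch.manual_seed(0)
extension_weight = torch.empty(16, 64)
scaled_normal_initialization(extension_weight, reference_tensor=None, fan_in=64)
```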

create_layer_in_extension(extension_size: int) None[source]#

Create the layer input extension of given size.

Parameters:

extension_size (int) – size of the extension to create

Raises:

NotImplementedError – abstract method

create_layer_out_extension(extension_size: int) None[source]#

Create the layer output extension of given size.

Parameters:

extension_size (int) – size of the extension to create

Raises:

NotImplementedError – abstract method

delete_update(include_previous: bool = True, delete_delta: bool = True, delete_input: bool = True, delete_output: bool = False) None[source]#

Delete the updates of the layer:

  • optimal_delta_layer

  • extended_input_layer and associated extensions

By default, we do not delete the extended_output_layer of this layer because it could be required by the next layer.

Parameters:
  • include_previous (bool, optional) – delete the extended_output_layer of the previous layer, by default True

  • delete_delta (bool, optional) – delete the optimal_delta_layer of the module, by default True

  • delete_input (bool, optional) – delete the extended_input_layer of this module, by default True

  • delete_output (bool, optional) – delete the extended_output_layer of this layer, by default False. Warning: this does not delete the extended_input_layer of the next layer

Raises:
  • NotImplementedError – if include_previous is True and the previous module is of type MergeGrowingModule

  • TypeError – if previous module is not of type GrowingModule or MergeGrowingModule

extended_forward(x: Tensor, x_ext: Tensor | None = None, use_optimal_delta: bool = True, use_extended_input: bool = True, use_extended_output: bool = True) tuple[Tensor, Tensor | None][source]#

Forward pass of the module with the layer extension and the layer update, scaled according to the scaling factor.

WARNING: does not store the input and pre-activity tensors.

WARNING: the scaling factor is squared for the optimal delta and linear for the extension (instead of linear for the optimal delta and square root for the extension, as in the theory).

Parameters:
  • x (torch.Tensor) – input tensor

  • x_ext (torch.Tensor | None) – extension tensor

  • use_optimal_delta (bool, optional) – if True, use the optimal delta layer, default True

  • use_extended_input (bool, optional) – if True, use the extended input layer, default True

  • use_extended_output (bool, optional) – if True, use the extended output layer, default True

Returns:

output tensor and extension tensor

Return type:

tuple[torch.Tensor, torch.Tensor | None]

Raises:

ValueError – if the input is extended and x_ext is not provided
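
The relation between the two scaling conventions mentioned in the warning can be checked with a few lines of arithmetic. This is a sketch with hypothetical helper names (implemented_factors, theoretical_factors); it only encodes the factors stated above and does not reproduce the forward pass itself.

```python
import math


def implemented_factors(scaling_factor: float) -> tuple[float, float]:
    """(factor on the optimal delta, factor on the extension) as described above."""
    return scaling_factor ** 2, scaling_factor


def theoretical_factors(gamma: float) -> tuple[float, float]:
    """Factors in the theoretical convention: linear and square root."""
    return gamma, math.sqrt(gamma)


gamma = 0.25
# Passing sqrt(gamma) as the scaling factor reproduces the theoretical factors.
implemented = implemented_factors(math.sqrt(gamma))
theoretical = theoretical_factors(gamma)
```

Under these definitions, the two conventions differ only by a reparameterization: a scaling factor of sqrt(gamma) in the implemented convention corresponds to gamma in the theoretical one.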

property first_order_improvement: Tensor#

Get the first order improvement of the block.

Returns:

first order improvement

Return type:

torch.Tensor

forward(x: Tensor) Tensor[source]#

Forward pass of the module. If needed, store the activity and pre-activity tensors.

Parameters:

x (torch.Tensor) – input tensor

Returns:

output tensor

Return type:

torch.Tensor

static get_fan_in_from_layer(layer: Module) int[source]#

Get the fan_in (number of input features) from a given layer.

Parameters:

layer (torch.nn.Module) – layer to get the fan_in from

Returns:

fan_in of the layer

Return type:

int

Raises:

NotImplementedError – abstract method

property in_features: int#

Fan-in size

Returns:

fan-in size

Return type:

int

Raises:

NotImplementedError – abstract method

property in_neurons: int#

Number of input neurons

Returns:

number of input neurons

Return type:

int

Raises:

NotImplementedError – abstract method

init_computation() None[source]#

Initialize the computation of the optimal added parameters.

property input: Tensor#

Get the input of the layer

Returns:

input tensor

Return type:

torch.Tensor

Raises:

ValueError – if the input is not stored

property input_extended: Tensor#

Return the input extended with ones if the bias is used.

Returns:

input extended

Return type:

torch.Tensor

Raises:

NotImplementedError – abstract method if bias is used

property input_size: tuple[int, ...]#

Get the expected shape of the input excluding batch size and channels

Returns:

input shape

Return type:

tuple[int, …]

Raises:

ValueError – if the input size is not given and cannot be calculated

property input_volume: int#

Expected input volume

Returns:

input volume

Return type:

int

Raises:

NotImplementedError – abstract method

kaiming_initialization(tensor: Tensor, reference_tensor: Tensor | None, fan_in: int) None[source]#

Initialize tensor with Kaiming.

Parameters:
  • tensor (torch.Tensor) – tensor to initialize

  • reference_tensor (torch.Tensor | None) – Unused

  • fan_in (int) – number of input features of the base tensor + extension

layer_in_extension(weight: Tensor) None[source]#

Extend the layer with the parameters of layer assuming that the input of the layer is extended but not the output.

Parameters:

weight (torch.Tensor) – weight of the extension

Raises:

NotImplementedError – abstract method

layer_of_tensor(weight: Tensor, bias: Tensor | None = None, force_bias: bool = True) Module[source]#

Create a layer with the same characteristics (except the shape), with weight as parameter and bias as bias.

Parameters:
  • weight (torch.Tensor) – weight of the layer

  • bias (torch.Tensor | None) – bias of the layer

  • force_bias (bool) – if True, the created layer requires a bias if self.use_bias is True

Returns:

layer with the same characteristics

Return type:

torch.nn.Module

Raises:

NotImplementedError – abstract method

layer_out_extension(weight: Tensor, bias: Tensor | None = None) None[source]#

Extend the layer with the parameters of layer assuming that the output of the layer is extended but not the input.

Parameters:
  • weight (torch.Tensor) – weight of the extension

  • bias (torch.Tensor | None) – bias of the extension if needed

Raises:

NotImplementedError – abstract method

missing_neurons() int[source]#

Get the number of missing neurons to reach the target hidden features.

Returns:

number of missing neurons

Return type:

int

Raises:

ValueError – if target_in_neurons is not set

normalize_optimal_updates(std_target: float | None = None, normalization_type: str = 'legacy_normalization') None[source]#

Normalize optimal update to target standard deviation

Normalize the optimal updates so that the standard deviation of the weights of the updates is equal to std_target. If std_target is None, we automatically determine it. We use the standard deviation of the weights of the layer if it has weights. If the layer has no weights, we aim to have a std of 1 / sqrt(in_features).

If normalization_type is “equalize_second_layer”, let s be the target standard deviation. Then:

  • optimal_delta_layer is scaled to have a std of s (so by s / std(optimal_delta_layer))

  • extended_input_layer is scaled to have a std of s (so by s / std(extended_input_layer))

  • extended_output_layer is scaled to match the scaling of the extended_input_layer and the optimal_delta_layer (so by std(extended_input_layer) / std(optimal_delta_layer))

If normalization_type is “equalize_extensions”, let s be the target standard deviation. Then:

  • extended_input_layer is scaled to have a std of s (so by s / std(extended_input_layer))

  • extended_output_layer is scaled to have a std of s (so by s / std(extended_output_layer))

  • optimal_delta_layer is scaled to match the scaling of the extended_input_layer and the extended_output_layer (so by s ** 2 / (std(extended_input_layer) * std(extended_output_layer)))

Parameters:
  • std_target (float | None) – target standard deviation for the weights of the updates

  • normalization_type (str) – type of normalization to use, one of ‘equalize_second_layer’, ‘equalize_extensions’, ‘weird_normalization’

Raises:

ValueError – if there is no previous module or the normalization_type is invalid
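
The scaling factors listed for the two documented normalization types can be written out as plain arithmetic. The helper update_scaling_factors below is hypothetical; it takes the standard deviations as numbers and returns the multiplicative factor for each component, following the formulas in the description of this method.

```python
def update_scaling_factors(
    normalization_type: str,
    std_target: float,
    std_delta: float,
    std_input: float,
    std_output: float,
) -> dict[str, float]:
    """Multiplicative factor applied to each component (illustrative)."""
    if normalization_type == "equalize_second_layer":
        return {
            "optimal_delta_layer": std_target / std_delta,
            "extended_input_layer": std_target / std_input,
            "extended_output_layer": std_input / std_delta,
        }
    if normalization_type == "equalize_extensions":
        return {
            "extended_input_layer": std_target / std_input,
            "extended_output_layer": std_target / std_output,
            "optimal_delta_layer": std_target ** 2 / (std_input * std_output),
        }
    raise ValueError(f"Unknown normalization_type: {normalization_type!r}")


second_layer = update_scaling_factors(
    "equalize_second_layer", std_target=0.1, std_delta=0.5, std_input=0.2, std_output=0.4
)
extensions = update_scaling_factors(
    "equalize_extensions", std_target=0.1, std_delta=0.5, std_input=0.2, std_output=0.4
)
```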

number_of_neurons_to_add(method: str = 'fixed_proportional', number_of_growth_steps: int = 1) int[source]#

Get the number of neurons to add in the next growth step.

  • fixed_proportional: add a fixed proportion of the total number of neurons to add at each growth step. The amount to add is computed with an integer division; as a consequence, a few neurons may remain to be added after all growth steps have been performed.

Parameters:
  • method (str) – Method to use for determining the number of neurons to add. The only option is “fixed_proportional”.

  • number_of_growth_steps (int) – Number of growth steps planned, used only if method is “fixed_proportional”.

Returns:

Number of neurons to add.

Return type:

int

Raises:

ValueError – if target_in_neurons or initial_in_neurons are not set or the method is unknown
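
The integer-division behaviour of the “fixed_proportional” method can be illustrated as follows. This is a sketch, not the library's code: neurons_per_growth_step is a hypothetical function, and it assumes the total number of neurons to add is target_in_neurons minus initial_in_neurons.

```python
from typing import Optional


def neurons_per_growth_step(
    initial_in_neurons: Optional[int],
    target_in_neurons: Optional[int],
    number_of_growth_steps: int = 1,
    method: str = "fixed_proportional",
) -> int:
    """Number of neurons to add at each growth step (illustrative)."""
    if initial_in_neurons is None or target_in_neurons is None:
        raise ValueError("initial_in_neurons and target_in_neurons must be set")
    if method != "fixed_proportional":
        raise ValueError(f"Unknown method: {method!r}")
    # Assumed total: difference between the target and initial sizes.
    total_to_add = target_in_neurons - initial_in_neurons
    return total_to_add // number_of_growth_steps


per_step = neurons_per_growth_step(10, 33, number_of_growth_steps=5)
# Neurons still missing after all growth steps, due to the integer division.
leftover = (33 - 10) - 5 * per_step
```

With 23 neurons to add over 5 steps, each step adds 4 neurons and 3 remain afterwards, which matches the remark about neurons left to add.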

number_of_parameters() int[source]#

Return the number of parameters of the layer.

Returns:

number of parameters

Return type:

int

property out_features: int#

Fan-out size

Returns:

fan-out size

Return type:

int

Raises:

NotImplementedError – abstract method

property output_volume: int#

Expected output volume

Returns:

output volume

Return type:

int

Raises:

NotImplementedError – abstract method

parameter_step(delta_weights: Tensor, delta_biases: Tensor | None = None) None[source]#

Update the parameters of the layer with the given deltas.

Parameters:
  • delta_weights (torch.Tensor) – delta values for the weights

  • delta_biases (torch.Tensor | None) – delta values for the biases, if None, the biases are not updated

parameters(recurse: bool = True) Iterator[Parameter][source]#

Return the parameters of the layer.

Parameters:

recurse (bool) – if True, return the parameters of the submodules

Returns:

iterator over the parameters of the layer

Return type:

Iterator[torch.nn.Parameter]

property pre_activity: Tensor#

Get the pre activity of the layer

Returns:

pre activity tensor

Return type:

torch.Tensor

Raises:

ValueError – if the pre activity is not stored

projected_v_goal(input_vector: Tensor) Tensor[source]#

Compute the projected gradient of the goal with respect to the activity of the layer.

dLoss/dA_proj := dLoss/dA - dW B[-1] where A is the pre-activation vector of the layer, and dW the optimal delta for the layer

Parameters:

input_vector (torch.Tensor) – input vector B[-1] of shape (n_samples, in_features)

Returns:

projected gradient of the goal with respect to the activity of the next layer dLoss/dA - dW B[-1]

Return type:

torch.Tensor
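
The projection dLoss/dA - dW B[-1] can be sketched in batched form. The function projected_gradient_sketch is hypothetical, and the shapes are assumptions: input_vector is (n_samples, in_features) as stated above, the optimal delta dW is taken to be (out_features, in_features), and the gradient is (n_samples, out_features).

```python
import torch


def projected_gradient_sketch(
    grad_pre_activity: torch.Tensor,
    input_vector: torch.Tensor,
    delta_w: torch.Tensor,
) -> torch.Tensor:
    """Remove from the gradient the part already explained by dW (illustrative)."""
    # Row i of (input_vector @ delta_w.T) is dW applied to sample i.
    return grad_pre_activity - input_vector @ delta_w.T


torch.manual_seed(0)
x = torch.randn(5, 3)
dw = torch.randn(2, 3)
# A gradient fully explained by dW projects to zero.
projected = projected_gradient_sketch(x @ dw.T, x, dw)
```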

reset_computation() None[source]#

Reset the computation of the optimal added parameters.

static scale_layer(layer: Module, scale: float) Module[source]#

Scale the weights and biases of a given layer by a specified factor.

Parameters:
  • layer (torch.nn.Module) – The layer whose parameters are to be scaled.

  • scale (float) – The factor by which to scale the layer’s parameters.

Returns:

The layer with scaled parameters.

Return type:

torch.nn.Module
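
For a layer that exposes weight and bias attributes, such as torch.nn.Linear, the scaling can be sketched as below. The function scale_layer_sketch is hypothetical; scaling in place and returning the same layer object are assumptions of this sketch rather than documented behaviour.

```python
import torch
from torch import nn


def scale_layer_sketch(layer: nn.Module, scale: float) -> nn.Module:
    """Multiply the weight and bias of `layer` by `scale`, in place (illustrative)."""
    with torch.no_grad():
        layer.weight.mul_(scale)
        if getattr(layer, "bias", None) is not None:
            layer.bias.mul_(scale)
    return layer


torch.manual_seed(0)
linear = nn.Linear(3, 2)
weight_before = linear.weight.detach().clone()
bias_before = linear.bias.detach().clone()
scaled = scale_layer_sketch(linear, 0.5)
```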

scale_layer_extension(scale: float | None, scale_output: float | None, scale_input: float | None) None[source]#

Scale the layer extension by a given factor. This means scaling the extended_input_layer, the extended_output_layer and the eigenvalues_extension. However, as the eigenvalues_extension will be squared, they are scaled by sqrt(scale_input * scale_output).

Parameters:
  • scale (float | None) – The factor by which to scale the layer extension. If not None, replace both scale_input and scale_output if they are not None.

  • scale_output (float | None) – The factor by which to scale the layer output extension.

  • scale_input (float | None) – The factor by which to scale the layer input extension. If not None, scale must be None.

Raises:

ValueError – if one of the extensions is None

scale_parameter_update(scale: float) None[source]#

Scale the parameter update by a given factor. This means scaling the optimal delta and the parameter_update_decrease.

Parameters:

scale (float) – The factor by which to scale the parameter update.

set_scaling_factor(factor: float) None[source]#

Assign scaling factor to all growing layers

Parameters:

factor (float) – scaling factor

sub_select_optimal_added_parameters(keep_neurons: int | None = None, threshold: float | None = None, sub_select_previous: bool = True, zeros_if_not_enough: bool = False, zeros_fan_in: bool = True, zeros_fan_out: bool = False) None[source]#

Select the first keep_neurons neurons of the optimal added parameters linked to this layer.

Parameters:
  • keep_neurons (int | None) – number of neurons to keep, if None, the number of neurons is determined by the threshold

  • threshold (float | None) – threshold to determine the number of neurons to keep, if None, keep_neurons must be provided

  • sub_select_previous (bool) – if True, sub-select the previous layer added parameters as well

  • zeros_if_not_enough (bool) – if True, keep all the neurons and set the non selected ones to zero (either first or last depending on zeros_fan_in and zeros_fan_out)

  • zeros_fan_in (bool) – if True and zeros_if_not_enough is True, will set the non selected fan-in parameters to zero

  • zeros_fan_out (bool) – if True and zeros_if_not_enough is True, will set the non selected fan-out parameters to zero

property tensor_n: Tensor#

Compute the tensor N for the layer with the current M_{-2}, C and optimal delta.

Returns:

N

Return type:

torch.Tensor

Raises:

NotImplementedError – abstract method

property tensor_s: TensorStatistic#

Return the tensor S of the layer. Either the tensor S computed locally or the tensor S of the previous merge layer.

Returns:

tensor S

Return type:

TensorStatistic

property tensor_s_growth#

Redirect to the tensor S of the previous module.

update_computation() None[source]#

Update the computation of the optimal added parameters.

update_input_size(input_size: tuple[int, ...] | None = None, compute_from_previous: bool = False, force_update: bool = True) tuple[int, ...] | None[source]#

Update the input size of the layer. Either according to the parameter or the input currently stored.

Parameters:
  • input_size (tuple[int, ...] | None) – new input size

  • compute_from_previous (bool) – whether to compute the input size from the previous module assuming its output size won’t be affected by the post-layer function

  • force_update (bool) – whether to force the update even if the input size is already set (_input_size is not None)

Returns:

updated input size if it could be computed, None otherwise

Return type:

tuple[int, …] | None

Raises:

NotImplementedError – abstract method

property weight: Tensor#

Get the weight of the layer

Returns:

weight tensor

Return type:

torch.Tensor

weights_statistics() dict[str, dict[str, float]][source]#

Get the statistics of the weights in the growing layer.

Returns:

A dictionary where keys are weights names and values are dictionaries of weight statistics.

Return type:

dict[str, dict[str, float]]

Examples using gromo.modules.growing_module.GrowingModule#

GroMo tutorial
