gromo.linear_growing_module.LinearGrowingModule#

class gromo.linear_growing_module.LinearGrowingModule(in_features: int, out_features: int, use_bias: bool = True, post_layer_function: Module = Identity(), previous_module: GrowingModule | AdditionGrowingModule | None = None, next_module: GrowingModule | AdditionGrowingModule | None = None, allow_growing: bool = False, device: device | None = None, name: str | None = None)[source]#
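
A minimal usage sketch (assuming the module is callable like a standard torch.nn.Module, which its growing-module base suggests; shapes follow the signature above):

>>> import torch
>>> from gromo.linear_growing_module import LinearGrowingModule
>>> layer = LinearGrowingModule(
...     in_features=4,
...     out_features=8,
...     use_bias=True,
...     post_layer_function=torch.nn.ReLU(),
...     name="hidden",
... )
>>> x = torch.randn(16, 4)   # batch of 16 samples
>>> layer(x).shape
torch.Size([16, 8])
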
property activation_gradient: Tensor#

Return the derivative of the activation function before this layer at 0+.

Returns:

derivative of the activation function before this layer at 0+

Return type:

torch.Tensor
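
For instance, if the activation preceding this layer is a ReLU, its one-sided derivative at 0+ is 1.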

add_parameters(matrix_extension: Tensor | None, bias_extension: Tensor | None, added_in_features: int = 0, added_out_features: int = 0) None[source]#

Add new parameters to the layer.

Parameters:
  • matrix_extension (torch.Tensor | None) –

    extension of the weight matrix of the layer; if None, the layer is extended with zeros. Should be of shape:

    • (out_features, in_features + added_in_features) if added_in_features > 0

    • (out_features + added_out_features, in_features) if added_out_features > 0

  • bias_extension (torch.Tensor of shape (out_features + added_out_features,) | None) – extension of the bias vector of the layer; if None, the layer is extended with zeros

  • added_in_features (int >= 0) – number of input features added; if 0, the number of input features is unchanged

  • added_out_features (int >= 0) – number of output features added; if 0, the number of output features is unchanged

Raises:

AssertionError – if we try to add input and output features at the same time
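
A zero-padding growth sketch (a None extension is filled with zeros, as documented; any preconditions imposed by the surrounding network are ignored here):

>>> import torch
>>> from gromo.linear_growing_module import LinearGrowingModule
>>> layer = LinearGrowingModule(in_features=4, out_features=8, use_bias=True)
>>> layer.add_parameters(
...     matrix_extension=None,   # None: extend the weight matrix with zeros
...     bias_extension=None,
...     added_in_features=2,
...     added_out_features=0,    # adding both sides at once raises
... )
>>> layer.number_of_parameters()  # 8 * (4 + 2) weights + 8 biases
56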

compute_cross_covariance_update() tuple[Tensor, int][source]#

Compute the update of the tensor P := B[-2]^T B[-1].

Returns:

  • torch.Tensor – update of the tensor P

  • int – number of samples used to compute the update
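
Stripped of the accumulation machinery, the update is a plain matrix product; a standalone sketch with made-up shapes, using random tensors as stand-ins for the stored activations:

>>> import torch
>>> n = 32                        # number of samples
>>> b_prev = torch.randn(n, 5)    # stand-in for B[-2], input of the previous layer
>>> b_curr = torch.randn(n, 4)    # stand-in for B[-1], input of this layer
>>> p_update = b_prev.T @ b_curr  # update of P := B[-2]^T B[-1]
>>> p_update.shape
torch.Size([5, 4])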

compute_m_prev_update(desired_activation: Tensor | None = None) tuple[Tensor, int][source]#

Compute the update of the tensor M_{-2} := B[-2]^T dA.

Parameters:

desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer

Returns:

  • torch.Tensor – update of the tensor M_{-2}

  • int – number of samples used to compute the update

compute_m_update(desired_activation: Tensor | None = None) tuple[Tensor, int][source]#

Compute the update of the tensor M. With B[-1] the input of the layer and dA = dLoss/dA the gradient of the loss with respect to the pre-activity, the update is M = B[-1]^T dA.

Parameters:

desired_activation (torch.Tensor | None) – desired variation direction of the output of the layer

Returns:

  • torch.Tensor – update of the tensor M

  • int – number of samples used to compute the update
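
A self-contained illustration of the quantity (not the module's internal bookkeeping): dA can be obtained via autograd, after which the update is a single matrix product.

>>> import torch
>>> b = torch.randn(32, 4)                     # stand-in for B[-1]
>>> w = torch.randn(8, 4, requires_grad=True)
>>> a = b @ w.T                                # pre-activity A
>>> loss = a.pow(2).mean()
>>> (d_a,) = torch.autograd.grad(loss, a)      # dLoss/dA
>>> (b.T @ d_a).shape                          # M = B[-1]^T dA
torch.Size([4, 8])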

compute_n_update() tuple[Tensor, int][source]#

Compute the update of the tensor N. With the input tensor X and V[+1] the projected desired update at the next layer (V[+1] = dL/dA[+1] - dW[+1]* B), the update is U^{jk} = X^{ij} V[+1]^{ik}, with summation over the sample index i.

Returns:

  • torch.Tensor – update of the tensor N

  • int – number of samples used to compute the update
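
The index contraction is again an ordinary matrix product over the sample axis; a sketch with random stand-ins for X and V[+1]:

>>> import torch
>>> x = torch.randn(32, 4)                     # stand-in for X
>>> v_next = torch.randn(32, 6)                # stand-in for V[+1]
>>> n_update = torch.einsum("ij,ik->jk", x, v_next)
>>> torch.allclose(n_update, x.T @ v_next)
True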

compute_optimal_added_parameters(numerical_threshold: float = 1e-15, statistical_threshold: float = 0.001, maximum_added_neurons: int | None = None, update_previous: bool = True, dtype: dtype = torch.float32) tuple[Tensor, Tensor | None, Tensor, Tensor][source]#

Compute the optimal added parameters to extend the input layer.

Parameters:
  • numerical_threshold (float) – threshold to consider an eigenvalue as zero in the square root of the inverse of S

  • statistical_threshold (float) – threshold to consider an eigenvalue as zero in the SVD of S^{-1/2} N

  • maximum_added_neurons (int | None) – maximum number of added neurons, if None all significant neurons are kept

  • update_previous (bool) – whether to change the previous layer extended_output_layer

  • dtype (torch.dtype) – dtype for S and N during the computation

Returns:

optimal added parameters: alpha weights, alpha bias, omega, and the eigenvalues lambda

Return type:

tuple[torch.Tensor, torch.Tensor | None, torch.Tensor, torch.Tensor]
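
A call sketch, continuing the session above and assuming the growth statistics (S, N, ...) have already been accumulated over data; the accumulation machinery is outside this page:

>>> alpha_w, alpha_b, omega, eigvals = layer.compute_optimal_added_parameters(
...     statistical_threshold=1e-3,
...     maximum_added_neurons=4,   # keep at most four new neurons
...     update_previous=True,      # also set the previous layer's extension
... )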

compute_optimal_delta(update: bool = True, dtype: dtype = torch.float32, force_pseudo_inverse: bool = False) tuple[Tensor, Tensor | None, Tensor | float][source]#

Compute the optimal delta for the layer using the current S and M tensors:

dW* = M S[-1]^{-1} (using the pseudo-inverse if needed)

Compute dW* (and dBias* if needed) and update the optimal_delta_layer attribute. The associated first-order decrease d of the loss satisfies L(A + gamma * B * dW*) = L(A) - gamma * d + o(gamma), where gamma is the scaling factor.

Parameters:
  • update (bool) – if True, update the optimal_delta_layer attribute

  • dtype (torch.dtype) – dtype for S and M during the computation

  • force_pseudo_inverse (bool) – if True, use the pseudo-inverse even when S is invertible

Returns:

optimal delta for the weights, the biases if needed and the first order decrease

Return type:

tuple[torch.Tensor, torch.Tensor | None, torch.Tensor | float]
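
A call sketch, again assuming the S and M tensors have been accumulated beforehand:

>>> delta_w, delta_b, decrease = layer.compute_optimal_delta(
...     update=True,            # store the result in optimal_delta_layer
...     dtype=torch.float64,    # extra precision for the (pseudo-)inversion
... )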

compute_s_update() tuple[Tensor, int][source]#

Compute the update of the tensor S. With the input tensor B, the update is U^{jk} = B^{ij} B^{ik}, with summation over the sample index i.

Returns:

  • torch.Tensor – update of the tensor S

  • int – number of samples used to compute the update

property input_extended: Tensor#

Return the input extended with a column of ones if the bias is used.

Returns:

input extended

Return type:

torch.Tensor
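
The documented behaviour corresponds to this plain-torch construction (a sketch, not the property's actual code):

>>> import torch
>>> x = torch.randn(16, 4)
>>> x_extended = torch.cat((x, torch.ones(16, 1)), dim=1)
>>> x_extended.shape   # the ones column lets the bias act as a weight column
torch.Size([16, 5])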

layer_in_extension(weight: Tensor) None[source]#

Extend the layer with the given weight, assuming that the input of the layer is extended but not the output.

Parameters:

weight (torch.Tensor of shape (out_features, K)) – weight of the extension

layer_of_tensor(weight: Tensor, bias: Tensor | None = None) Linear[source]#

Create a layer with the same characteristics (except the shape), with weight as its weight and bias as its bias.

Parameters:
  • weight (torch.Tensor) – weight of the layer

  • bias (torch.Tensor | None) – bias of the layer

Returns:

layer with the same characteristics

Return type:

torch.nn.Linear
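
A usage sketch, continuing with a layer constructed as above; the result is a plain torch.nn.Linear:

>>> import torch
>>> w, b = torch.randn(8, 4), torch.randn(8)
>>> linear = layer.layer_of_tensor(weight=w, bias=b)
>>> isinstance(linear, torch.nn.Linear)
True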

layer_out_extension(weight: Tensor, bias: Tensor | None = None) None[source]#

Extend the layer with the given weight and bias, assuming that the output of the layer is extended but not the input.

Parameters:
  • weight (torch.Tensor of shape (K, in_features)) – weight of the extension

  • bias (torch.Tensor of shape (K,) | None) – bias of the extension if needed
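
A growth sketch following the documented shapes; if the extension is appended as described, the parameter count afterwards is (8 + 3) * 4 weights plus (8 + 3) biases:

>>> import torch
>>> from gromo.linear_growing_module import LinearGrowingModule
>>> layer = LinearGrowingModule(in_features=4, out_features=8, use_bias=True)
>>> layer.layer_out_extension(torch.zeros(3, 4), bias=torch.zeros(3))
>>> layer.number_of_parameters()
55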

number_of_parameters() int[source]#

Return the number of parameters of the layer.

Returns:

number of parameters

Return type:

int

sub_select_optimal_added_parameters(keep_neurons: int, sub_select_previous: bool = True) None[source]#

Select the first keep_neurons neurons of the optimal added parameters.

Parameters:
  • keep_neurons (int) – number of neurons to keep

  • sub_select_previous (bool) – if True, sub-select the previous layer added parameters as well
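
A call sketch, assuming compute_optimal_added_parameters has been run first:

>>> layer.sub_select_optimal_added_parameters(keep_neurons=2)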

property tensor_n: Tensor#

Compute the tensor N for the layer with the current M_{-2}, P and optimal delta.

Returns:

N

Return type:

torch.Tensor