fusionlab.nn.components.ExplainableAttention

class fusionlab.nn.components.ExplainableAttention[source]

Bases: Layer, NNLearner

ExplainableAttention layer that returns attention scores from multi-head attention [1].

This layer supports interpretability: the returned scores show how strongly each time step attends to every other time step.

\[\mathbf{A} = \text{MHA}(\mathbf{X},\,\mathbf{X}) \rightarrow \text{attention\_scores}\]

Here, \(\mathbf{X}\) is an input tensor of shape (B, T, D) that serves as both query and value, and attention_scores is the tensor of attention weights, of shape (B, num_heads, T, T).
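As a rough illustration of what the score tensor contains, the sketch below computes multi-head self-attention weights in plain NumPy. It is not fusionlab's implementation: the function name self_attention_scores, the random projection weights, and the seed argument are assumptions made for this example, and a real layer would learn its projections.

```python
import numpy as np


def self_attention_scores(x, num_heads, key_dim, seed=0):
    """Return softmax attention weights of shape (B, num_heads, T, T).

    Illustrative only: the projection weights are random, not learned.
    """
    rng = np.random.default_rng(seed)
    batch, steps, features = x.shape
    # One query and one key projection per head: (D, num_heads, key_dim).
    w_q = rng.normal(size=(features, num_heads, key_dim)) / np.sqrt(features)
    w_k = rng.normal(size=(features, num_heads, key_dim)) / np.sqrt(features)
    # Project the inputs to per-head queries and keys: (B, num_heads, T, key_dim).
    q = np.einsum("btd,dhk->bhtk", x, w_q)
    k = np.einsum("btd,dhk->bhtk", x, w_k)
    # Scaled dot products between every pair of time steps: (B, num_heads, T, T).
    logits = np.einsum("bhtk,bhsk->bhts", q, k) / np.sqrt(key_dim)
    # Numerically stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


x = np.random.default_rng(1).normal(size=(2, 5, 8))
scores = self_attention_scores(x, num_heads=4, key_dim=16)
print(scores.shape)  # (2, 4, 5, 5)
```

Each slice scores[b, h] is a T x T matrix whose rows sum to 1, so row t can be read as how query step t distributes its attention over all time steps.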

Parameters:
  • num_heads (int) – Number of heads for multi-head attention.

  • key_dim (int) – Dimensionality of the query/key projections.

Notes

Unlike standard attention layers, which return the transformed output, this layer returns only the attention scores, so that they can be inspected for interpretability.

call(inputs, training=False)[source]

Forward pass that outputs only the attention scores.

get_config()[source]

Returns the configuration for serialization.

from_config(config)[source]

Creates a new instance from the given config.

Examples

>>> from fusionlab.nn.components import ExplainableAttention
>>> import tensorflow as tf
>>> # Suppose we have input of shape (batch_size, time_steps, features)
>>> x = tf.random.normal((32, 10, 64))
>>> # Instantiate explainable attention
>>> ea = ExplainableAttention(num_heads=4, key_dim=64)
>>> # Forward pass returns attention scores: (B, num_heads, T, T)
>>> scores = ea(x)
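
One possible way to read a score tensor like the one above is sketched here with NumPy, using a random stand-in array rather than real layer output. It assumes that the last axis indexes the attended (key) time steps and that each row sums to 1, as with softmax-normalized weights; this page states neither, so treat both as assumptions.

```python
import numpy as np

# Stand-in for the layer output: random weights with the documented
# (B, num_heads, T, T) layout, normalized so that each row sums to 1.
rng = np.random.default_rng(0)
raw = rng.random((32, 4, 10, 10))
scores = raw / raw.sum(axis=-1, keepdims=True)

# Average over heads: (B, T, T).
head_mean = scores.mean(axis=1)
# Attention received by each time step, averaged over query positions: (B, T).
received = head_mean.mean(axis=1)
# Most attended time step for each sample: (B,).
top_step = received.argmax(axis=-1)

print(head_mean.shape, received.shape, top_step.shape)
```

Averaging over heads hides head-specific patterns, so inspecting individual heads may be more informative when they specialize.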

See also

CrossAttention

Another attention variant for cross-sequence contexts.

MultiResolutionAttentionFusion

For fusing features via multi-head attention.

References

__init__(num_heads, key_dim)[source]

Initialize the ExplainableAttention layer.

Parameters:
  • num_heads (int) – Number of attention heads.

  • key_dim (int) – Dimensionality of query/key projections in multi-head attention.

Methods

__init__(num_heads, key_dim)

Initialize the ExplainableAttention layer.

add_loss(loss)

Can be called inside of the call() method to add a scalar loss.

add_metric(*args, **kwargs)

add_variable(shape, initializer[, dtype, ...])

Add a weight variable to the layer.

add_weight([shape, initializer, dtype, ...])

Add a weight variable to the layer.

build(input_shape)

build_from_config(config)

Builds the layer's states with the supplied config dict.

call(inputs[, training])

Forward pass that returns only the attention scores.

compute_mask(inputs, previous_mask)

compute_output_shape(*args, **kwargs)

compute_output_spec(*args, **kwargs)

count_params()

Count the total number of scalars composing the weights.

from_config(config)

Creates a new instance from the config dictionary.

get_build_config()

Returns a dictionary with the layer's input shape.

get_config()

Returns the layer configuration.

get_params([deep])

Get the parameters for this learner.

get_weights()

Return the values of layer.weights as a list of NumPy arrays.

help(**kwargs)

load(file_path[, format])

Load the learner's state from a specified file in the desired format.

load_own_variables(store)

Loads the state of the layer.

quantize(mode[, type_check, config])

quantized_build(input_shape, mode)

quantized_call(*args, **kwargs)

rematerialized_call(layer_call, *args, **kwargs)

Enable rematerialization dynamically for layer's call method.

save([file_path, format, overwrite, ...])

Save the learner's state to a specified file in the desired format.

save_own_variables(store)

Saves the state of the layer.

set_params(**params)

Set the parameters of this learner.

set_weights(weights)

Sets the values of layer.weights from a list of NumPy arrays.

stateless_call(trainable_variables, ...[, ...])

Call the layer without any side effects.

summary()

Provide a summary of the learner's parameters.

symbolic_call(*args, **kwargs)

Attributes

compute_dtype

The dtype of the computations performed by the layer.

dtype

Alias of layer.variable_dtype.

dtype_policy

input

Retrieves the input tensor(s) of a symbolic operation.

input_dtype

The dtype layer inputs should be converted to.

input_spec

losses

List of scalar losses from add_loss, regularizers and sublayers.

metrics

List of all metrics.

metrics_variables

List of all metric variables.

my_params

non_trainable_variables

List of all non-trainable layer state.

non_trainable_weights

List of all non-trainable weight variables of the layer.

output

Retrieves the output tensor(s) of a layer.

path

The path of the layer.

quantization_mode

The quantization mode of this layer, None if not quantized.

supports_masking

Whether this layer supports computing a mask using compute_mask.

trainable

Settable boolean, whether this layer should be trainable or not.

trainable_variables

List of all trainable layer state.

trainable_weights

List of all trainable weight variables of the layer.

variable_dtype

The dtype of the state (weights) of the layer.

variables

List of all layer state, including random seeds.

weights

List of all weight variables of the layer.

call(inputs, training=False)[source]

Forward pass that returns only the attention scores.

Parameters:
  • inputs (tf.Tensor) – Tensor of shape (B, T, D).

  • training (bool, optional) – Indicates training mode; not used in this layer. Defaults to False.

Returns:

Attention scores of shape (B, num_heads, T, T).

Return type:

tf.Tensor

get_config()[source]

Returns the layer configuration.

Returns:

Dictionary containing 'num_heads' and 'key_dim'.

Return type:

dict

classmethod from_config(config)[source]

Creates a new instance from the config dictionary.

Parameters:

config (dict) – Configuration dictionary.

Returns:

A new instance of this layer.

Return type:

ExplainableAttention
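
The get_config / from_config pair is typically used as a round trip. The sketch below shows that pattern with a hypothetical stand-in class, AttentionConfigStub, which is not part of fusionlab and only mimics the documented keys. The real layer may include additional entries in its configuration.

```python
class AttentionConfigStub:
    """Hypothetical stand-in that mimics the documented config methods."""

    def __init__(self, num_heads, key_dim):
        self.num_heads = num_heads
        self.key_dim = key_dim

    def get_config(self):
        # Mirror the documented keys: 'num_heads' and 'key_dim'.
        return {"num_heads": self.num_heads, "key_dim": self.key_dim}

    @classmethod
    def from_config(cls, config):
        # Rebuild an equivalent instance from the dictionary.
        return cls(**config)


original = AttentionConfigStub(num_heads=4, key_dim=64)
config = original.get_config()
clone = AttentionConfigStub.from_config(config)
print(clone.get_config() == config)  # True
```

Because the configuration holds only constructor arguments, a layer rebuilt this way has the same hyperparameters but not the trained weights of the original.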

help(**kwargs)

my_params = ExplainableAttention(num_heads, key_dim)