fusionlab.nn.models.BaseAttentive

class fusionlab.nn.models.BaseAttentive[source]

Bases: Model, NNLearner

Base Attentive Model.

A foundational blueprint for building powerful, data-driven, sequence-to-sequence time series forecasting models.

This class provides a sophisticated and highly configurable encoder-decoder architecture. It is designed to process three distinct types of inputs—static, dynamic past, and known future features—and fuse them using a modular stack of attention mechanisms. It serves as the core engine for models like HALNet and PIHALNet.

A data-driven model architecture that can be used for both hybrid and transformer-based forecasting models. This model processes static, dynamic, and future input features through separate paths and applies multi-head attention mechanisms in the decoder block to produce forecasts. The model supports multi-horizon forecasting, uncertainty quantification using quantiles, and dynamic time warping (DTW) for time-series alignment.

The model offers flexibility through various options for configuration, residual connections, and feature selection mechanisms, making it suitable for both statistical and physics-informed settings.

The architecture can be configured to operate as a hybrid model, combining the temporal feature extraction power of LSTMs with attention, or as a pure transformer model.

See more in User Guide.

Parameters:
  • static_input_dim (int) – Dimensionality of the static (time-invariant) input features. These are features that do not change over time for a given sample, such as a sensor’s location ID, soil type, or a product category. If 0, no static features are used.

  • dynamic_input_dim (int) – Dimensionality of the dynamic (time-varying) input features that are known in the past (the “lookback” window). This is a required parameter and typically includes the target variable itself (lagged) and other historical drivers like rainfall, temperature, or sales figures.

  • future_input_dim (int) – Dimensionality of the time-varying features for which values are known in advance for the forecast period. Examples include holidays, scheduled promotions, or day-of-week indicators. If 0, no future features are used.

  • output_dim (int, default 1) –

    Number of target variables produced at each forecast step. The model outputs a tensor of shape \((B, \, H, \, Q, \, \text{output\_dim})\) when quantiles are provided, or \((B, \, H, \, \text{output\_dim})\) for point forecasts, where

    \[B = \text{batch size},\qquad H = \text{forecast horizon},\qquad Q = |\text{quantiles}|.\]

  • forecast_horizon (int, default 1) – Length of the prediction window into the future. The dynamic encoder ingests max_window_size past steps and the decoder emits \(H\) steps ahead, where \(H=\text{forecast\_horizon}\). Setting \(H > 1\) enables multi-horizon sequence-to-sequence forecasts.

  • mode ({'pihal_like', 'tft_like'}, default 'tft_like') – Controls how future_features are sliced and routed. 'pihal_like' expects future_input.shape[1] == forecast_horizon and feeds the tensor only to the decoder. 'tft_like' expects time_steps + forecast_horizon rows, sending the first time_steps rows to the encoder and the remaining rows to the decoder, emulating the Temporal Fusion Transformer.

  • num_encoder_layers (int, default 2) – The number of self-attention blocks to stack in the encoder when using the ‘transformer’ architecture.

  • quantiles (list[float] or None, default None) –

    Optional quantile levels \(0 < q_1 < \dots < q_Q < 1\). When supplied, a fusionlab.nn.components.QuantileDistributionModeling head scales the point forecast \(\hat{y}\) into quantile estimates

    \[\hat{y}^{(q)} = \hat{y} + \sigma \,\Phi^{-1}(q),\]

    where \(\sigma\) is a learned spread parameter and \(\Phi^{-1}\) is the probit function. Omit or set to None to obtain deterministic forecasts.

  • embed_dim (int, default 32) – The base dimensionality for the internal feature space of the model. Various input features (static, dynamic, future) are projected into this common dimension to allow for meaningful interactions within downstream layers like LSTMs and attention mechanisms. It’s a key parameter for controlling model capacity.

  • hidden_units (int, default 64) – The number of units in the hidden layers of the Gated Residual Networks (GRNs). GRNs are core components used for non-linear transformations throughout the architecture. A larger value increases the model’s capacity to learn complex patterns.

  • lstm_units (int, default 64) – The number of hidden units in each LSTM layer within the MultiScaleLSTM block. This parameter determines the memory capacity of the recurrent cells processing the historical sequence data.

  • attention_units (int, default 32) – The dimensionality of the output space for the various attention mechanisms (e.g., CrossAttention, HierarchicalAttention). This is also often referred to as the model’s dimension, \(d_\text{model}\). It must be divisible by num_heads.

  • num_heads (int, default 4) – The number of attention heads in each MultiHeadAttention sub-layer. Using multiple heads allows the model to jointly attend to information from different representation subspaces at different positions, which can improve learning.

  • dropout_rate (float, default 0.1) – The dropout rate applied within various components like Gated Residual Networks (GRNs) and after some attention layers to prevent overfitting. It must be a float between 0.0 and 1.0.

  • max_window_size (int, default 10) – The number of past time steps (the lookback window) that the model considers. This should directly correspond to the time_steps parameter used during data preparation and is used by components like DynamicTimeWindow.

  • memory_size (int, default 100) – The number of memory slots in the MemoryAugmentedAttention layer. This external memory allows the model to learn and access patterns over very long-range dependencies that might be missed by standard LSTMs or attention.

  • scales (list of int, optional) – A list of scale factors for the MultiScaleLSTM. Each scale s creates an LSTM that processes the input sequence by taking every s-th time step. For example, scales=[1, 3] would process the sequence at its original resolution and at a coarser, every-third-timestep resolution. If None or ‘auto’, defaults to [1].

  • multi_scale_agg ({'last', 'average', 'concat', ...}, default 'last') –

    The strategy used by the aggregation function to combine the outputs from the different LSTMs in MultiScaleLSTM.

    • 'concat': (For 3D output) Pads sequences from different scales to the same length and concatenates them along the feature axis. This is the primary mode for creating a rich sequence representation for downstream attention layers in an encoder-decoder setup.

    • 'last' or 'auto': (For 2D output) Creates a context vector by taking the last hidden state from each LSTM scale and concatenating them.

    • 'average' or 'sum': Creates a 2D context vector by averaging or summing over the time dimension for each scale.

  • final_agg ({'last', 'average', 'flatten'}, default 'last') – The aggregation strategy used to collapse the final temporal feature map (which has a time dimension equal to forecast_horizon) into a single feature vector before the final decoding step.

  • activation (str, default 'relu') – The name of the activation function to use in Dense layers and Gated Residual Networks (GRNs) throughout the model. Common choices include ‘relu’, ‘gelu’, ‘swish’, and ‘tanh’.

  • use_residuals (bool, default True) – If True, enables residual “add & norm” connections after key sub-layers (like attention and GRNs). These shortcut connections are crucial for training very deep networks as they help prevent vanishing gradients and ease the optimization process.

  • use_vsn (bool, default True) – If True, the model uses VariableSelectionNetwork (VSN) layers at the input stage. VSNs perform intelligent, learnable feature selection, allowing the model to up-weight or down-weight the importance of each input variable. This can improve performance and provide insights into which features are most impactful. If False, simpler Dense layers are used for initial projection.

  • vsn_units (int, optional) – The number of units in the internal Gated Residual Networks (GRNs) of the Variable Selection Networks. This parameter controls the capacity of the feature selection sub-networks. If None, it often defaults to a value based on hidden_units.

  • use_batch_norm (bool, default False) – If True, applies batch normalization.

  • apply_dtw (bool, default True) – Whether to apply Dynamic Time Warping (DTW) for time-series alignment, a technique for aligning sequences that may be misaligned in time. It is particularly useful when the time steps in the dynamic and future features are not synchronized. If True, a DynamicTimeWindow layer is applied to the encoder output, allowing the model to learn an optimal, data-dependent lookback window. If False, this step is skipped.

  • attention_levels (str or list[str], optional) – Legacy parameter. Controls the attention layers used in the decoder. It is recommended to use architecture_config={'decoder_attention_stack': [...]} instead.

  • objective ({'hybrid', 'transformer'}, default 'hybrid') –

    Legacy parameter. Selects the backbone architecture that processes dynamic-past and (optionally) known-future covariates before the decoding stage: either 'hybrid' (combining LSTM and attention mechanisms) or 'transformer' (using only transformer-based attention mechanisms). It is recommended to use architecture_config={'encoder_type': 'hybrid'} instead.

    • 'hybrid': Multi-scale LSTM -> Transformer. The encoder first extracts multi-resolution temporal features with a stack of LSTMs (one per scale), then refines these features with hierarchical/cross attention blocks. This configuration balances the strong sequence-memory capability of recurrent networks with the global-context modelling power of Transformers and is recommended for most tabular time-series data.

    • 'transformer': Pure Transformer. Bypasses the LSTM stack and feeds the embeddings directly into the attention encoder, resulting in a lightweight, fully self-attention model. Choose this if your data exhibit long-range dependencies for which an LSTM adds little benefit, or when you need faster training/inference at the cost of some short-term pattern capture.

    In a future release: shortcut for common loss presets. The following values should be recognised:

    • 'nse': Nash–Sutcliffe model-efficiency score.

    • 'rmse': root-mean-square error.

    When None, losses are supplied via compile().

  • architecture_config (dict, optional) – A dictionary for fine-grained control over the model’s internal architecture. This is the recommended way to configure the model. See the Notes section for details on keys like encoder_type, decoder_attention_stack, and feature_processing.

  • name (str, default "BaseAttentiveModel") – Model identifier passed to tf.keras.Model. Appears in weight filenames and TensorBoard scopes.

  • **kwargs – Additional keyword arguments forwarded verbatim to the tf.keras.Model constructor, e.g. dtype="float64" or run_eagerly=True.
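
As a rough illustration of the quantile formula and the output shapes given for quantiles and output_dim above, the following standard-library sketch evaluates \(\hat{y} + \sigma\,\Phi^{-1}(q)\) for a single scalar forecast and computes the documented output shape. The helper names quantile_estimates and expected_output_shape are hypothetical and not part of fusionlab, and \(\sigma\) is passed in as a plain number here rather than learned.

```python
from statistics import NormalDist


def quantile_estimates(y_hat, sigma, quantiles):
    """Shift one point forecast by sigma * probit(q) for each level q."""
    if not all(0.0 < q < 1.0 for q in quantiles):
        raise ValueError("quantile levels must lie strictly between 0 and 1")
    probit = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return [y_hat + sigma * probit(q) for q in quantiles]


def expected_output_shape(batch_size, forecast_horizon, output_dim, quantiles=None):
    """Documented shape: (B, H, Q, O) with quantiles, (B, H, O) without."""
    if quantiles:
        return (batch_size, forecast_horizon, len(quantiles), output_dim)
    return (batch_size, forecast_horizon, output_dim)


# Illustrative values only: sigma is fixed by hand in this sketch.
estimates = quantile_estimates(y_hat=10.0, sigma=2.0, quantiles=[0.1, 0.5, 0.9])
shape = expected_output_shape(32, 24, 2, quantiles=[0.1, 0.5, 0.9])
print(shape)  # (32, 24, 3, 2)
```

Because \(\Phi^{-1}(0.5) = 0\), the median estimate coincides with the point forecast, and the 0.1 and 0.9 estimates sit symmetrically around it.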

Notes

  • The composite latent size produced by the cross‑attention block is \(d_\text{model} = \text{attention\_units}\). For stable training, ensure \(d_\text{model}\) is divisible by num_heads.

  • The model configuration supports both hybrid and transformer-based designs. The hybrid configuration combines LSTM with attention mechanisms, while the transformer configuration exclusively uses self-attention mechanisms.

  • The attention mechanism allows for both cross-attention (between encoder and decoder) and self-attention within the decoder.
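
The divisibility requirement in the first note can be checked before constructing a model. The sketch below assumes the standard Transformer convention in which each head receives an equal slice of \(d_\text{model}\); per_head_dim is a hypothetical helper, not a fusionlab function.

```python
def per_head_dim(attention_units, num_heads):
    """Return the per-head width, or raise if d_model cannot be split evenly."""
    if attention_units <= 0 or num_heads <= 0:
        raise ValueError("attention_units and num_heads must be positive")
    if attention_units % num_heads != 0:
        raise ValueError(
            f"attention_units={attention_units} is not divisible by "
            f"num_heads={num_heads}"
        )
    return attention_units // num_heads


print(per_head_dim(32, 4))  # 8, using the documented defaults
print(per_head_dim(64, 8))  # 8, using the values from the first example
```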

Smart Configuration

The recommended way to define the model’s structure is via the architecture_config dictionary. It provides clear, explicit control over the most important architectural choices:

  • `encoder_type`: Defines the encoder’s core mechanism.
    • 'hybrid' (default): Uses the MultiScaleLSTM for rich temporal feature extraction.

    • 'transformer': Uses a pure self-attention stack, ideal for capturing very long-range dependencies.

  • `decoder_attention_stack`: A list of strings that defines the sequence of attention layers in the decoder. The available layers are:
    • 'cross': The crucial cross-attention between decoder queries and encoder memory.

    • 'hierarchical': A self-attention layer that helps find structural patterns in the context.

    • 'memory': A memory-augmented self-attention layer for long-term dependencies.

    • Example: ['cross', 'hierarchical'] creates a simpler decoder.

  • `feature_processing`: Controls the initial feature embedding.
    • 'vsn' (default): Uses VariableSelectionNetwork for learnable feature selection.

    • 'dense': Uses standard Dense layers.

The legacy parameters (objective, use_vsn, attention_levels) are maintained for backward compatibility but will be overridden by any settings provided in architecture_config.
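
The precedence rule stated above (settings in architecture_config override the legacy parameters) can be sketched as a plain dictionary merge. This is a minimal illustration rather than the library's actual resolution logic: resolve_architecture is a hypothetical helper, and the use of the full three-layer stack when attention_levels is None is an assumption.

```python
def resolve_architecture(objective="hybrid", use_vsn=True,
                         attention_levels=None, architecture_config=None):
    """Hypothetical helper: merge legacy arguments with architecture_config."""
    if attention_levels is None:
        # Assumption: the full stack is used when nothing is specified.
        stack = ["cross", "hierarchical", "memory"]
    elif isinstance(attention_levels, str):
        stack = [attention_levels]
    else:
        stack = list(attention_levels)

    # Start from the values implied by the legacy arguments.
    resolved = {
        "encoder_type": objective,
        "decoder_attention_stack": stack,
        "feature_processing": "vsn" if use_vsn else "dense",
    }
    # Any key given explicitly in architecture_config takes precedence.
    resolved.update(architecture_config or {})
    return resolved


config = resolve_architecture(
    objective="hybrid",
    use_vsn=True,
    architecture_config={"encoder_type": "transformer",
                         "feature_processing": "dense"},
)
print(config["encoder_type"], config["feature_processing"])  # transformer dense
```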

See also

  • fusionlab.nn.pinn.PIHALNet – physics-informed extension.

  • fusionlab.utils.data_utils.widen_temporal_columns() – prepares wide data frames for plotting forecasts.

Examples

>>> import tensorflow as tf
>>> from fusionlab.nn.models import BaseAttentive
>>> # 'pihal_like' expects future inputs that span only the forecast horizon
>>> model = BaseAttentive(
...     static_input_dim=4, dynamic_input_dim=8, future_input_dim=6,
...     output_dim=2, forecast_horizon=24, quantiles=[0.1, 0.5, 0.9],
...     scales=[1, 3], multi_scale_agg="concat", final_agg="last",
...     attention_units=64, num_heads=8, dropout_rate=0.15,
...     mode="pihal_like",
... )
>>> x_static  = tf.random.normal([32, 4])              # B × S
>>> x_dynamic = tf.random.normal([32, 10, 8])          # B × T × D
>>> x_future  = tf.random.normal([32, 24, 6])          # B × H × F
>>> y_hat = model([x_static, x_dynamic, x_future])
>>> y_hat.shape
TensorShape([32, 24, 3, 2])  # B × H × Q × output_dim
>>> from fusionlab.nn.models import BaseAttentive
>>> import tensorflow as tf
>>> # Example using the recommended architecture_config
>>> transformer_config = {
...     'encoder_type': 'transformer',
...     'decoder_attention_stack': ['cross', 'hierarchical'],
...     'feature_processing': 'dense'
... }
>>> model = BaseAttentive(
...     static_input_dim=4,
...     dynamic_input_dim=8,
...     future_input_dim=6,
...     output_dim=2,
...     forecast_horizon=24,
...     max_window_size=10,
...     mode='tft_like',
...     quantiles=[0.1, 0.5, 0.9],
...     architecture_config=transformer_config
... )
>>> # Prepare dummy input data
>>> BATCH_SIZE = 32
>>> x_static  = tf.random.normal([BATCH_SIZE, 4])
>>> x_dynamic = tf.random.normal([BATCH_SIZE, 10, 8])
>>> x_future  = tf.random.normal([BATCH_SIZE, 10 + 24, 6])
>>> # Get model output
>>> y_hat = model([x_static, x_dynamic, x_future])
>>> y_hat.shape
TensorShape([32, 24, 3, 2])
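
The second example passes 10 + 24 future rows because mode='tft_like' routes the first time_steps rows to the encoder and the rest to the decoder. The following sketch mimics that routing on a plain Python list standing in for the time axis. split_future_steps is a hypothetical helper written from the description of the mode parameter, not the library's implementation.

```python
def split_future_steps(future_steps, time_steps, forecast_horizon, mode="tft_like"):
    """Return (encoder_part, decoder_part) of the known-future time axis."""
    n = len(future_steps)
    if mode == "pihal_like":
        # The whole tensor goes to the decoder; nothing reaches the encoder.
        if n != forecast_horizon:
            raise ValueError(f"expected {forecast_horizon} future rows, got {n}")
        return [], list(future_steps)
    if mode == "tft_like":
        expected = time_steps + forecast_horizon
        if n != expected:
            raise ValueError(f"expected {expected} future rows, got {n}")
        return list(future_steps[:time_steps]), list(future_steps[time_steps:])
    raise ValueError(f"unknown mode: {mode!r}")


rows = list(range(10 + 24))  # stand-in for the 34 future time steps
encoder_part, decoder_part = split_future_steps(rows, time_steps=10, forecast_horizon=24)
print(len(encoder_part), len(decoder_part))  # 10 24
```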

See also

fusionlab.nn.pinn.PIHALNet

A physics-informed extension of this architecture.

fusionlab.nn.components.MultiScaleLSTM

The multi-resolution LSTM component used in the hybrid encoder.

fusionlab.nn.components.VariableSelectionNetwork

The learnable feature-selection component.

fusionlab.nn.models.HALNet

A direct, data-driven implementation of BaseAttentive.

__init__(static_input_dim, dynamic_input_dim, future_input_dim, output_dim=1, forecast_horizon=1, mode=None, num_encoder_layers=2, quantiles=None, embed_dim=32, hidden_units=64, lstm_units=64, attention_units=32, num_heads=4, dropout_rate=0.1, max_window_size=10, memory_size=100, scales=None, multi_scale_agg='last', final_agg='last', activation='relu', use_residuals=True, use_vsn=True, vsn_units=None, use_batch_norm=False, apply_dtw=True, attention_levels=None, objective='hybrid', architecture_config=None, verbose=0, name='BaseAttentiveModel', **kwargs)[source]
Parameters:
  • static_input_dim (int)

  • dynamic_input_dim (int)

  • future_input_dim (int)

  • output_dim (int)

  • forecast_horizon (int)

  • mode (str | None)

  • num_encoder_layers (int)

  • quantiles (List[float] | None)

  • embed_dim (int)

  • hidden_units (int)

  • lstm_units (int)

  • attention_units (int)

  • num_heads (int)

  • dropout_rate (float)

  • max_window_size (int)

  • memory_size (int)

  • scales (List[int] | None)

  • multi_scale_agg (str)

  • final_agg (str)

  • activation (str)

  • use_residuals (bool)

  • use_vsn (bool)

  • vsn_units (int | None)

  • use_batch_norm (bool)

  • apply_dtw (bool)

  • attention_levels (str | List[str] | None)

  • objective (str)

  • architecture_config (Dict | None)

  • verbose (int)

  • name (str)

Methods

__init__(static_input_dim, ...[, ...])

add_loss(loss)

Can be called inside of the call() method to add a scalar loss.

add_metric(*args, **kwargs)

add_variable(shape, initializer[, dtype, ...])

Add a weight variable to the layer.

add_weight([shape, initializer, dtype, ...])

Add a weight variable to the layer.

apply_attention_levels(...)

Applies attention mechanisms in the order specified by att_levels, using the provided attention methods such as cross attention, hierarchical attention, and memory-augmented attention.

build(input_shape)

build_from_config(config)

Builds the layer's states with the supplied config dict.

call(inputs[, training])

Forward pass for the attentive model.

compile([optimizer, loss, loss_weights, ...])

Configures the model for training.

compile_from_config(config)

Compiles the model with the information given in config.

compiled_loss(y, y_pred[, sample_weight, ...])

compute_loss([x, y, y_pred, sample_weight, ...])

Compute the total loss, validate it, and return it.

compute_mask(inputs, previous_mask)

compute_metrics(x, y, y_pred[, sample_weight])

Update metric states and collect all metrics to be returned.

compute_output_shape(*args, **kwargs)

compute_output_spec(*args, **kwargs)

count_params()

Count the total number of scalars composing the weights.

evaluate([x, y, batch_size, verbose, ...])

Returns the loss value & metrics values for the model in test mode.

export(filepath[, format, verbose, ...])

Export the model as an artifact for inference.

fit([x, y, batch_size, epochs, verbose, ...])

Trains the model for a fixed number of epochs (dataset iterations).

from_config(config)

Creates a model from its config.

get_build_config()

Returns a dictionary with the layer's input shape.

get_compile_config()

Returns a serialized config with information for compiling the model.

get_config()

Returns the configuration of the model as a dictionary.

get_layer([name, index])

Retrieves a layer based on either its name (unique) or index.

get_metrics_result()

Returns the model's metrics values as a dict.

get_params([deep])

Get the parameters for this learner.

get_state_tree([value_format])

Retrieves tree-like structure of model variables.

get_weights()

Return the values of layer.weights as a list of NumPy arrays.

help(**kwargs)

load(file_path[, format])

Load the learner's state from a specified file in the desired format.

load_own_variables(store)

Loads the state of the layer.

load_weights(filepath[, skip_mismatch])

Load the weights from a single file or sharded files.

loss(y, y_pred[, sample_weight])

make_predict_function([force])

make_test_function([force])

make_train_function([force])

predict(x[, batch_size, verbose, steps, ...])

Generates output predictions for the input samples.

predict_on_batch(x)

Returns predictions for a single batch of samples.

predict_step(data)

quantize(mode[, config])

Quantize the weights of the model.

quantized_build(input_shape, mode)

quantized_call(*args, **kwargs)

reconfigure(architecture_config)

Creates a new model instance with a modified architecture.

rematerialized_call(layer_call, *args, **kwargs)

Enable rematerialization dynamically for layer's call method.

reset_metrics()

run_encoder_decoder_core(static_input, ...)

Executes the data-driven pipeline with a selectable encoder architecture, processing static, dynamic, and future inputs through the encoder-decoder interaction.

save(filepath[, overwrite, zipped])

Saves a model as a .keras file.

save_own_variables(store)

Saves the state of the layer.

save_weights(filepath[, overwrite, ...])

Saves all weights to a single file or sharded files.

set_params(**params)

Set the parameters of this learner.

set_state_tree(state_tree)

Assigns values to variables of the model.

set_weights(weights)

Sets the values of layer.weights from a list of NumPy arrays.

stateless_call(trainable_variables, ...[, ...])

Call the layer without any side effects.

stateless_compute_loss(trainable_variables, ...)

summary([line_length, positions, print_fn, ...])

Prints a string summary of the network.

symbolic_call(*args, **kwargs)

test_on_batch(x[, y, sample_weight, return_dict])

Test the model on a single batch of samples.

test_step(data)

to_json(**kwargs)

Returns a JSON string containing the network configuration.

train_on_batch(x[, y, sample_weight, ...])

Runs a single gradient update on a single batch of data.

train_step(data)

Attributes

compiled_metrics

compute_dtype

The dtype of the computations performed by the layer.

distribute_reduction_method

distribute_strategy

dtype

Alias of layer.variable_dtype.

dtype_policy

input

Retrieves the input tensor(s) of a symbolic operation.

input_dtype

The dtype layer inputs should be converted to.

input_spec

jit_compile

layers

losses

List of scalar losses from add_loss, regularizers and sublayers.

metrics

List of all metrics.

metrics_names

metrics_variables

List of all metric variables.

my_params

non_trainable_variables

List of all non-trainable layer state.

non_trainable_weights

List of all non-trainable weight variables of the layer.

output

Retrieves the output tensor(s) of a layer.

path

The path of the layer.

quantization_mode

The quantization mode of this layer, None if not quantized.

run_eagerly

supports_masking

Whether this layer supports computing a mask using compute_mask.

trainable

Settable boolean, whether this layer should be trainable or not.

trainable_variables

List of all trainable layer state.

trainable_weights

List of all trainable weight variables of the layer.

variable_dtype

The dtype of the state (weights) of the layer.

variables

List of all layer state, including random seeds.

weights

List of all weight variables of the layer.

run_encoder_decoder_core(static_input, dynamic_input, future_input, training)[source]

Executes the data-driven pipeline with a selectable encoder architecture, processing static, dynamic, and future inputs through the encoder-decoder interaction. Attention mechanisms are applied in the decoder block, with flexibility to select which types of attention to use via the att_levels parameter.

Parameters:
  • static_input (Tensor) – The input tensor containing static features, which remain constant over time (e.g., environmental data, geographical features).

  • dynamic_input (Tensor) – The input tensor containing dynamic features, which vary over time (e.g., sensor readings, time-series data).

  • future_input (Tensor) – The input tensor representing future features, typically used for forecasting or projection purposes.

  • training (bool) – A flag indicating whether the model is in training mode. This flag controls the use of training-specific operations, such as dropout and batch normalization.

Returns:

The final output tensor, which has undergone attention fusion and time-based aggregation. This tensor is used for further tasks such as classification, regression, or forecasting.

Return type:

Tensor

Notes

  • The method processes static, dynamic, and future inputs through separate paths before combining them for the encoder.

  • Attention mechanisms are applied in the decoder block. The specific attention types and their order are controlled via the att_levels parameter, which can include:

    • ‘cross’ for cross attention.

    • ‘hierarchical’ for hierarchical attention.

    • ‘memory’ for memory-augmented attention.

  • If multiple attention mechanisms are chosen, they are applied sequentially.

  • The time dimension is collapsed in the final output, resulting in a single vector per sample.
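
The last note refers to the final_agg options ('last', 'average', 'flatten') listed among the class parameters. A minimal sketch of how each option could collapse a feature map of H time steps into one vector is shown below, using nested lists in place of tensors. aggregate_time is a hypothetical helper and the real layer may differ in detail.

```python
def aggregate_time(feature_map, final_agg="last"):
    """Collapse a list of H time steps (each a list of features) to one vector."""
    if not feature_map:
        raise ValueError("feature_map must contain at least one time step")
    if final_agg == "last":
        return list(feature_map[-1])            # keep only the final step
    if final_agg == "average":
        n_steps = len(feature_map)
        return [sum(column) / n_steps for column in zip(*feature_map)]
    if final_agg == "flatten":
        return [value for step in feature_map for value in step]
    raise ValueError(f"unknown final_agg: {final_agg!r}")


feature_map = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # H = 3 steps, 2 features
print(aggregate_time(feature_map, "last"))     # [5.0, 6.0]
print(aggregate_time(feature_map, "average"))  # [3.0, 4.0]
print(aggregate_time(feature_map, "flatten"))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Note that 'flatten' yields a vector of length H times the feature width, whereas 'last' and 'average' keep the feature width unchanged.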

References

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, Ł., Polosukhin, I. (2017). Attention is all you need. NeurIPS 2017, 30, 6000-6010.

  • Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015.

apply_attention_levels(projected_decoder_input, encoder_sequences, training)[source]

Applies attention mechanisms in the order specified by att_levels, using the provided attention methods such as cross attention, hierarchical attention, and memory-augmented attention.

Parameters:
  • projected_decoder_input (Tensor) – The input tensor to be used in the attention mechanisms.

  • encoder_sequences (Tensor) – The encoder output sequences used in attention.

  • training (bool) – A flag indicating whether the model is in training mode.

  • att_levels (str, list of str, int, or None) –

    Specifies the attention mechanisms to apply and the order:

    • If None or 'use_all' or '*', use all attention mechanisms.

    • If 'hier_att' or 'hierarchical_attention', apply hierarchical attention.

    • If 'memo_aug_att' or 'memory_augmented_attention', apply memory-augmented attention.

    • If a list of strings, apply attention types in the provided order.

    • If an integer (1, 2, 3), map it to cross attention (1), hierarchical attention (2), or memory-augmented attention (3).

Returns:

The final output tensor after applying attention mechanisms in order.

Return type:

Tensor

Notes

The order of attention mechanisms is determined by the provided att_levels list.
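
The accepted att_levels forms listed above can be normalised into an ordered list of layer names. The sketch below covers only the aliases named in this section plus the canonical names 'cross', 'hierarchical', and 'memory'. normalize_att_levels is a hypothetical helper, and the order used for "all mechanisms" (following the integer codes 1, 2, 3) is an assumption.

```python
ALL_LEVELS = ["cross", "hierarchical", "memory"]  # assumed default order

ALIASES = {
    "cross": "cross",
    "hierarchical": "hierarchical",
    "hier_att": "hierarchical",
    "hierarchical_attention": "hierarchical",
    "memory": "memory",
    "memo_aug_att": "memory",
    "memory_augmented_attention": "memory",
    1: "cross",
    2: "hierarchical",
    3: "memory",
}


def normalize_att_levels(att_levels):
    """Map any documented att_levels form to an ordered list of layer names."""
    if att_levels is None or att_levels in ("use_all", "*"):
        return list(ALL_LEVELS)
    # A single string or integer is treated as a one-element list.
    items = att_levels if isinstance(att_levels, (list, tuple)) else [att_levels]
    try:
        return [ALIASES[item] for item in items]
    except (KeyError, TypeError):
        raise ValueError(f"unsupported att_levels: {att_levels!r}") from None


print(normalize_att_levels("*"))                        # ['cross', 'hierarchical', 'memory']
print(normalize_att_levels(2))                          # ['hierarchical']
print(normalize_att_levels(["memo_aug_att", "cross"]))  # ['memory', 'cross']
```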

call(inputs, training=False)[source]

Forward pass for the attentive model.

This method processes the input data, validates the dimensions, and then performs the forward pass through the encoder-decoder network. The model applies attention mechanisms in the decoder phase and performs quantile distribution modeling if enabled.

Parameters:
  • inputs (Tensor) – The input data, containing the static, dynamic, and future covariate features required for the model. In the class-level examples these are passed as a list of three tensors, [x_static, x_dynamic, x_future].

  • training (bool, optional, default False) – A flag indicating whether the model is in training mode. This flag controls operations such as dropout and batch normalization.

Returns:

The final output tensor after passing through the model, which may include quantile distribution modeling depending on the configuration of the model.

Return type:

Tensor

Notes

  • The method first validates the input dimensions for static, dynamic, and future features using validate_model_inputs.

  • The model then asserts that the future input tensor has the correct time span using tf_assert_equal.

  • The forward pass is completed by invoking the encoder-decoder core method (run_encoder_decoder_core), followed by the multi-decoder and quantile distribution modeling (if enabled).
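
The validation steps in the notes above can be approximated with plain shape tuples. The signatures of validate_model_inputs and tf_assert_equal are not shown in this section, so the sketch uses a hypothetical stand-in, check_input_shapes, that checks feature widths and the future time span implied by the mode parameter.

```python
def check_input_shapes(static_shape, dynamic_shape, future_shape, *,
                       static_input_dim, dynamic_input_dim, future_input_dim,
                       forecast_horizon, mode="tft_like"):
    """Validate (B, S), (B, T, D) and (B, span, F) shapes; return the span."""
    batch, time_steps, dynamic_dim = dynamic_shape
    if dynamic_dim != dynamic_input_dim:
        raise ValueError("dynamic feature width does not match dynamic_input_dim")
    if tuple(static_shape) != (batch, static_input_dim):
        raise ValueError("static input must have shape (B, static_input_dim)")
    # 'pihal_like' covers only the horizon; 'tft_like' also covers the lookback.
    span = forecast_horizon if mode == "pihal_like" else time_steps + forecast_horizon
    if tuple(future_shape) != (batch, span, future_input_dim):
        raise ValueError(f"future input must have shape (B, {span}, future_input_dim)")
    return span


span = check_input_shapes(
    (32, 4), (32, 10, 8), (32, 34, 6),
    static_input_dim=4, dynamic_input_dim=8, future_input_dim=6,
    forecast_horizon=24, mode="tft_like",
)
print(span)  # 34
```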

get_config()[source]

Returns the configuration of the model as a dictionary.

This method retrieves the configuration of the model, including all the hyperparameters and settings that define the model’s behavior. The returned dictionary can be used for saving, reproducing, or inspecting the model’s configuration.

The method overrides the default get_config method from the parent class and includes specific attributes of the BaseAttentive model, such as the input dimensions, architecture type, attention mechanisms, and regularization settings. The configuration can be serialized and used to recreate the model with the same parameters.

classmethod from_config(config)[source]

Creates a model from its config.

This method is the reverse of get_config, capable of handling the nested architecture_config dictionary.
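
A minimal sketch of the get_config / from_config round trip with a nested architecture_config is given below. TinyModel is a hypothetical stand-in class used so the example runs without TensorFlow. It is not BaseAttentive, and the real methods handle many more fields.

```python
import copy
import json


class TinyModel:
    """Hypothetical stand-in that mimics the config round trip only."""

    def __init__(self, forecast_horizon=1, architecture_config=None):
        self.forecast_horizon = forecast_horizon
        # Copy so later changes to the caller's dict do not leak in.
        self.architecture_config = copy.deepcopy(architecture_config or {})

    def get_config(self):
        return {
            "forecast_horizon": self.forecast_horizon,
            "architecture_config": copy.deepcopy(self.architecture_config),
        }

    @classmethod
    def from_config(cls, config):
        return cls(**copy.deepcopy(config))


original = TinyModel(
    forecast_horizon=24,
    architecture_config={
        "encoder_type": "transformer",
        "decoder_attention_stack": ["cross", "hierarchical"],
    },
)
# Serialise to JSON and back to show the nested dict survives the trip.
restored = TinyModel.from_config(json.loads(json.dumps(original.get_config())))
print(restored.get_config() == original.get_config())  # True
```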

help(**kwargs)

my_params = BaseAttentive(static_input_dim, dynamic_input_dim, future_input_dim, output_dim=1, forecast_horizon=1, mode=None, num_encoder_layers=2, quantiles=None, embed_dim=32, hidden_units=64, lstm_units=64, attention_units=32, num_heads=4, dropout_rate=0.1, max_window_size=10, memory_size=100, scales=None, multi_scale_agg='last', final_agg='last', activation='relu', use_residuals=True, use_vsn=True, vsn_units=None, use_batch_norm=False, apply_dtw=True, attention_levels=None, objective='hybrid', architecture_config=None, verbose=0, name='BaseAttentiveModel')

reconfigure(architecture_config)[source]

Creates a new model instance with a modified architecture.

This method takes the configuration of the current model, updates the architectural components with the provided dictionary, and returns a new, un-trained model instance with the specified changes.

Parameters:
  • architecture_config (Dict[str, Any]) – A dictionary with new architectural settings, such as {'encoder_type': 'transformer'}.

Returns:

A new model instance with the updated architecture.

Return type:

BaseAttentive
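
The behaviour described for reconfigure (read the current configuration, update the architectural settings, return a new untrained instance) can be sketched as follows. TinyModel is a hypothetical stand-in rather than BaseAttentive, and merging the new keys into the existing architecture_config, rather than replacing it wholesale, is an assumption based on the word "updates" in the description.

```python
import copy


class TinyModel:
    """Hypothetical stand-in used to illustrate the reconfigure pattern."""

    def __init__(self, forecast_horizon=1, architecture_config=None):
        self.forecast_horizon = forecast_horizon
        self.architecture_config = copy.deepcopy(architecture_config or {})

    def get_config(self):
        return {
            "forecast_horizon": self.forecast_horizon,
            "architecture_config": copy.deepcopy(self.architecture_config),
        }

    def reconfigure(self, architecture_config):
        config = self.get_config()  # copy of the current settings
        # Assumption: new keys are merged into the existing settings.
        config["architecture_config"].update(architecture_config)
        return type(self)(**config)  # fresh instance; no weights carried over


base = TinyModel(24, {"encoder_type": "hybrid", "feature_processing": "vsn"})
variant = base.reconfigure({"encoder_type": "transformer"})
print(variant.architecture_config)
# {'encoder_type': 'transformer', 'feature_processing': 'vsn'}
print(base.architecture_config["encoder_type"])  # hybrid
```

The original instance is left untouched, which matches the statement that a new model instance is returned.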