fusionlab.nn.transformers.DummyTFT

class fusionlab.nn.transformers.DummyTFT

Bases: Model, NNLearner

DummyTFT: Simplified TFT variant using only Static and Dynamic inputs.

Notes

The Temporal Fusion Transformer (TFT) model combines the strengths of sequence-to-sequence models and attention mechanisms to handle complex temporal dynamics. It provides interpretability by allowing examination of variable importance and temporal attention weights.

Variable Selection Networks (VSNs):

VSNs select relevant variables by applying Gated Residual Networks (GRNs) to each variable and computing variable importance weights via a softmax function. This allows the model to focus on the most informative features.

Gated Residual Networks (GRNs):

GRNs allow the model to capture complex nonlinear relationships while controlling information flow via gating mechanisms. They consist of a nonlinear layer followed by gating and residual connections.
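As a rough illustration of the description above, the following NumPy sketch applies a nonlinear layer, a sigmoid gate, a residual connection, and layer normalization. The weight names (`W1`, `Wg`, `Wv`), the ELU nonlinearity, and the use of layer normalization are assumptions made for the example; the library's `GatedResidualNetwork` may be organized differently.

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-6):
    # Normalize each row to zero mean and unit variance
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def grn(x, p):
    """Illustrative GRN: nonlinear layer -> gate -> residual + layer norm."""
    hidden = elu(x @ p["W1"] + p["b1"])          # nonlinear layer
    gate = sigmoid(hidden @ p["Wg"] + p["bg"])   # values in (0, 1) control the flow
    value = hidden @ p["Wv"] + p["bv"]           # candidate update
    return layer_norm(x + gate * value)          # residual connection

# Hypothetical weights, for illustration only
rng = np.random.default_rng(0)
d = 4
params = {
    "W1": rng.normal(scale=0.1, size=(d, d)), "b1": np.zeros(d),
    "Wg": rng.normal(scale=0.1, size=(d, d)), "bg": np.zeros(d),
    "Wv": rng.normal(scale=0.1, size=(d, d)), "bv": np.zeros(d),
}
x = rng.normal(size=(3, d))
out = grn(x, params)
```

When the gate saturates near zero, the block reduces to a normalized copy of its input, which is how gating lets the network skip a transformation it does not need.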

Static Enrichment Layer:

Enriches temporal features with static context, enabling the model to adjust temporal dynamics based on static information. This layer combines static embeddings with temporal representations.

Temporal Attention Layer:

Applies multi-head attention over the temporal dimension to focus on important time steps. This mechanism allows the model to weigh different time steps differently when making predictions.

Mathematical Formulation:

Let:

\(\mathbf{x}_{\text{static}} \in \mathbb{R}^{n_s \times d_s}\) be the static inputs,
\(\mathbf{x}_{\text{dynamic}} \in \mathbb{R}^{T \times n_d \times d_d}\) be the dynamic inputs,
\(n_s\) and \(n_d\) be the numbers of static and dynamic variables,
\(d_s\) and \(d_d\) be their respective input dimensions,
\(T\) be the number of time steps.

Variable Selection Networks (VSNs):

For static variables:

\[\mathbf{e}_{\text{static}} = \sum_{i=1}^{n_s} \alpha_i \cdot \text{GRN}(\mathbf{x}_{\text{static}, i})\]

For dynamic variables:

\[\mathbf{E}_{\text{dynamic}} = \sum_{j=1}^{n_d} \beta_j \cdot \text{GRN}(\mathbf{x}_{\text{dynamic}, :, j})\]

where \(\alpha_i\) and \(\beta_j\) are variable importance weights computed via softmax.
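The weighted sum for the static variables can be sketched in NumPy as follows. The random arrays `processed` and `logits` are stand-ins (assumptions for illustration) for the per-variable GRN outputs and for the selection scores that the real network would learn.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the maximum for numerical stability
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_s, d_model = 3, 4

# Stand-in for GRN(x_static_i): one processed vector per static variable
processed = rng.normal(size=(n_s, d_model))
# Stand-in for the learned selection scores, one per variable
logits = rng.normal(size=n_s)

alpha = softmax(logits)                              # importance weights, sum to 1
e_static = (alpha[:, None] * processed).sum(axis=0)  # weighted sum over variables
```

Because the weights are non-negative and sum to one, they can be read directly as relative variable importances.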

LSTM Encoder:

Processes dynamic embeddings to capture sequential dependencies:

\[\mathbf{H} = \text{LSTM}(\mathbf{E}_{\text{dynamic}})\]
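For readers unfamiliar with the recurrence, here is a minimal single-layer LSTM forward pass in NumPy that maps a sequence of embeddings to one hidden state per time step. It uses the standard LSTM equations with randomly initialized, hypothetical weights; the model itself relies on the framework's LSTM layers, so this is only a conceptual sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(E, W, U, b):
    """E: (T, d_in), W: (d_in, 4 * units), U: (units, 4 * units), b: (4 * units,)."""
    units = U.shape[0]
    h = np.zeros(units)  # hidden state
    c = np.zeros(units)  # cell state
    states = []
    for e_t in E:
        z = e_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)   # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g             # update the cell state
        h = o * np.tanh(c)            # expose a gated view of the cell state
        states.append(h)
    return np.stack(states)           # H with shape (T, units)

rng = np.random.default_rng(0)
T, d_in, units = 6, 3, 5
E_dynamic = rng.normal(size=(T, d_in))
H = lstm_encode(
    E_dynamic,
    rng.normal(scale=0.1, size=(d_in, 4 * units)),
    rng.normal(scale=0.1, size=(units, 4 * units)),
    np.zeros(4 * units),
)
```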

Static Enrichment Layer:

Combines static context with temporal features:

\[\mathbf{H}_{\text{enriched}} = \text{StaticEnrichment}(\mathbf{e}_{\text{static}}, \mathbf{H})\]
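One simple way to picture this step is to project the static embedding to the temporal feature size and add it to every time step. The sketch below does exactly that; the additive form, the `tanh` nonlinearity, and the projection matrix `W_c` are assumptions chosen for illustration and are not taken from the `StaticEnrichmentLayer` implementation.

```python
import numpy as np

def static_enrichment(e_static, H, W_c):
    # Project the static embedding to the temporal feature size
    context = e_static @ W_c           # shape (units,)
    # Broadcasting adds the same context vector to every time step
    return np.tanh(H + context)        # shape (T, units)

rng = np.random.default_rng(0)
T, units, d_static = 5, 4, 3
H = rng.normal(size=(T, units))              # temporal features
e_static = rng.normal(size=d_static)         # static embedding
W_c = rng.normal(scale=0.5, size=(d_static, units))

H_enriched = static_enrichment(e_static, H, W_c)
```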

Temporal Attention Layer:

Applies attention over time steps:

\[\mathbf{Z} = \text{TemporalAttention}(\mathbf{H}_{\text{enriched}})\]
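The core of attention over time steps can be sketched as single-head scaled dot-product self-attention. The projection matrices below are hypothetical, and the actual `TemporalAttentionLayer` uses multiple heads and may add gating or normalization around this operation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over the time axis."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, T) similarity between time steps
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 5, 4
H_enriched = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(scale=0.5, size=(d, d)) for _ in range(3))

Z, attn = self_attention(H_enriched, Wq, Wk, Wv)
```

Row `t` of `attn` shows how strongly time step `t` draws on every other step, which is the quantity usually inspected when interpreting temporal attention.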

Position-wise Feedforward Layer:

Refines the output:

\[\mathbf{F} = \text{GRN}(\mathbf{Z})\]

Final Output:

For point forecasting:

\[\hat{y} = \text{OutputLayer}(\mathbf{F}_{T})\]

For quantile forecasting (if quantiles are specified):

\[\hat{y}_q = \text{OutputLayer}_q(\mathbf{F}_{T}), \quad q \in \text{quantiles}\]

where \(\mathbf{F}_{T}\) is the feature vector at the last time step.

Examples

>>> from fusionlab.nn.transformers import DummyTFT
>>> # Define model parameters
>>> model = DummyTFT(
...     static_input_dim=1,
...     dynamic_input_dim=1,
...     hidden_units=64,
...     num_heads=4,
...     dropout_rate=0.1,
...     forecast_horizon=1,
...     quantiles=[0.1, 0.5, 0.9],
...     activation='relu',
...     use_batch_norm=True,
...     num_lstm_layers=2,
...     lstm_units=[64, 32]
... )
>>> model.compile(optimizer='adam', loss='mse')
>>> # Assume `static_inputs`, `dynamic_inputs`, and `labels` are prepared
>>> model.fit(
...     [static_inputs, dynamic_inputs],
...     labels,
...     epochs=10,
...     batch_size=32
... )

Notes

When using quantile regression by specifying the quantiles parameter, ensure that your loss function is compatible with quantile prediction, such as the quantile loss function. Additionally, the model output will have multiple predictions per time step, corresponding to each quantile.
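For reference, the quantile (pinball) loss mentioned above can be written in a few lines of NumPy. The tensor layout, with predictions shaped `(batch, horizon, n_quantiles)`, is an assumption made for this sketch; check the model's actual output shape before adapting it.

```python
import numpy as np

def quantile_loss(y_true, y_pred, quantiles):
    """Mean pinball loss.

    y_true: (batch, horizon); y_pred: (batch, horizon, n_quantiles).
    """
    y_true = np.asarray(y_true, dtype=float)[..., None]  # add a quantile axis
    y_pred = np.asarray(y_pred, dtype=float)
    q = np.asarray(quantiles, dtype=float)
    error = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q)
    return np.maximum(q * error, (q - 1.0) * error).mean()

loss = quantile_loss(
    y_true=[[1.0]],
    y_pred=[[[0.0, 1.0, 2.0]]],
    quantiles=[0.1, 0.5, 0.9],
)
```

The asymmetric weighting is what pushes each output toward its own quantile, which a symmetric loss such as mean squared error cannot do.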

See also

VariableSelectionNetwork: Selects relevant variables.
GatedResidualNetwork: Processes inputs with gating mechanisms.
StaticEnrichmentLayer: Enriches temporal features with static context.
TemporalAttentionLayer: Applies attention over time steps.

References

The DummyTFT combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics [1]. It integrates several advanced mechanisms, including:

Variable Selection Networks (VSNs) for static and dynamic features.
Gated Residual Networks (GRNs) for processing inputs.
Static Enrichment Layer to incorporate static features into temporal processing.
LSTM Encoder for capturing sequential dependencies.
Temporal Attention Layer for focusing on important time steps.
Position-wise Feedforward Layer.
Final Output Layer for prediction.

Parameters:

static_input_dim (int) – The input dimension per static variable. Typically 1 for scalar features or higher for embeddings. This defines the number of features for each static variable. For example, if static variables are represented using embeddings of size 16, then static_input_dim would be 16.
dynamic_input_dim (int) – The input dimension per dynamic variable. This defines the number of features for each dynamic variable at each time step. For instance, if dynamic variables are represented using embeddings or multiple features, specify the appropriate dimension.

hidden_units: int

The number of hidden units in the model’s layers. This parameter defines the size of the hidden layers throughout the model, including the Gated Residual Networks (GRNs), the Long Short-Term Memory (LSTM) layers, and the fully connected layers. Increasing hidden_units lets the model capture more complex relationships and patterns, but it also increases the number of parameters and therefore the computational cost. Choose a value that balances model capacity against the complexity of the problem and the available resources.

num_heads: int

The number of attention heads in the multi-head attention mechanism. Each head attends to the input in parallel and can focus on a different aspect of it, which lets the model capture more diverse relationships; in natural language processing, for example, separate heads can attend to different semantic aspects of a text. More heads increase representational capacity but also memory use and computational cost.

dropout_rate: float, optional

The dropout rate applied during training to prevent overfitting. Dropout is a regularization technique where a fraction of input units is randomly set to zero at each training step to prevent the model from relying too heavily on any one feature. This helps improve generalization and can make the model more robust. Dropout is particularly effective in deep learning models where overfitting is a common issue. The value should be between 0.0 and 1.0, where a value of 0.0 means no dropout is applied and a value of 1.0 means that all units are dropped. A typical value for dropout_rate ranges from 0.1 to 0.5.
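A minimal NumPy sketch of the mechanism, assuming the common "inverted dropout" convention in which kept units are rescaled by `1 / (1 - rate)` during training so that nothing needs to change at inference time:

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x                        # inference: inputs pass through unchanged
    if rate >= 1.0:
        return np.zeros_like(x)         # every unit is dropped
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep   # True for the units that are kept
    return x * mask / keep

rng = np.random.default_rng(0)
x = np.ones(1000)
train_out = dropout(x, rate=0.5, training=True, rng=rng)
infer_out = dropout(x, rate=0.5, training=False, rng=rng)
```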

forecast_horizon: int, optional

The number of time steps to forecast. Default is 1. This parameter defines the number of future time steps the model will predict. For multi-step forecasting, set forecast_horizon to the desired number of future steps.
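To make the role of the horizon concrete, the sketch below slices a univariate series into input windows and multi-step targets. The helper `make_windows` and its `time_steps` argument are hypothetical names for illustration; they are not part of fusionlab.

```python
import numpy as np

def make_windows(series, time_steps, forecast_horizon):
    """Split a 1-D series into (inputs, targets) pairs for multi-step forecasting."""
    series = np.asarray(series, dtype=float)
    n = len(series) - time_steps - forecast_horizon + 1
    if n <= 0:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([series[i:i + time_steps] for i in range(n)])
    y = np.stack([
        series[i + time_steps:i + time_steps + forecast_horizon]
        for i in range(n)
    ])
    return X, y

# Ten observations, three past steps as input, two future steps as target
X, y = make_windows(np.arange(10), time_steps=3, forecast_horizon=2)
```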

quantiles: list of float or None, optional

A list of quantiles to predict for each time step. For example, specifying [0.1, 0.5, 0.9] would result in the model predicting the 10th, 50th, and 90th percentiles of the target variable at each time step. This is useful for estimating prediction intervals and capturing uncertainty in forecasting tasks. If set to None, the model performs point forecasting and predicts a single value (e.g., the mean or most likely value) for each time step. Quantile forecasting is commonly used for applications where it is important to predict not just the most likely outcome, but also the range of possible outcomes.
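As a small numerical illustration of why the 10th and 90th percentiles are useful together, the following NumPy snippet computes them for a toy sample and measures how much of the sample falls between them. The data are synthetic and unrelated to any model output.

```python
import numpy as np

samples = np.arange(1, 101, dtype=float)   # toy data: 1, 2, ..., 100

q10, q50, q90 = np.quantile(samples, [0.1, 0.5, 0.9])

# The band between the 10th and 90th percentiles acts as an 80% interval
inside = (samples >= q10) & (samples <= q90)
coverage = inside.mean()
```

A well-calibrated quantile forecaster should show similar coverage on held-out data: roughly 80% of observed targets between its 0.1 and 0.9 predictions.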

activation: str, optional

The activation function to use in the Gated Residual Networks (GRNs). The activation function defines how the model’s internal representations are transformed before being passed to the next layer. Supported values include:

'elu': Exponential Linear Unit (ELU), a variant of ReLU that improves training performance by preventing dying neurons. ELU provides a smooth output for negative values, which can help mitigate the issue of vanishing gradients. The mathematical formulation for ELU is:

\[f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha (\exp(x) - 1) & \text{if } x \leq 0 \end{cases}\]

where \(\alpha\) is a constant (usually 1.0).
'relu': Rectified Linear Unit (ReLU), a widely used activation function that outputs zero for negative input and the input itself for positive values. It is computationally efficient and reduces the risk of vanishing gradients. The mathematical formulation for ReLU is:

\[f(x) = \max(0, x)\]

where \(x\) is the input value.
'tanh': Hyperbolic Tangent, which squashes the outputs into a range between -1 and 1. It is useful when preserving the sign of the input is important, but can suffer from vanishing gradients for large inputs. The mathematical formulation for tanh is:

\[f(x) = \frac{2}{1 + \exp(-2x)} - 1\]

'sigmoid': Sigmoid function, commonly used for binary classification tasks, maps outputs between 0 and 1, making it suitable for probabilistic outputs. The mathematical formulation for sigmoid is:

\[f(x) = \frac{1}{1 + \exp(-x)}\]

'linear': No activation (identity function), often used in regression tasks where no non-linearity is needed. The output is simply the input value:

\[f(x) = x\]

The default activation function is 'elu'.
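The five formulas above translate directly into NumPy. The `ACTIVATIONS` lookup table is a hypothetical convenience for this sketch; the library resolves activation names through its own machinery.

```python
import numpy as np

def elu(x, alpha=1.0):
    # Clamp the exponent's argument so large positive x cannot overflow
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def relu(x):
    return np.maximum(0.0, x)

def tanh(x):
    # Same form as the formula above; equivalent to np.tanh
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def linear(x):
    return x

# Hypothetical name-to-function table, for illustration only
ACTIVATIONS = {
    "elu": elu,
    "relu": relu,
    "tanh": tanh,
    "sigmoid": sigmoid,
    "linear": linear,
}

x = np.linspace(-3.0, 3.0, 7)
outputs = {name: fn(x) for name, fn in ACTIVATIONS.items()}
```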

use_batch_norm: bool, optional: Whether to use batch normalization in the Gated Residual Networks (GRNs). Batch normalization normalizes the input to each layer, stabilizing and accelerating the training process. When set to True, it normalizes the activations by scaling and shifting them to maintain a stable distribution during training. This technique can help mitigate issues like vanishing and exploding gradients, making it easier to train deep networks. Batch normalization also acts as a form of regularization, reducing the need for other techniques like dropout. By default, batch normalization is turned off (False).
num_lstm_layers: int, optional: Number of LSTM layers in the encoder. Default is 1. Adding more layers can help the model capture more complex sequential patterns. Each additional layer processes the output of the previous LSTM layer.
lstm_units: list of int or None, optional: List containing the number of units for each LSTM layer. If None, all LSTM layers have hidden_units units. Default is None. This parameter allows customizing the size of each LSTM layer. For example, to specify different units for each layer, provide a list like [64, 32].
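The interaction between `num_lstm_layers`, `lstm_units`, and `hidden_units` can be summarized with a small helper. `resolve_lstm_units` is a hypothetical function written for this illustration, and the length check is an assumption about sensible behavior rather than something the library is documented to enforce.

```python
def resolve_lstm_units(hidden_units, num_lstm_layers, lstm_units=None):
    """Return the number of units for each LSTM layer (illustrative only)."""
    if lstm_units is None:
        # Documented fallback: every layer uses hidden_units
        return [hidden_units] * num_lstm_layers
    units = list(lstm_units)
    if len(units) != num_lstm_layers:
        # Assumed validation, not confirmed by the documentation
        raise ValueError("lstm_units must have one entry per LSTM layer")
    return units

default_units = resolve_lstm_units(hidden_units=32, num_lstm_layers=2)
custom_units = resolve_lstm_units(64, 2, lstm_units=[64, 32])
```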

call(inputs, training=False)

Forward pass of the model.

Parameters:

inputs (tuple of tensors) –
A tuple containing (static_inputs, dynamic_inputs).
- static_inputs: Tensor of shape (batch_size, num_static_vars, static_input_dim) representing the static features.
- dynamic_inputs: Tensor of shape (batch_size, time_steps, num_dynamic_vars, dynamic_input_dim) representing the dynamic features.
training (bool, optional) – Whether the model is in training mode. Default is False.

Returns:

The output predictions of the model. The shape depends on the forecast_horizon and whether quantiles are used.

Return type:

Tensor
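The documented input shapes can be checked with placeholder arrays before any model is involved. The sizes below are arbitrary values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes, for illustration only
batch_size, time_steps = 8, 12
num_static_vars, static_input_dim = 2, 1
num_dynamic_vars, dynamic_input_dim = 3, 1

# (batch_size, num_static_vars, static_input_dim)
static_inputs = rng.normal(
    size=(batch_size, num_static_vars, static_input_dim)
).astype("float32")

# (batch_size, time_steps, num_dynamic_vars, dynamic_input_dim)
dynamic_inputs = rng.normal(
    size=(batch_size, time_steps, num_dynamic_vars, dynamic_input_dim)
).astype("float32")

# The tuple expected by call(), in this order
inputs = (static_inputs, dynamic_inputs)
```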

get_config(): Returns the configuration of the model for serialization.

from_config(config): Instantiates the model from a configuration dictionary.

__init__(dynamic_input_dim, static_input_dim, future_input_dim=None, hidden_units=32, num_heads=4, dropout_rate=0.1, forecast_horizon=1, quantiles=None, activation='elu', use_batch_norm=False, num_lstm_layers=1, lstm_units=None, output_dim=1, name=None, **kwargs)

Parameters:

dynamic_input_dim (int)
static_input_dim (int)
future_input_dim (Any)
hidden_units (int)
num_heads (int)
dropout_rate (float)
forecast_horizon (int)
quantiles (List[float] | None)
activation (str)
use_batch_norm (bool)
num_lstm_layers (int)
lstm_units (int | List[int] | None)
output_dim (int)
name (str | None)

Methods

`__init__`(dynamic_input_dim, static_input_dim)
`add_loss`(losses, **kwargs)	Add loss tensor(s), potentially dependent on layer inputs.
`add_metric`(value[, name])	Adds metric tensor to the layer.
`add_update`(updates)	Add update op(s), potentially dependent on layer inputs.
`add_variable`(*args, **kwargs)	Deprecated, do NOT use! Alias for add_weight.
`add_weight`([name, shape, dtype, ...])	Adds a new variable to the layer.
`build`(input_shape)	Builds the model based on input shapes received.
`build_from_config`(config)	Builds the layer's states with the supplied config dict.
`call`(inputs[, training])	Forward pass for DummyTFT (Static and Dynamic inputs only).
`compile`([optimizer, loss, metrics, ...])	Configures the model for training.
`compile_from_config`(config)	Compiles the model with the information given in config.
`compute_loss`([x, y, y_pred, sample_weight])	Compute the total loss, validate it, and return it.
`compute_mask`(inputs[, mask])	Computes an output mask tensor.
`compute_metrics`(x, y, y_pred, sample_weight)	Update metric states and collect all metrics to be returned.
`compute_output_shape`(input_shape)	Computes the output shape of the layer.
`compute_output_signature`(input_signature)	Compute the output tensor signature of the layer based on the inputs.
`count_params`()	Count the total number of scalars composing the weights.
`evaluate`([x, y, batch_size, verbose, ...])	Returns the loss value & metrics values for the model in test mode.
`evaluate_generator`(generator[, steps, ...])	Evaluates the model on a data generator.
`export`(filepath)	Create a SavedModel artifact for inference (e.g. via TF-Serving).
`finalize_state`()	Finalizes the layers state after updating layer weights.
`fit`([x, y, batch_size, epochs, verbose, ...])	Trains the model for a fixed number of epochs (dataset iterations).
`fit_generator`(generator[, steps_per_epoch, ...])	Fits the model on data yielded batch-by-batch by a Python generator.
`from_config`(config)	Creates a layer from its config.
`get_build_config`()	Returns a dictionary with the layer's input shape.
`get_compile_config`()	Returns a serialized config with information for compiling the model.
`get_config`()	Returns the config of the Model.
`get_input_at`(node_index)	Retrieves the input tensor(s) of a layer at a given node.
`get_input_mask_at`(node_index)	Retrieves the input mask tensor(s) of a layer at a given node.
`get_input_shape_at`(node_index)	Retrieves the input shape(s) of a layer at a given node.
`get_layer`([name, index])	Retrieves a layer based on either its name (unique) or index.
`get_metrics_result`()	Returns the model's metrics values as a dict.
`get_output_at`(node_index)	Retrieves the output tensor(s) of a layer at a given node.
`get_output_mask_at`(node_index)	Retrieves the output mask tensor(s) of a layer at a given node.
`get_output_shape_at`(node_index)	Retrieves the output shape(s) of a layer at a given node.
`get_params`([deep])	Get the parameters for this learner.
`get_weight_paths`()	Retrieve all the variables and their paths for the model.
`get_weights`()	Retrieves the weights of the model.
`help`(**kwargs)
`load`(file_path[, format])	Load the learner's state from a specified file in the desired format.
`load_own_variables`(store)	Loads the state of the layer.
`load_weights`(filepath[, skip_mismatch, ...])	Loads all layer weights from a saved file.
`make_predict_function`([force])	Creates a function that executes one step of inference.
`make_test_function`([force])	Creates a function that executes one step of evaluation.
`make_train_function`([force])	Creates a function that executes one step of training.
`predict`(x[, batch_size, verbose, steps, ...])	Generates output predictions for the input samples.
`predict_generator`(generator[, steps, ...])	Generates predictions for the input samples from a data generator.
`predict_on_batch`(x)	Returns predictions for a single batch of samples.
`predict_step`(data)	The logic for one inference step.
`reset_metrics`()	Resets the state of all the metrics in the model.
`reset_states`()
`save`(filepath[, overwrite, save_format])	Saves a model as a TensorFlow SavedModel or HDF5 file.
`save_own_variables`(store)	Saves the state of the layer.
`save_spec`([dynamic_batch])	Returns the tf.TensorSpec of call args as a tuple (args, kwargs).
`save_weights`(filepath[, overwrite, ...])	Saves all layer weights.
`set_params`(**params)	Set the parameters of this learner.
`set_weights`(weights)	Sets the weights of the layer, from NumPy arrays.
`summary`([line_length, positions, print_fn, ...])	Prints a string summary of the network.
`test_on_batch`(x[, y, sample_weight, ...])	Test the model on a single batch of samples.
`test_step`(data)	The logic for one evaluation step.
`to_json`(**kwargs)	Returns a JSON string containing the network configuration.
`to_yaml`(**kwargs)	Returns a yaml string containing the network configuration.
`train_on_batch`(x[, y, sample_weight, ...])	Runs a single gradient update on a single batch of data.
`train_step`(data)	The logic for one training step.
`with_name_scope`(method)	Decorator to automatically enter the module name scope.

Attributes

`activity_regularizer`	Optional regularizer function for the output of this layer.
`autotune_steps_per_execution`	Settable property to enable tuning for steps_per_execution
`compute_dtype`	The dtype of the layer's computations.
`distribute_reduction_method`	The method employed to reduce per-replica values during training.
`distribute_strategy`	The tf.distribute.Strategy this model was created under.
`dtype`	The dtype of the layer weights.
`dtype_policy`	The dtype policy associated with this layer.
`dynamic`	Whether the layer is dynamic (eager-only); set in the constructor.
`inbound_nodes`	Return Functional API nodes upstream of this layer.
`input`	Retrieves the input tensor(s) of a layer.
`input_mask`	Retrieves the input mask tensor(s) of a layer.
`input_shape`	Retrieves the input shape(s) of a layer.
`input_spec`	InputSpec instance(s) describing the input format for this layer.
`jit_compile`	Specify whether to compile the model with XLA.
`layers`
`losses`	List of losses added using the add_loss() API.
`metrics`	Return metrics added using compile() or add_metric().
`metrics_names`	Returns the model's display labels for all outputs.
`my_params`
`name`	Name of the layer (string), set in the constructor.
`name_scope`	Returns a tf.name_scope instance for this class.
`non_trainable_variables`	Sequence of non-trainable variables owned by this module and its submodules.
`non_trainable_weights`	List of all non-trainable weights tracked by this layer.
`outbound_nodes`	Return Functional API nodes downstream of this layer.
`output`	Retrieves the output tensor(s) of a layer.
`output_mask`	Retrieves the output mask tensor(s) of a layer.
`output_shape`	Retrieves the output shape(s) of a layer.
`run_eagerly`	Settable attribute indicating whether the model should run eagerly.
`state_updates`	Deprecated, do NOT use!
`stateful`
`steps_per_execution`	Settable `steps_per_execution` variable. Requires a compiled model.
`submodules`	Sequence of all sub-modules.
`supports_masking`	Whether this layer supports computing a mask using compute_mask.
`trainable`
`trainable_variables`	Sequence of trainable variables owned by this module and its submodules.
`trainable_weights`	List of all trainable weights tracked by this layer.
`updates`
`variable_dtype`	Alias of Layer.dtype, the dtype of the weights.
`variables`	Returns the list of all layer variables/weights.
`weights`	Returns the list of all layer variables/weights.

call(inputs, training=False, **kwargs): Forward pass for DummyTFT (Static and Dynamic inputs only).

get_config()

Returns the config of the Model.

Config is a Python dictionary (serializable) containing the configuration of an object, which in this case is a Model. This allows the Model to be reinstantiated later (without its trained weights) from this configuration.

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Developers of subclassed Model are advised to override this method, and continue to update the dict from super(MyModel, self).get_config() to provide the proper configuration of this Model. The default config will return a config dict for init parameters if they are basic types. Raises NotImplementedError in cases where a custom get_config() implementation is required for the subclassed model.

Returns: Python dictionary containing the configuration of this Model.

classmethod from_config(config)

Creates a layer from its config.

This method is the reverse of get_config, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by Network), nor weights (handled by set_weights).

Parameters: config – A Python dictionary, typically the output of get_config.
Returns: A layer instance.
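The get_config / from_config contract, including the advice to copy the returned dictionary before modifying it, can be illustrated with a toy class. `TinyConfigurable` is a hypothetical stand-in written for this example; it is not part of fusionlab or Keras.

```python
import copy

class TinyConfigurable:
    """Toy object following the get_config / from_config pattern."""

    def __init__(self, hidden_units=32, quantiles=None):
        self.hidden_units = hidden_units
        self.quantiles = quantiles

    def get_config(self):
        # Only constructor arguments go into the config, never weights
        return {"hidden_units": self.hidden_units, "quantiles": self.quantiles}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

original = TinyConfigurable(hidden_units=64, quantiles=[0.1, 0.5, 0.9])

# Copy before editing so the original object's state is never touched
config = copy.deepcopy(original.get_config())
config["hidden_units"] = 128

clone = TinyConfigurable.from_config(config)
```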

help(**kwargs)

my_params = DummyTFT(dynamic_input_dim, static_input_dim, future_input_dim=None, hidden_units=32, num_heads=4, dropout_rate=0.1, forecast_horizon=1, quantiles=None, activation='elu', use_batch_norm=False, num_lstm_layers=1, lstm_units=None, output_dim=1, name=None)