fusionlab.nn.transformers.TimeSeriesTransformer
- class fusionlab.nn.transformers.TimeSeriesTransformer
Bases: Model, NNLearner
A standard Transformer model for multi-horizon time series forecasting.
This class implements the classic encoder-decoder Transformer architecture, as introduced by Vaswani et al., but specifically tailored for multi-variate, multi-horizon time series forecasting. It leverages self-attention and cross-attention mechanisms to capture complex long-range dependencies in sequential data.
The model is “pure” in the sense that it does not use any recurrent (LSTM/GRU) or convolutional layers, relying solely on attention to process temporal information. It is designed to handle three distinct types of input features: static, dynamic past-observed, and known future covariates.
- Parameters:
static_input_dim (int) – The number of features in the static input tensor. These are time-invariant features such as sensor ID or location. Can be 0 if no static features are used.
dynamic_input_dim (int) – The number of features in the dynamic input tensor, which contains past-observed, time-varying data.
future_input_dim (int) – The number of features in the future input tensor, containing covariates with known values in the forecast horizon, such as day of the week or scheduled events.
embed_dim (int, default 64) – The core dimensionality of the model, \(d_{model}\). This is the size of all embedding vectors and the internal dimension of the attention layers.
num_heads (int, default 4) – The number of attention heads in each multi-head attention layer. embed_dim must be divisible by num_heads.
ffn_dim (int, default 128) – The dimensionality of the inner layer of the feed-forward network (FFN) that follows the attention mechanism in each encoder and decoder block.
num_encoder_layers (int, default 3) – The number of identical encoder layers to stack.
num_decoder_layers (int, default 3) – The number of identical decoder layers to stack.
forecast_horizon (int, default 1) – The number of future time steps to predict (\(H\)). This defines the length of the output sequence.
output_dim (int, default 1) – The number of target variables to forecast at each time step.
dropout_rate (float, default 0.1) – The dropout rate applied within the attention mechanisms and feed-forward networks for regularization.
input_dropout_rate (float, default 0.1) – The dropout rate applied to the sum of the input embeddings and positional encodings.
max_seq_len_encoder (int, default 100) – The maximum expected sequence length for the encoder’s input. Used to pre-compute positional encodings.
max_seq_len_decoder (int, default 50) – The maximum expected sequence length for the decoder’s input (typically forecast_horizon). Used for positional encodings.
quantiles (list of float, optional) – A list of quantiles (e.g., [0.1, 0.5, 0.9]) for probabilistic forecasting. If None, the model produces deterministic point forecasts.
use_grn_for_static (bool, default False) – If True, processes the static features through a GatedResidualNetwork (GRN). If False, uses a standard Dense layer.
static_integration_mode ({'add_to_encoder_input', 'add_to_decoder_input', 'none'}, default 'add_to_decoder_input') – Defines how the processed static context vector is integrated into the model:
* 'add_to_encoder_input': Adds it to the encoder’s input embeddings.
* 'add_to_decoder_input': Adds it to the decoder’s input embeddings.
* 'none': The static context is not explicitly injected.
activation (str or callable, default 'relu') – The activation function for the feed-forward networks.
layer_norm_epsilon (float, default 1e-6) – The epsilon value for the Layer Normalization layers to prevent division by zero.
name (str, optional) – The name of the Keras model.
**kwargs – Additional keyword arguments passed to the tf.keras.Model constructor.
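The constraints stated in the parameter list can be checked before a model is built. The sketch below is a hypothetical helper, not part of fusionlab: the function name `check_transformer_config` is invented for illustration, and the requirement that each quantile lies strictly between 0 and 1 is an assumption that the parameter list does not state explicitly.

```python
def check_transformer_config(
    embed_dim=64,
    num_heads=4,
    quantiles=None,
    static_integration_mode="add_to_decoder_input",
):
    """Hypothetical pre-flight check; returns the per-head dimension."""
    # Documented constraint: embed_dim must be divisible by num_heads.
    if embed_dim % num_heads != 0:
        raise ValueError(
            f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}"
        )

    # Documented choices for static_integration_mode.
    allowed = {"add_to_encoder_input", "add_to_decoder_input", "none"}
    if static_integration_mode not in allowed:
        raise ValueError(f"static_integration_mode must be one of {sorted(allowed)}")

    # Assumption (not stated in the docs): quantiles lie strictly in (0, 1).
    if quantiles is not None and any(not 0.0 < q < 1.0 for q in quantiles):
        raise ValueError("each quantile is assumed to lie strictly between 0 and 1")

    return embed_dim // num_heads


# Defaults give 64 / 4 = 16 dimensions per attention head.
print(check_transformer_config())
```

With the defaults, each of the 4 heads works on a 16-dimensional slice of the 64-dimensional embedding, which is why the divisibility constraint exists.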
Notes
This model adheres to the standard Transformer architecture, which consists of an encoder-decoder stack.
- Encoder
The encoder is composed of a stack of num_encoder_layers layers. Each layer contains two sub-layers: a multi-head self-attention mechanism and a position-wise feed-forward network. It processes the entire sequence of past dynamic features, allowing each position to attend to all other positions to build a rich contextual representation.
- Decoder
The decoder is similarly composed of a stack of num_decoder_layers layers. Each decoder layer has three sub-layers:
1. Masked Multi-Head Self-Attention: This is the key to autoregressive generation. It applies a causal mask to the decoder’s inputs to ensure that the prediction for a time step \(i\) can only depend on known outputs at steps less than \(i\), preventing the model from looking ahead.
2. Multi-Head Cross-Attention: This layer allows the decoder to attend to the output of the encoder. It acts as the bridge between the processed past information and the future forecast, allowing the decoder to focus on the most relevant parts of the historical context.
3. Feed-Forward Network: The same type of FFN as in the encoder.
Residual connections and layer normalization are applied around each sub-layer to ensure stable training.
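The effect of the causal mask in the decoder's self-attention can be illustrated without any deep-learning framework. The following is a minimal pure-Python sketch, not the library's implementation: it assumes the common lower-triangular convention in which position i may attend to itself and to earlier positions, and the helper names `causal_mask` and `masked_attention_weights` are hypothetical.

```python
import math


def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


def masked_attention_weights(scores, mask):
    """Row-wise softmax over raw scores, giving zero weight to masked positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        top = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Uniform raw scores for a 4-step decoder sequence.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_attention_weights(scores, causal_mask(4))
for row in weights:
    print([round(w, 2) for w in row])
```

Each row spreads its attention only over the positions at or before it, so the first row puts all weight on the first step and no row assigns weight to a later step.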
See also
fusionlab.nn.components.TransformerEncoderLayer – The core encoder block.
fusionlab.nn.components.TransformerDecoderLayer – The core decoder block.
fusionlab.nn.models.BaseAttentive – A more complex hybrid model foundation.
Examples
>>> import tensorflow as tf
>>> from fusionlab.nn.transformers import TimeSeriesTransformer
>>> # 1. Model Configuration
>>> BATCH_SIZE = 32
>>> PAST_STEPS = 24
>>> HORIZON = 12
>>> STATIC_DIM, DYNAMIC_DIM, FUTURE_DIM = 5, 6, 4
>>> model = TimeSeriesTransformer(
...     static_input_dim=STATIC_DIM,
...     dynamic_input_dim=DYNAMIC_DIM,
...     future_input_dim=FUTURE_DIM,
...     embed_dim=32,
...     num_heads=4,
...     ffn_dim=64,
...     num_encoder_layers=2,
...     num_decoder_layers=2,
...     forecast_horizon=HORIZON,
...     output_dim=1,
...     quantiles=[0.1, 0.5, 0.9]
... )
>>> # 2. Prepare Dummy Input Data
>>> static_input = tf.random.normal([BATCH_SIZE, STATIC_DIM])
>>> dynamic_input = tf.random.normal([BATCH_SIZE, PAST_STEPS, DYNAMIC_DIM])
>>> future_input = tf.random.normal([BATCH_SIZE, HORIZON, FUTURE_DIM])
>>> # 3. Get Model Output
>>> # Inputs are passed as a list: [static, dynamic, future]
>>> predictions = model([static_input, dynamic_input, future_input])
>>> # 4. Check Output Shape
>>> # Shape is (Batch, Horizon, Quantiles) since output_dim=1
>>> print(f"Output prediction shape: {predictions.shape}")
Output prediction shape: (32, 12, 3)
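When quantiles are supplied, the last axis of the output holds one forecast per quantile, and such outputs are commonly trained with the pinball (quantile) loss. The sketch below is a generic pure-Python version of that loss for a single series, offered as an illustration only: fusionlab may provide its own loss, and the function name `pinball_loss` and the nested-list layout are assumptions made here.

```python
def pinball_loss(y_true, y_pred, quantiles):
    """Mean pinball loss over all steps and quantiles.

    y_true: list of floats, one target per forecast step.
    y_pred: list of lists; y_pred[t][k] is the forecast for quantiles[k] at step t.
    """
    total = 0.0
    count = 0
    for target, preds in zip(y_true, y_pred):
        for q, pred in zip(quantiles, preds):
            error = target - pred
            # Under-prediction is weighted by q, over-prediction by (1 - q).
            total += max(q * error, (q - 1.0) * error)
            count += 1
    return total / count


quantiles = [0.1, 0.5, 0.9]
y_true = [10.0, 12.0]                             # two forecast steps
y_pred = [[8.0, 10.0, 12.0], [11.0, 12.0, 13.0]]  # shape (Horizon, Quantiles)
loss = pinball_loss(y_true, y_pred, quantiles)
print(round(loss, 4))
```

The asymmetric weighting is what pushes the 0.1 forecast below the target most of the time and the 0.9 forecast above it.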
- __init__(static_input_dim, dynamic_input_dim, future_input_dim, embed_dim=64, num_heads=4, ffn_dim=128, num_encoder_layers=3, num_decoder_layers=3, forecast_horizon=1, output_dim=1, dropout_rate=0.1, input_dropout_rate=0.1, max_seq_len_encoder=100, max_seq_len_decoder=50, quantiles=None, use_grn_for_static=False, static_integration_mode='add_to_decoder_input', activation='relu', layer_norm_epsilon=1e-06, name='TimeSeriesTransformer', **kwargs)
- Parameters:
static_input_dim (int)
dynamic_input_dim (int)
future_input_dim (int)
embed_dim (int)
num_heads (int)
ffn_dim (int)
num_encoder_layers (int)
num_decoder_layers (int)
forecast_horizon (int)
output_dim (int)
dropout_rate (float)
input_dropout_rate (float)
max_seq_len_encoder (int)
max_seq_len_decoder (int)
quantiles (List[float] | None)
use_grn_for_static (bool)
static_integration_mode (str)
activation (str)
layer_norm_epsilon (float)
name (str | None)
Methods
__init__(static_input_dim, ...[, embed_dim, ...])
add_loss(losses, **kwargs) – Add loss tensor(s), potentially dependent on layer inputs.
add_metric(value[, name]) – Adds metric tensor to the layer.
add_update(updates) – Add update op(s), potentially dependent on layer inputs.
add_variable(*args, **kwargs) – Deprecated, do NOT use! Alias for add_weight.
add_weight([name, shape, dtype, ...]) – Adds a new variable to the layer.
build(input_shape) – Builds the model based on input shapes received.
build_from_config(config) – Builds the layer's states with the supplied config dict.
call(inputs[, training]) – Forward pass for the TimeSeriesTransformer.
compile([optimizer, loss, metrics, ...]) – Configures the model for training.
compile_from_config(config) – Compiles the model with the information given in config.
compute_loss([x, y, y_pred, sample_weight]) – Compute the total loss, validate it, and return it.
compute_mask(inputs[, mask]) – Computes an output mask tensor.
compute_metrics(x, y, y_pred, sample_weight) – Update metric states and collect all metrics to be returned.
compute_output_shape(input_shape) – Computes the output shape of the layer.
compute_output_signature(input_signature) – Compute the output tensor signature of the layer based on the inputs.
count_params() – Count the total number of scalars composing the weights.
evaluate([x, y, batch_size, verbose, ...]) – Returns the loss value & metrics values for the model in test mode.
evaluate_generator(generator[, steps, ...]) – Evaluates the model on a data generator.
export(filepath) – Create a SavedModel artifact for inference (e.g. via TF-Serving).
finalize_state() – Finalizes the layer's state after updating layer weights.
fit([x, y, batch_size, epochs, verbose, ...]) – Trains the model for a fixed number of epochs (dataset iterations).
fit_generator(generator[, steps_per_epoch, ...]) – Fits the model on data yielded batch-by-batch by a Python generator.
from_config(config[, custom_objects]) – Creates a layer from its config.
get_build_config() – Returns a dictionary with the layer's input shape.
get_compile_config() – Returns a serialized config with information for compiling the model.
get_config() – Returns the config of the Model.
get_input_at(node_index) – Retrieves the input tensor(s) of a layer at a given node.
get_input_mask_at(node_index) – Retrieves the input mask tensor(s) of a layer at a given node.
get_input_shape_at(node_index) – Retrieves the input shape(s) of a layer at a given node.
get_layer([name, index]) – Retrieves a layer based on either its name (unique) or index.
get_metrics_result() – Returns the model's metrics values as a dict.
get_output_at(node_index) – Retrieves the output tensor(s) of a layer at a given node.
get_output_mask_at(node_index) – Retrieves the output mask tensor(s) of a layer at a given node.
get_output_shape_at(node_index) – Retrieves the output shape(s) of a layer at a given node.
get_params([deep]) – Get the parameters for this learner.
get_weight_paths() – Retrieve all the variables and their paths for the model.
get_weights() – Retrieves the weights of the model.
help(**kwargs)
load(file_path[, format]) – Load the learner's state from a specified file in the desired format.
load_own_variables(store) – Loads the state of the layer.
load_weights(filepath[, skip_mismatch, ...]) – Loads all layer weights from a saved file.
make_predict_function([force]) – Creates a function that executes one step of inference.
make_test_function([force]) – Creates a function that executes one step of evaluation.
make_train_function([force]) – Creates a function that executes one step of training.
predict(x[, batch_size, verbose, steps, ...]) – Generates output predictions for the input samples.
predict_generator(generator[, steps, ...]) – Generates predictions for the input samples from a data generator.
predict_on_batch(x) – Returns predictions for a single batch of samples.
predict_step(data) – The logic for one inference step.
reset_metrics() – Resets the state of all the metrics in the model.
reset_states()
save(filepath[, overwrite, save_format]) – Saves a model as a TensorFlow SavedModel or HDF5 file.
save_own_variables(store) – Saves the state of the layer.
save_spec([dynamic_batch]) – Returns the tf.TensorSpec of call args as a tuple (args, kwargs).
save_weights(filepath[, overwrite, ...]) – Saves all layer weights.
set_params(**params) – Set the parameters of this learner.
set_weights(weights) – Sets the weights of the layer, from NumPy arrays.
summary([line_length, positions, print_fn, ...]) – Prints a string summary of the network.
test_on_batch(x[, y, sample_weight, ...]) – Test the model on a single batch of samples.
test_step(data) – The logic for one evaluation step.
to_json(**kwargs) – Returns a JSON string containing the network configuration.
to_yaml(**kwargs) – Returns a yaml string containing the network configuration.
train_on_batch(x[, y, sample_weight, ...]) – Runs a single gradient update on a single batch of data.
train_step(data) – The logic for one training step.
with_name_scope(method) – Decorator to automatically enter the module name scope.
Attributes
activity_regularizer – Optional regularizer function for the output of this layer.
autotune_steps_per_execution – Settable property to enable tuning for steps_per_execution.
compute_dtype – The dtype of the layer's computations.
distribute_reduction_method – The method employed to reduce per-replica values during training.
distribute_strategy – The tf.distribute.Strategy this model was created under.
dtype – The dtype of the layer weights.
dtype_policy – The dtype policy associated with this layer.
dynamic – Whether the layer is dynamic (eager-only); set in the constructor.
inbound_nodes – Return Functional API nodes upstream of this layer.
input – Retrieves the input tensor(s) of a layer.
input_mask – Retrieves the input mask tensor(s) of a layer.
input_shape – Retrieves the input shape(s) of a layer.
input_spec – InputSpec instance(s) describing the input format for this layer.
jit_compile – Specify whether to compile the model with XLA.
layers
losses – List of losses added using the add_loss() API.
metrics – Return metrics added using compile() or add_metric().
metrics_names – Returns the model's display labels for all outputs.
name – Name of the layer (string), set in the constructor.
name_scope – Returns a tf.name_scope instance for this class.
non_trainable_variables – Sequence of non-trainable variables owned by this module and its submodules.
non_trainable_weights – List of all non-trainable weights tracked by this layer.
outbound_nodes – Return Functional API nodes downstream of this layer.
output – Retrieves the output tensor(s) of a layer.
output_mask – Retrieves the output mask tensor(s) of a layer.
output_shape – Retrieves the output shape(s) of a layer.
run_eagerly – Settable attribute indicating whether the model should run eagerly.
state_updates – Deprecated, do NOT use!
stateful
steps_per_execution – Settable steps_per_execution variable. Requires a compiled model.
submodules – Sequence of all sub-modules.
supports_masking – Whether this layer supports computing a mask using compute_mask.
trainable
trainable_variables – Sequence of trainable variables owned by this module and its submodules.
trainable_weights – List of all trainable weights tracked by this layer.
updates
variable_dtype – Alias of Layer.dtype, the dtype of the weights.
variables – Returns the list of all layer variables/weights.
weights – Returns the list of all layer variables/weights.
- call(inputs, training=False)
Forward pass for the TimeSeriesTransformer.
- Parameters:
inputs (list or tuple of tensors) – The elements, in order, are:
1. static_input, shape (Batch, static_input_dim). Can be None if self.static_input_dim is 0.
2. dynamic_input, shape (Batch, T_past, dynamic_input_dim).
3. future_input, shape (Batch, T_decode_seq, future_input_dim). T_decode_seq is typically self.forecast_horizon. Can be None if self.future_input_dim is 0.
The order must be preserved even when some inputs are None: the caller should pass None in the corresponding position rather than omit the element. This method expects a list or tuple, which is passed to prepare_model_inputs.
training (bool) – Whether the model is in training mode.
- Returns:
A tensor with forecast predictions.
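Because the input order is positional, a small wrapper can make it harder to misplace an input. The helper below is a hypothetical sketch, not a fusionlab function: the name `pack_inputs` is invented here, nested lists stand in for tensors, and treating dynamic_input as mandatory is an assumption based on the description above.

```python
def pack_inputs(dynamic_input, static_input=None, future_input=None):
    """Hypothetical helper: return inputs in the [static, dynamic, future] order.

    Absent inputs stay in the list as None so that positions never shift.
    """
    # Assumption: the past-observed dynamic input is always required.
    if dynamic_input is None:
        raise ValueError("dynamic_input is required")
    return [static_input, dynamic_input, future_input]


# Stand-in for a tensor of shape (Batch=2, T_past=24, dynamic_input_dim=6).
dynamic = [[[0.0] * 6 for _ in range(24)] for _ in range(2)]

inputs = pack_inputs(dynamic)
print(len(inputs), [x is None for x in inputs])  # 3 [True, False, True]
```

The resulting list always has three elements, so it could be handed to the model's call method whether or not static or future features are in use.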
- get_config()
Returns the config of the Model.
Config is a Python dictionary (serializable) containing the configuration of an object, which in this case is a Model. This allows the Model to be reinstantiated later (without its trained weights) from this configuration.
Note that get_config() does not guarantee to return a fresh copy of the dict every time it is called. Callers should make a copy of the returned dict if they want to modify it.
Developers of a subclassed Model are advised to override this method and to update the dict returned by super(MyModel, self).get_config() so that it provides the proper configuration of this Model. The default config returns a config dict for the init parameters if they are basic types. Raises NotImplementedError in cases where a custom get_config() implementation is required for the subclassed model.
- Returns:
Python dictionary containing the configuration of this Model.
- help(**kwargs)
- my_params = TimeSeriesTransformer(static_input_dim, dynamic_input_dim, future_input_dim, embed_dim=64, num_heads=4, ffn_dim=128, num_encoder_layers=3, num_decoder_layers=3, forecast_horizon=1, output_dim=1, dropout_rate=0.1, input_dropout_rate=0.1, max_seq_len_encoder=100, max_seq_len_decoder=50, quantiles=None, use_grn_for_static=False, static_integration_mode='add_to_decoder_input', activation='relu', layer_norm_epsilon=1e-06, name='TimeSeriesTransformer')
- classmethod from_config(config, custom_objects=None)
Creates a layer from its config.
This method is the reverse of get_config, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by Network), nor weights (handled by set_weights).
- Parameters:
config – A Python dictionary, typically the output of get_config.
- Returns:
A layer instance.
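The get_config / from_config pair forms a round trip: the config dictionary captures the constructor arguments, and from_config rebuilds an equivalent, untrained object from it. The sketch below mirrors that pattern with a stand-in class so it runs without TensorFlow. `TinyConfigurable` is hypothetical and is not a Keras model; it only illustrates the contract described above, including the advice to copy the config before modifying it.

```python
import copy


class TinyConfigurable:
    """Stand-in class that mimics the get_config/from_config contract."""

    def __init__(self, embed_dim=64, num_heads=4, quantiles=None):
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.quantiles = quantiles

    def get_config(self):
        # Only constructor arguments go into the config; no weights are stored.
        return {
            "embed_dim": self.embed_dim,
            "num_heads": self.num_heads,
            "quantiles": self.quantiles,
        }

    @classmethod
    def from_config(cls, config, custom_objects=None):
        # Rebuild the object by passing the config back to the constructor.
        return cls(**config)


original = TinyConfigurable(embed_dim=32, quantiles=[0.1, 0.5, 0.9])

# Copy before modifying, since get_config() may not return a fresh dict.
config = copy.deepcopy(original.get_config())
config["num_heads"] = 8

clone = TinyConfigurable.from_config(config)
print(clone.embed_dim, clone.num_heads, original.num_heads)  # 32 8 4
```

The deep copy keeps the change to num_heads, and the nested quantiles list, from leaking back into the original object.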