fusionlab.nn.utils.create_sequences

fusionlab.nn.utils.create_sequences(df, sequence_length, target_col, step=1, include_overlap=True, drop_last=True, forecast_horizon=None, verbose=3)

Create input sequences and corresponding targets for time series forecasting.

The create_sequences function generates sequences of features and their corresponding targets from a time series dataset. This is essential for training sequence models like Temporal Fusion Transformers, LSTMs, and others that rely on temporal dependencies.

See more in User Guide.

Parameters:
  • df (pandas.DataFrame) – The processed DataFrame containing features and the target variable.

  • sequence_length (int) – The number of past time steps to include in each input sequence.

  • target_col (str) – The name of the target column.

  • step (int, default 1) – The step size between the starts of consecutive sequences.

  • include_overlap (bool, default True) – Whether to include overlapping sequences based on the step size.

  • drop_last (bool, default True) – Whether to drop the last sequence if it does not have enough data points.

  • forecast_horizon (int, optional, default None) – The number of future time steps to predict. If set to None, the function will create targets for a single future time step. If provided, targets will consist of the next forecast_horizon time steps.

  • verbose (int, default 3) – Controls the verbosity of logging. Ranges from 0 (no logs) to 7 (maximal logs).

Returns:

A tuple containing:
  • sequences: Array of input sequences with shape (num_sequences, sequence_length, num_features).

  • targets:
    • If forecast_horizon is None: Array of target values with shape (num_sequences,).

    • If forecast_horizon is an integer: Array of target sequences with shape (num_sequences, forecast_horizon).

Return type:

Tuple[numpy.ndarray, numpy.ndarray]

Raises:

ValueError – If the DataFrame df does not contain the target_col.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from fusionlab.nn.utils import create_sequences
>>> # Create a dummy DataFrame
>>> data = pd.DataFrame({
...     'feature1': np.random.rand(100),
...     'feature2': np.random.rand(100),
...     'feature3': np.random.rand(100),
...     'target': np.random.rand(100)
... })
>>> # Create sequences for single-step forecasting
>>> sequence_length = 4
>>> sequences, targets = create_sequences(
...     df=data,
...     sequence_length=sequence_length,
...     target_col='target',
...     step=1,
...     include_overlap=True,
...     drop_last=True,
...     forecast_horizon=None
... )
>>> print(sequences.shape)
(95, 4, 4)
>>> print(targets.shape)
(95,)
>>> # Create sequences for multi-step forecasting (e.g., 3 steps)
>>> forecast_horizon = 3
>>> sequences, targets = create_sequences(
...     df=data,
...     sequence_length=sequence_length,
...     target_col='target',
...     step=1,
...     include_overlap=True,
...     drop_last=True,
...     forecast_horizon=forecast_horizon
... )
>>> print(sequences.shape)
(92, 4, 4)
>>> print(targets.shape)
(92, 3)

Notes

  • Sequence Creation: The function slides a window of size sequence_length across the DataFrame to create input sequences. Each sequence is associated with a target value or sequence of values that immediately follow the input sequence.

  • Forecast Horizon:
    • If forecast_horizon is None, the function creates targets for a single future time step.

    • If forecast_horizon is an integer H, the function creates targets consisting of the next H time steps.

  • Step Size: The step parameter controls the stride of the sliding window. With a step of 1, consecutive windows share sequence_length - 1 rows. A larger step reduces the overlap, and a step of at least sequence_length produces windows that share no rows.

  • Handling Incomplete Sequences: If drop_last is set to False, the function includes the last sequence even if it doesn’t have enough data points to form a complete sequence or target.

  • Data Validation: The function uses are_all_frames_valid from fusionlab.core.checks to check the integrity of the input DataFrame before processing, and exist_features to verify that the target column is present.
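The effect of the step parameter described in the Step Size note can be illustrated with a small index calculation. This is only a sketch: window_starts is a hypothetical helper, not part of fusionlab, and it assumes a window is kept only when both the full input and the full target fit in the data. The exact number of windows the library returns may follow a different boundary convention.

```python
def window_starts(n_rows, sequence_length, step=1, horizon=1):
    """Start indices of windows that have a full input and a full target.

    Hypothetical helper for illustration; not part of fusionlab.
    """
    # The last valid start leaves room for the window plus its target span.
    last_start = n_rows - sequence_length - horizon
    return list(range(0, last_start + 1, step))


# Ten rows, windows of length 4, one-step targets.
dense = window_starts(10, 4, step=1)   # consecutive windows share 3 rows
sparse = window_starts(10, 4, step=4)  # windows no longer share rows
print(dense)
print(sparse)
```

With step=1 every row index that leaves room for a window and its target becomes a start; with step=4 (equal to the window length) only every fourth start is kept, so the resulting windows are disjoint.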

Sequence generation can be expressed as:

\[\begin{split}&\text{For each sequence } i: \\ &\mathbf{X}^{(i)} = \left[ \mathbf{x}_{i}, \mathbf{x}_{i+1}, \dots, \mathbf{x}_{i+T-1} \right] \\ &y^{(i)} = \begin{cases} \mathbf{x}_{i+T} & \text{if } \text{forecast\_horizon} = \text{None} \\ \left[ \mathbf{x}_{i+T}, \mathbf{x}_{i+T+1}, \dots, \mathbf{x}_{i+T+H-1} \right] & \text{if } \text{forecast\_horizon} = H \end{cases}\end{split}\]
Where:
  • \(\mathbf{X}^{(i)}\) is the input sequence of length \(T\).

  • \(y^{(i)}\) is the target value(s) following the sequence.
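As a rough illustration of the formula above, the windowing can be written in a few lines of NumPy. This is a minimal sketch and not the fusionlab implementation: make_windows and its arguments are hypothetical names, the targets are read from a separate target array standing in for the target column, and incomplete trailing windows are always dropped.

```python
import numpy as np


def make_windows(values, target, T, H=None, step=1):
    """Build (X, y) pairs with a sliding window.

    Hypothetical reference sketch, not the fusionlab implementation.
    values: array of shape (n_rows, n_features)
    target: array of shape (n_rows,)
    """
    horizon = 1 if H is None else H
    X, y = [], []
    # Stop early enough that every window has a complete target span.
    for i in range(0, len(values) - T - horizon + 1, step):
        X.append(values[i:i + T])                # rows i .. i+T-1
        future = target[i + T:i + T + horizon]   # rows i+T .. i+T+horizon-1
        y.append(future[0] if H is None else future)
    return np.asarray(X), np.asarray(y)


values = np.arange(20, dtype=float).reshape(10, 2)
target = values[:, 1]
X1, y1 = make_windows(values, target, T=4)
X3, y3 = make_windows(values, target, T=4, H=3)
print(X1.shape, y1.shape)
print(X3.shape, y3.shape)
```

The single-step case returns a one-dimensional target array, while passing H returns one row of H future values per window, mirroring the two target shapes listed under Returns.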

See also

fusionlab.nn.utils.split_static_dynamic

Function to split sequences into static and dynamic inputs.
