fusionlab.nn.utils.create_sequences
- fusionlab.nn.utils.create_sequences(df, sequence_length, target_col, step=1, include_overlap=True, drop_last=True, forecast_horizon=None, verbose=3)
Create input sequences and corresponding targets for time series forecasting.
The create_sequences function generates sequences of features and their corresponding targets from a time series dataset. This is essential for training sequence models like Temporal Fusion Transformers, LSTMs, and others that rely on temporal dependencies.
See more in User Guide.
- Parameters:
df (pandas.DataFrame) – The processed DataFrame containing features and the target variable.
sequence_length (int) – The number of past time steps to include in each input sequence.
target_col (str) – The name of the target column.
step (int, default 1) – The step size between the starts of consecutive sequences.
include_overlap (bool, default True) – Whether to include overlapping sequences based on the step size.
drop_last (bool, default True) – Whether to drop the last sequence if it does not have enough data points.
forecast_horizon (int, optional, default None) – The number of future time steps to predict. If set to None, the function will create targets for a single future time step. If provided, targets will consist of the next forecast_horizon time steps.
verbose (int, default 3) – Controls the verbosity of logging. Ranges from 0 (no logs) to 7 (maximal logs).
- Returns:
- A tuple containing:
sequences: Array of input sequences with shape (num_sequences, sequence_length, num_features).
- targets:
If forecast_horizon is None: Array of target values with shape (num_sequences,).
If forecast_horizon is an integer: Array of target sequences with shape (num_sequences, forecast_horizon).
- Return type:
Tuple[numpy.ndarray, numpy.ndarray]
- Raises:
ValueError – If the DataFrame df does not contain the target_col.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from fusionlab.nn.utils import create_sequences

>>> # Create a dummy DataFrame
>>> data = pd.DataFrame({
...     'feature1': np.random.rand(100),
...     'feature2': np.random.rand(100),
...     'feature3': np.random.rand(100),
...     'target': np.random.rand(100)
... })

>>> # Create sequences for single-step forecasting
>>> sequence_length = 4
>>> sequences, targets = create_sequences(
...     df=data,
...     sequence_length=sequence_length,
...     target_col='target',
...     step=1,
...     include_overlap=True,
...     drop_last=True,
...     forecast_horizon=None
... )
>>> print(sequences.shape)
(95, 4, 4)
>>> print(targets.shape)
(95,)

>>> # Create sequences for multi-step forecasting (e.g., 3 steps)
>>> forecast_horizon = 3
>>> sequences, targets = create_sequences(
...     df=data,
...     sequence_length=4,
...     target_col='target',
...     step=1,
...     include_overlap=True,
...     drop_last=True,
...     forecast_horizon=3
... )
>>> print(sequences.shape)
(92, 4, 4)
>>> print(targets.shape)
(92, 3)
Notes
Sequence Creation: The function slides a window of size sequence_length across the DataFrame to create input sequences. Each sequence is paired with the target value, or sequence of target values, that immediately follows it.
- Forecast Horizon:
If forecast_horizon is None, the function creates targets for a single future time step.
If forecast_horizon is an integer H, the function creates targets consisting of the next H time steps.
Step Size: The step parameter controls the stride of the sliding window. A step of 1 results in overlapping sequences, while a larger step reduces overlap.
Handling Incomplete Sequences: If drop_last is set to False, the function includes the last sequence even if it doesn’t have enough data points to form a complete sequence or target.
Data Validation: The function uses are_all_frames_valid from fusionlab.core.checks to validate the input DataFrame before processing, and exist_features to verify that the target column is present.
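To make the Step Size note concrete, the following sketch lists the row indices at which windows would start for a given stride. It is an illustration only: window_starts is a hypothetical helper, not part of fusionlab, and its boundary rule (keep only windows with a full input and a full target) is an assumption that may not match how create_sequences handles the last rows.

```python
def window_starts(n_rows, sequence_length, horizon=1, step=1):
    """Hypothetical helper: start indices of windows that have both a
    complete input (sequence_length rows) and a complete target
    (horizon rows). Not the fusionlab implementation."""
    last_start = n_rows - sequence_length - horizon  # last valid start, inclusive
    if last_start < 0:
        return []  # not enough rows for even one window
    return list(range(0, last_start + 1, step))


# step=1: consecutive windows share sequence_length - 1 rows.
starts_dense = window_starts(10, sequence_length=4, horizon=1, step=1)
# step=sequence_length: consecutive windows share no input rows.
starts_sparse = window_starts(10, sequence_length=4, horizon=1, step=4)

print(starts_dense)   # [0, 1, 2, 3, 4, 5]
print(starts_sparse)  # [0, 4]
```

With step equal to sequence_length, the inputs of consecutive windows no longer share rows, which is the limiting case of "a larger step reduces overlap".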
Sequence generation can be expressed as:
\[\begin{split}
&\text{For each sequence } i: \\
&\mathbf{X}^{(i)} = \left[ \mathbf{x}_{i}, \mathbf{x}_{i+1}, \dots, \mathbf{x}_{i+T-1} \right] \\
&y^{(i)} = \begin{cases}
\mathbf{x}_{i+T} & \text{if } \text{forecast\_horizon} = \text{None} \\
\left[ \mathbf{x}_{i+T}, \mathbf{x}_{i+T+1}, \dots, \mathbf{x}_{i+T+H-1} \right] & \text{if } \text{forecast\_horizon} = H
\end{cases}
\end{split}\]
Where:
\(\mathbf{X}^{(i)}\) is the input sequence of length \(T\).
\(y^{(i)}\) is the target value(s) following the sequence.
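The formulation above can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the library's implementation: make_windows and target_idx are hypothetical names, the targets are read from the target column only (consistent with the documented target shapes), and the sketch keeps only windows with a complete input and target, so its sequence counts need not match those returned by create_sequences.

```python
import numpy as np


def make_windows(values, target_idx, sequence_length,
                 forecast_horizon=None, step=1):
    """Hypothetical sketch of the sliding-window formulation.

    values:     2-D array of shape (n_rows, n_features)
    target_idx: column index of the target variable (assumed name)
    """
    horizon = 1 if forecast_horizon is None else forecast_horizon
    X, y = [], []
    last_start = len(values) - sequence_length - horizon  # inclusive
    for i in range(0, last_start + 1, step):
        end = i + sequence_length
        X.append(values[i:end])                         # rows i .. i+T-1
        future = values[end:end + horizon, target_idx]  # next H target values
        # Single-step mode yields a scalar; multi-step yields a vector.
        y.append(future[0] if forecast_horizon is None else future)
    return np.array(X), np.array(y)


# Toy data: 10 rows, 3 columns; column 2 plays the role of the target.
values = np.arange(30, dtype=float).reshape(10, 3)

X, y = make_windows(values, target_idx=2, sequence_length=4)
print(X.shape, y.shape)    # (6, 4, 3) (6,)

X3, y3 = make_windows(values, target_idx=2, sequence_length=4,
                      forecast_horizon=3)
print(X3.shape, y3.shape)  # (4, 4, 3) (4, 3)
```

With a DataFrame, the same sketch could be fed df.to_numpy() and the positional index of target_col. Note that the input windows keep every column, including the target, which mirrors the (num_sequences, sequence_length, num_features) shape described under Returns.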
See also
fusionlab.nn.utils.split_static_dynamic – Function to split sequences into static and dynamic inputs.
References