Neural Network Utilities¶
The fusionlab.nn.utils module provides helpful functions for
working specifically with the neural network models in fusionlab-learn.
These utilities assist with tasks such as data preprocessing tailored
for models like TFT and XTFT, computing anomaly scores, generating
forecasts, and reshaping data arrays.
Anomaly Score Calculation¶
compute_anomaly_scores¶
Purpose: To calculate anomaly scores for time series data using
various statistical or algorithmic methods. These scores quantify the
“unusualness” of data points or sequences and can be used to inform
model training (e.g., with the ‘from_config’ strategy in
XTFT) or for post-hoc analysis and evaluation.
Functionality / Methods:
This function computes scores based on the chosen method. Let \(y\)
denote a value from y_true, \(\mu\) its mean, \(\sigma\) its
standard deviation, and \(\epsilon\) a small constant:
‘statistical’ (or ‘stats’): Calculates scores based on the squared normalized deviation from the mean (squared Z-score). Higher scores indicate larger deviations.
\[Score(y) = \left(\frac{y - \mu}{\sigma + \epsilon}\right)^2\]
‘domain’: Uses a user-provided callable domain_func(y) or a default heuristic (e.g., assigning higher scores to negative or zero values if only positive values are expected).
‘isolation_forest’ (or ‘if’): Uses the sklearn.ensemble.IsolationForest algorithm. Scores are derived from the negative average path length required to isolate a sample (e.g., -iso.score_samples(y)). Lower original scores (more negative) indicate higher anomaly likelihood; the function may transform these. Requires sklearn.
‘residual’: Requires providing corresponding predictions y_pred. Scores are based on the prediction error \(e = y_{true} - y_{pred}\):
‘mae’ sub-method: \(Score = |e|\)
‘mse’ sub-method: \(Score = e^2\)
‘rmse’ sub-method: \(Score = \sqrt{e^2 + \epsilon}\)
(Refer to the API documentation for details on parameters like `threshold`, `contamination`, `sklearn_params`, `error_metric` etc.)
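The ‘statistical’ and ‘residual’ formulas above can be sketched directly in NumPy. This is a minimal illustration of the equations as written here, not the library's implementation: the helper names, the value of \(\epsilon\), and the use of a single global mean and standard deviation are all assumptions.

```python
import numpy as np

EPS = 1e-8  # assumed value for the small constant epsilon


def statistical_scores(y, eps=EPS):
    """Squared z-score, using the global mean/std of y (an assumption)."""
    mu, sigma = y.mean(), y.std()
    return ((y - mu) / (sigma + eps)) ** 2


def residual_scores(y_true, y_pred, metric="mae", eps=EPS):
    """Element-wise residual scores for the three sub-methods."""
    e = y_true - y_pred
    if metric == "mae":
        return np.abs(e)
    if metric == "mse":
        return e ** 2
    if metric == "rmse":
        return np.sqrt(e ** 2 + eps)
    raise ValueError(f"unknown metric: {metric!r}")


y = np.array([1.0, 2.0, 3.0, 2.0, 12.0])  # last value is the outlier
scores = statistical_scores(y)
mae = residual_scores(y, y - 0.5, metric="mae")
print(scores.round(2), mae)
```

The outlier receives the largest squared z-score, and each residual score has the same shape as its inputs, which matches the element-wise behaviour described above.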
Usage Context: This function is typically used outside the main
model training loop, for instance, to pre-calculate anomaly scores
from historical data, baseline model predictions, or domain rules.
These pre-calculated scores can then be passed to the
XTFT model via the anomaly_config parameter
when using the anomaly_detection_strategy='from_config'. It offers
a flexible way to define anomaly signals based on various approaches
before integrating them into an anomaly-aware training process.
Code Example:
import numpy as np
from fusionlab.nn.utils import compute_anomaly_scores
# IsolationForest needed for 'if' method
# from sklearn.ensemble import IsolationForest

# Config
batch_size = 4
time_steps = 10
features = 1

# Dummy data (e.g., target values)
y_true = np.random.randn(batch_size, time_steps, features).astype(np.float32)
# Inject an anomaly
y_true[1, 5, 0] = 10.0

# Dummy predictions (for residual method)
y_pred = y_true + np.random.normal(0, 0.5, y_true.shape).astype(np.float32)

# 1. Calculate using 'statistical' method
stat_scores = compute_anomaly_scores(
    y_true=y_true,
    method='statistical'
)
print("--- Statistical Scores ---")
print(f"Input y_true shape: {y_true.shape}")
print(f"Output scores shape: {stat_scores.shape}")
# Expected shape: (4, 10, 1)
print(f"Example score for anomalous point: {stat_scores[1, 5, 0]:.2f}")
print(f"Example score for normal point: {stat_scores[0, 5, 0]:.2f}")

# 2. Calculate using 'residual' (MAE) method
resid_scores = compute_anomaly_scores(
    y_true=y_true,
    y_pred=y_pred,
    method='residual',
    error_metric='mae'  # Use MAE for residuals
)
print("\n--- Residual (MAE) Scores ---")
print(f"Input y_true shape: {y_true.shape}")
print(f"Input y_pred shape: {y_pred.shape}")
print(f"Output scores shape: {resid_scores.shape}")
# Expected shape: (4, 10, 1)
print(f"Example score for anomalous point: {resid_scores[1, 5, 0]:.2f}")
print(f"Example score for normal point: {resid_scores[0, 5, 0]:.2f}")

# 3. Calculate using 'isolation_forest' (requires sklearn)
# Need to reshape data for Isolation Forest (Samples, Features)
# For time series, might apply IF per timestep or on sequence features
# Example: Apply per timestep (treat each B*T point independently)
# try:
#     y_true_flat = y_true.reshape(-1, features)
#     if_scores_flat = compute_anomaly_scores(
#         y_true=y_true_flat,
#         method='isolation_forest',
#         contamination=0.05  # Expected anomaly rate
#     )
#     if_scores = if_scores_flat.reshape(batch_size, time_steps, features)
#     print("\n--- Isolation Forest Scores ---")
#     print(f"Output scores shape: {if_scores.shape}")
# except ImportError:
#     print("\nSkipping Isolation Forest example (sklearn not found).")
Data Preparation & Preprocessing¶
These functions help prepare raw time series data into the specific formats expected by models like TFT and XTFT.
split_static_dynamic¶
Purpose: To separate an input array containing sequences of combined features into two distinct arrays: one for static features (time-invariant, extracted from a single time step) and one for dynamic (time-varying) features. This is often needed when a simpler sequence generation tool creates a combined array first.
Functionality: Given an input sequence tensor \(\mathbf{X} \in \mathbb{R}^{B \times T \times N}\) (Batch, TimeSteps, NumCombinedFeatures), static feature indices \(I_{static}\), dynamic feature indices \(I_{dynamic}\), and a specific time step \(t_{static}\) (usually 0) for extracting static values:
Extract Static Features: Selects features \(I_{static}\) at time step \(t_{static}\).
\[\mathbf{S}_{raw} = \mathbf{X}_{:, t_{static}, I_{static}} \in \mathbb{R}^{B \times |I_{static}|}\]
Extract Dynamic Features: Selects features \(I_{dynamic}\) across all time steps \(T\).
\[\mathbf{D}_{raw} = \mathbf{X}_{:, :, I_{dynamic}} \in \mathbb{R}^{B \times T \times |I_{dynamic}|}\]
Reshape (Optional): If reshape_static or reshape_dynamic are True (default), adds a trailing dimension of 1:
\(\mathbf{S} \in \mathbb{R}^{B \times |I_{static}| \times 1}\)
\(\mathbf{D} \in \mathbb{R}^{B \times T \times |I_{dynamic}| \times 1}\)
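The three steps above amount to plain array indexing. The following NumPy sketch reproduces the shapes in the equations under that reading; it does not call split_static_dynamic and makes no claim about how the function is implemented internally.

```python
import numpy as np

B, T, N = 4, 10, 5
X = np.random.rand(B, T, N).astype(np.float32)
static_idx, dynamic_idx, t_static = [0, 1], [2, 3, 4], 0

# Static features: one time step, selected columns -> (B, |I_static|)
S_raw = X[:, t_static, :][:, static_idx]
# Dynamic features: all time steps, selected columns -> (B, T, |I_dynamic|)
D_raw = X[:, :, dynamic_idx]

# Optional reshape: add a trailing dimension of 1
S = S_raw[..., np.newaxis]
D = D_raw[..., np.newaxis]

print(S_raw.shape, D_raw.shape, S.shape, D.shape)
```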
Usage Context: Use this function after creating combined sequences
(e.g., using create_sequences() on a DataFrame containing both
static and dynamic columns) when you need to separate them into the
distinct static and dynamic input arrays required by models like
TemporalFusionTransformer or
DummyTFT. It assumes static
values are repeated across time in the input sequence.
Code Example:
import numpy as np
# Assuming split_static_dynamic is importable
from fusionlab.nn.utils import split_static_dynamic

# Config
B, T, N = 4, 10, 5  # Batch, Time, Features (2 static, 3 dynamic)
static_indices = [0, 1]
dynamic_indices = [2, 3, 4]
static_timestep_idx = 0  # Extract static from first step

# Dummy combined sequence input
combined_sequences = np.random.rand(B, T, N).astype(np.float32)

# Split the sequences
static_array, dynamic_array = split_static_dynamic(
    X=combined_sequences,
    static_features_indices=static_indices,
    dynamic_features_indices=dynamic_indices,
    static_timestep=static_timestep_idx,
    reshape_static=True,   # Default
    reshape_dynamic=True   # Default
)

print(f"Input combined sequence shape: {combined_sequences.shape}")
print(f"Output static array shape: {static_array.shape}")
print(f"Output dynamic array shape: {dynamic_array.shape}")
# Expected: (4, 10, 5), (4, 2, 1), (4, 10, 3, 1)
create_sequences¶
Purpose: To transform a time series dataset (typically in a Pandas DataFrame) into a format suitable for supervised learning with sequence models. It creates input sequences (windows of past data, including all available features) and their corresponding target values (future data to predict from a specific column).
Functionality: This function slides a window of a specified sequence_length (\(T\)) across the input DataFrame df. For each window, it extracts:
Input Sequence (\(\mathbf{X}^{(i)}\)): A segment of the DataFrame containing all feature columns over \(T\) consecutive time steps starting at index \(i\).
\[\mathbf{X}^{(i)} = [\mathbf{df}_{i}, \mathbf{df}_{i+1}, ..., \mathbf{df}_{i+T-1}]\]
Target Value(s) (\(y^{(i)}\)): The value(s) from the specified target_col that occur immediately after the input sequence.
Single-step (forecast_horizon=None or 1): Target is \(\text{target}_{\text{value},\, i+T}\).
Multi-step (forecast_horizon=H): Target is the sequence \([\text{target}_{\text{value},\, i+T}, \ldots, \text{target}_{\text{value},\, i+T+H-1}]\).
The function iterates through the DataFrame with a given step size (stride=1 creates overlapping sequences). The drop_last parameter controls handling of sequences near the end without full targets.
Output: Returns two NumPy arrays:
sequences: Shape \((\text{NumSeq}, T, \text{NumFeatures})\)
targets: Shape \((\text{NumSeq},)\) for single-step or \((\text{NumSeq}, H)\) for multi-step.
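The windowing logic described above can be sketched in a few lines. The helper below is a hypothetical stand-in written for illustration (its name and signature are not the library's); it keeps only windows whose full set of H targets is available, which is one possible reading of the drop_last behaviour.

```python
import numpy as np
import pandas as pd


def sliding_windows(df, target_col, T, H=1, step=1):
    """Return (sequences, targets) with shapes (NumSeq, T, F) and (NumSeq, H)."""
    values = df.to_numpy(dtype=float)
    target = df[target_col].to_numpy(dtype=float)
    X, y = [], []
    # Window i covers rows i .. i+T-1; its targets are rows i+T .. i+T+H-1.
    for i in range(0, len(df) - T - H + 1, step):
        X.append(values[i:i + T])
        y.append(target[i + T:i + T + H])
    return np.stack(X), np.stack(y)


df = pd.DataFrame({"Feature1": np.arange(50.0), "Target": np.arange(50.0) * 2})
X, y = sliding_windows(df, "Target", T=10, H=5)
print(X.shape, y.shape)
```

With 50 rows, T = 10 and H = 5, this yields 50 - 10 - 5 + 1 = 36 windows, which is where the 36 in the expected shapes of the code example comes from.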
Usage Context: A fundamental preprocessing step. Use it after
cleaning and feature engineering your DataFrame to generate the
(X, y) pairs needed to train basic sequence models or as an
intermediate step before further processing (like using
split_static_dynamic()) for more complex models like TFT/XTFT.
It’s simpler than reshape_xtft_data() as it doesn’t automatically
separate static/dynamic/future types.
Code Example:
import numpy as np
import pandas as pd
from fusionlab.nn.utils import create_sequences

# Dummy DataFrame
data = {
    'Time': pd.to_datetime(pd.date_range('2023-01-01', periods=50)),
    'Feature1': np.random.rand(50) * 10,
    'Target': np.sin(np.arange(50) * 0.5) + 5
}
df = pd.DataFrame(data).set_index('Time')

# Config
sequence_length = 10  # Lookback window
forecast_horizon = 5  # Predict 5 steps ahead
target_column = 'Target'

# Create sequences and multi-step targets
X, y = create_sequences(
    df=df,
    sequence_length=sequence_length,
    forecast_horizon=forecast_horizon,
    target_col=target_column,
    step=1  # Default stride
)

print(f"Original DataFrame shape: {df.shape}")
print(f"Output sequences (X) shape: {X.shape}")
print(f"Output targets (y) shape: {y.shape}")
# Expected shapes (approx): (50, 2), (36, 10, 2), (36, 5)
compute_forecast_horizon¶
Purpose: To determine the number of time steps (forecast_horizon) between a specified prediction start date/time and end date/time, optionally using the inferred frequency of provided time series data.
Functionality:
Frequency Inference: Optionally infers the time series frequency (e.g., ‘D’, ‘H’, ‘MS’) from input data using pandas.infer_freq.
Date Parsing: Converts start_pred and end_pred (strings, datetimes, or integer years) into pandas Timestamps.
Horizon Calculation:
With Frequency: Calculates the number of steps by generating
a date range between start and end using the inferred frequency.
Without Frequency: Estimates the horizon based on the time
delta in the largest applicable unit (years, months, weeks, days).
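The frequency-based branch can be illustrated with pandas alone. This sketch assumes the horizon counts both endpoints (as the expected values in the code example suggest) and does not reproduce the function's parsing or fallback rules.

```python
import pandas as pd

# With frequency: infer it from the history, then count the steps in the range.
history = pd.date_range("2023-01-01", periods=60, freq="D")
freq = pd.infer_freq(history)
steps = len(pd.date_range("2023-03-01", "2023-03-10", freq=freq))

# Without frequency: integer years, counted inclusively (an assumption).
start_year, end_year = 2024, 2026
year_steps = end_year - start_year + 1

print(freq, steps, year_steps)
```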
Usage Context: Useful before creating sequences or configuring models
when the forecast period is defined by start/end dates rather than a
fixed number of steps. Helps ensure consistency between the desired
prediction range and parameters like forecast_horizon used in
create_sequences() or model initialization.
Code Example:
import pandas as pd
from fusionlab.nn.utils import compute_forecast_horizon

# Example 1: Using frequency inference
dates = pd.date_range('2023-01-01', periods=60, freq='D')
df_daily = pd.DataFrame({'date': dates})
horizon1 = compute_forecast_horizon(
    data=df_daily,  # Provide data to infer frequency
    dt_col='date',
    start_pred='2023-03-01',
    end_pred='2023-03-10'
)
print(f"Horizon (daily data, Mar 1 to Mar 10): {horizon1}")
# Expected: 10

# Example 2: Using integer years (no frequency)
horizon2 = compute_forecast_horizon(
    start_pred=2024,
    end_pred=2026
    # No data/freq provided, calculates based on years
)
print(f"Horizon (years 2024 to 2026): {horizon2}")
# Expected: 3 (2024, 2025, 2026)
prepare_spatial_future_data¶
Purpose: To prepare the specific static and dynamic input arrays needed to generate out-of-sample future predictions using a trained sequence model, particularly designed for datasets structured with spatial groupings (e.g., locations, sensors).
Functionality: Processes a DataFrame containing historical data, grouped by location, to construct model inputs for time steps beyond the training data range.
Grouping & Sorting: Groups data by spatial_cols and sorts by time (dt_col).
Last Sequence Extraction: Extracts the most recent sequence (length sequence_length) for each group/location.
Input Preparation: From the last sequence, it extracts:
Static Inputs: Values from static_feature_names.
Dynamic Inputs: Values from dynamic_feature_indices. This forms a template for future dynamic inputs.
Future Time Step Projection: For each required future step (up to forecast_horizon, based on future_years):
It updates the time feature(s) within the dynamic template sequence to reflect the future time step. If the model was trained on scaled inputs, the new time value is scaled with the provided scaling_params (\(\mu, \sigma\)) so that it matches the training scale. Other dynamic features are usually carried forward from the last known state.
\[\text{scaled}_{\text{future},\, \text{time}} = \frac{\text{future}_{\text{time}} - \mu_{\text{time}}}{\sigma_{\text{time}} + \epsilon}\]
Output: Returns prepared NumPy arrays for static and future dynamic inputs, ready for the model’s .predict() method, along with metadata like future time steps and location identifiers.
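A rough pandas/NumPy sketch of the steps above follows. The column names, the toy data, and the choice to overwrite only the final step of the template are assumptions made for illustration; the actual function may update the time feature differently.

```python
import numpy as np
import pandas as pd

# Toy history for two locations; column names are hypothetical.
df = pd.DataFrame({
    "site": ["a"] * 5 + ["b"] * 5,
    "year": list(range(2018, 2023)) * 2,
    "rain": np.arange(10, dtype=float),
})
T = 3       # sequence_length
eps = 1e-8  # small constant epsilon
mu, sigma = df["year"].mean(), df["year"].std()  # assumed scaling_params

# Scale the time feature the way the training data is assumed to be scaled.
df["year_scaled"] = (df["year"] - mu) / (sigma + eps)

# Last T rows per location, in time order -> template of shape (groups, T, 2).
last = df.sort_values(["site", "year"]).groupby("site").tail(T)
template = last[["year_scaled", "rain"]].to_numpy().reshape(-1, T, 2)

# Project one future step: overwrite the time feature of the final step and
# carry the other dynamic feature forward unchanged.
future_year = 2023
future_dynamic = template.copy()
future_dynamic[:, -1, 0] = (future_year - mu) / (sigma + eps)

print(template.shape, future_dynamic[0, -1])
```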
Usage Context: This is a crucial function for generating actual forecasts after a model has been trained. Use it to create the input arrays needed to predict future values not seen during training, based on the last available historical data for each spatial group. See the CLI tools (Command-Line Interface (CLI)) or forecasting examples (Examples Gallery) for contextual usage. (A direct code example here would require significant setup; refer to full workflow examples).
reshape_xtft_data¶
Purpose: A comprehensive utility to transform a time series DataFrame into the structured sequence format with separate arrays for static, dynamic (past), future (known), and target features, as required for training and evaluating models like XTFT and TFT.
Functionality:
Validation & Grouping: Validates inputs, handles datetime column, optionally groups by spatial_cols, sorts by time.
Rolling Window: Slides a window (length time_steps + forecast_horizon) across the data within each group.
Feature Extraction per Window: For each window, it extracts and separates sequences for:
Static Features: From static_cols (value usually taken once per group).
Dynamic Features: From dynamic_cols for the lookback period (time_steps).
Future Features: From future_cols for a period relevant to the model
(often lookback + horizon). (Note: Verify precise time window used for future features based on implementation/model needs).
Target Features: From target_col for the forecast period (forecast_horizon).
Output: Returns a tuple of NumPy arrays: (static_data, dynamic_data, future_data, target_data). Arrays for optional inputs (static/future) will be None if no corresponding columns are provided. Static data typically has shape \((NumSeq, NumStatic)\), while others are 3D: \((NumSeq, Time, Features)\).
Mathematical Concept (Rolling Window): Generates pairs of input sequences (\(\mathbf{X}^{(i)}\), possibly split by type) and target sequences (\(\mathbf{Y}^{(i)}\)) for supervised training.
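For a single group, the rolling-window split can be sketched as below. The helper name is hypothetical, and the future window is taken as lookback plus horizon, which is an assumption (see the note above about verifying the exact window).

```python
import numpy as np


def rolling_split(dynamic, future, target, time_steps, horizon):
    """dynamic: (L, Dd), future: (L, Df), target: (L,) for one group."""
    dyn, fut, tgt = [], [], []
    window = time_steps + horizon
    for i in range(len(target) - window + 1):
        dyn.append(dynamic[i:i + time_steps])             # lookback only
        fut.append(future[i:i + window])                  # lookback + horizon (assumed)
        tgt.append(target[i + time_steps:i + window, None])  # forecast period
    return np.stack(dyn), np.stack(fut), np.stack(tgt)


L = 20
dynamic = np.random.rand(L, 3)
future = np.random.rand(L, 2)
target = np.arange(L, dtype=float)
d, f, t = rolling_split(dynamic, future, target, time_steps=6, horizon=2)
print(d.shape, f.shape, t.shape)
```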
Usage Context: This is the recommended primary tool for preparing datasets directly from DataFrames for training or evaluating fusionlab’s TFT and XTFT models. It handles sequence creation, feature type separation, and spatial grouping in one step, producing the exact array formats needed by the models’ call methods. See examples like Advanced Forecasting with XTFT for usage. (A direct code example here would be very similar to the full model examples; refer to those for context).
Forecasting & Visualization¶
These functions assist with generating predictions from trained models and visualizing the forecast results.
generate_forecast¶
Purpose: To generate future predictions using a pre-trained
fusionlab model (like XTFT or
TFT). This function acts as a
high-level wrapper that handles preparing the necessary model inputs
from the end of the provided training data and formats the model’s
output into a structured DataFrame.
Functionality:
Model Validation: Ensures xtft_model is a valid Keras model.
Input Preparation: Groups train_data by spatial_cols (if provided). For each group, extracts the last sequence of length time_steps and constructs the input arrays [X_static, X_dynamic, X_future] needed for prediction, using logic similar to prepare_spatial_future_data().
Prediction: Calls xtft_model.predict() with the prepared arrays for each group. Conceptually:
\[\hat{\mathbf{y}}_{t+1...t+H} = f_{model}(\mathbf{X}_{\text{static}}, \mathbf{X}_{\text{dynamic}}, \mathbf{X}_{\text{future}})\]
where \(H\) is the forecast_horizon.
Output Formatting: Organizes predictions into a Pandas DataFrame, including spatial identifiers and forecast dates/periods (forecast_dt). Creates columns for point predictions (<tname>_pred) or quantile predictions (<tname>_qXX).
Evaluation (Optional): If test_data is provided, aligns forecasts with actuals and calculates/prints R² and Coverage Scores for the overlapping periods within the horizon.
Saving (Optional): Saves the forecast DataFrame if savefile is specified.
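The output-formatting and coverage steps can be pictured with a small pandas sketch. The prediction array, the "sample" column, and the coverage definition (share of actuals inside the outer quantile band) are assumptions for illustration; only the <tname>_qXX naming pattern and forecast_dt come from the description above.

```python
import numpy as np
import pandas as pd

quantiles = [0.1, 0.5, 0.9]
B, H = 2, 3
# Hypothetical quantile predictions of shape (B, H, Q): 1.0, 5.0 and 9.0.
preds = np.stack([np.full((B, H), q * 10.0) for q in quantiles], axis=-1)

rows = {
    "sample": np.repeat(np.arange(B), H),
    "forecast_dt": np.tile(np.arange(1, H + 1), B),
}
for k, q in enumerate(quantiles):
    rows[f"value_q{round(q * 100)}"] = preds[:, :, k].ravel()
frame = pd.DataFrame(rows)

# Coverage: fraction of actuals falling inside [q10, q90].
actual = np.linspace(0.5, 8.0, B * H)
inside = (actual >= frame["value_q10"].to_numpy()) & (
    actual <= frame["value_q90"].to_numpy()
)
coverage = float(inside.mean())
print(frame.head(), coverage)
```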
Usage Context: This is the primary function for generating out-of-sample forecasts after model training. It simplifies input preparation based on historical data and structures the results. See the CLI tools (Command-Line Interface (CLI)) or forecasting examples (Examples Gallery) for contextual usage.
visualize_forecasts¶
Purpose: To create visualizations comparing forecasted values against actual values (if available), particularly useful for spatial data or analyzing performance across different time periods.
Functionality:
Data Filtering: Selects data for specified eval_periods from forecast_df and optional test_data.
Column Identification: Determines prediction, actual, and coordinate column names.
Plot Grid Setup: Creates a matplotlib grid showing actual vs. predicted plots for each period.
Plotting: Generates scatter plots for actuals (if available) and predictions, colored by value, using specified coordinates. Applies consistent colormap and range. Adds titles, labels, colorbars.
Display: Shows the matplotlib figure.
Usage Context: Use this after generating forecasts (e.g., via
generate_forecast()) to visually inspect spatial patterns, compare
predictions to actuals over time, or assess quantile forecast spreads.
See the forecasting examples (Examples Gallery) for
contextual usage. (A direct code example here requires forecast data;
refer to full workflow examples).
forecast_single_step¶
Purpose: To generate a forecast for only the next single time step
(\(H=1\)) using a pre-trained fusionlab model and
pre-prepared input arrays.
Functionality:
Input: Takes a validated Keras xtft_model and inputs (a list/tuple [X_static, X_dynamic, X_future]).
Prediction: Calls xtft_model.predict(inputs). Assumes the model outputs multiple horizon steps and extracts the prediction for the first step (\(t+1\)).
Output Formatting: Creates a Pandas DataFrame including spatial columns (if specified), optional datetime, optional actuals (y), and prediction columns (<tname>_pred or <tname>_qXX).
Masking (Optional): Masks predictions based on mask_values in y.
Evaluation (Optional): Calculates R²/Coverage if y provided.
Saving (Optional): Saves DataFrame if savefile specified.
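The first-step extraction and masking can be sketched with NumPy. The prediction array stands in for a model output of shape (B, H, O), and replacing masked positions with NaN is an assumption about what masking means here.

```python
import numpy as np

B, H, O = 4, 6, 1
# Stand-in for xtft_model.predict(inputs): shape (B, H, O)
y_hat = np.arange(B * H * O, dtype=float).reshape(B, H, O)

# Keep only the first horizon step (t+1) -> shape (B, O)
first_step = y_hat[:, 0, :]

# Mask predictions wherever the actuals carry the mask value (assumed behaviour).
y = np.array([[1.0], [-999.0], [2.0], [3.0]])
mask_value = -999.0
masked = np.where(y == mask_value, np.nan, first_step)

print(first_step.ravel(), masked.ravel())
```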
Usage Context: Use when you only need the immediate next prediction and have already manually prepared the required model input arrays (X_static, X_dynamic, X_future). Useful in scenarios like real-time single-step updates or when integrating into systems where input preparation is handled separately.
Code Example:
import numpy as np
import pandas as pd
import tensorflow as tf
from fusionlab.nn.utils import forecast_single_step
# from fusionlab.nn.transformers import XTFT

# Dummy Model
class DummyModel(tf.keras.Model):
    def __init__(self, horizon=1, num_outputs=1):
        super().__init__()
        self.horizon = horizon
        self.num_outputs = num_outputs
        # Dummy layer to ensure model is callable
        self.dense = tf.keras.layers.Dense(horizon * num_outputs)

    def call(self, inputs):
        # Simulate output shape (B, H, O) or (B, H, Q)
        batch_size = tf.shape(inputs[1])[0]  # Get from dynamic
        # Flatten and project to simulate processing
        flat_in = tf.keras.layers.Flatten()(inputs[1][:, -1, :])  # Use last step dynamic
        out_flat = self.dense(flat_in)  # Shape (B, H*O)
        return tf.reshape(out_flat, (batch_size, self.horizon, self.num_outputs))

# Config & Dummy Data
B, T, H_model = 4, 12, 6  # Model trained for H=6
D_dyn, D_stat, D_fut = 5, 3, 2
static_in = tf.random.normal((B, D_stat))
dynamic_in = tf.random.normal((B, T, D_dyn))
future_in = tf.random.normal((B, T + H_model, D_fut))  # Future for model call
# Dummy target for NEXT SINGLE STEP ONLY (H=1)
y_true_single = tf.random.normal((B, 1))  # Needs to match output dim (usually 1)

# Instantiate Dummy Model (trained for H=6, output O=1)
model = DummyModel(horizon=H_model, num_outputs=1)
# Dummy call to build model
_ = model([static_in, dynamic_in, future_in])

# Prepare inputs for forecast_single_step
model_inputs = [static_in, dynamic_in, future_in]

# Generate single step forecast
forecast_df = forecast_single_step(
    xtft_model=model,
    inputs=model_inputs,
    target_col='Value',
    mode='point',         # Point forecast
    y=y_true_single,      # Provide actuals for next step
    evaluate=True,        # Ask for evaluation
    spatial_cols=['ID'],  # Assume static_in had ID column (needs adjustment)
    # dt_col='Timestamp'  # Optional
)

print("\n--- Single Step Forecast ---")
print(forecast_df.head())
forecast_multi_step¶
Purpose: To generate forecasts for multiple future time steps
(up to a specified forecast_horizon) using a pre-trained
fusionlab model and pre-prepared input arrays.
Functionality:
Input: Takes xtft_model, inputs = [X_s, X_d, X_f], and forecast_horizon.
Prediction: Calls xtft_model.predict(inputs), expecting an output covering the full horizon (shape \((B, H, Outputs)\)).
Output Formatting (Wide -> Long): Organizes the multi-step predictions. It first creates a wide-format DataFrame (columns like <tname>_pred_step1, <tname>_pred_step2, etc.) using an internal BatchDataFrameBuilder, then likely converts it to a long format using step_to_long(), where each row represents a specific sample, forecast step, and prediction value.
Masking (Optional): Masks predictions based on mask_values in y.
Evaluation (Optional): Calculates R²/Coverage across all horizon steps if y (with shape \((B, H, O)\)) is provided.
Saving (Optional): Saves the final DataFrame if savefile given.
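The wide-format stage can be pictured as one column per horizon step. The sketch below builds such a frame from a stand-in prediction array; it follows the <tname>_pred_stepN naming mentioned above but does not use the internal BatchDataFrameBuilder, and the "sample" column is a hypothetical identifier.

```python
import numpy as np
import pandas as pd

B, H = 3, 4
# Stand-in for a point-forecast output of shape (B, H, 1)
y_hat = np.arange(B * H, dtype=float).reshape(B, H, 1)

# One column per forecast step: Value_pred_step1 ... Value_pred_stepH
wide = pd.DataFrame(
    {f"Value_pred_step{h + 1}": y_hat[:, h, 0] for h in range(H)}
)
wide.insert(0, "sample", np.arange(B))
print(wide)
```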
Usage Context: Use when you need multi-step forecasts based on a specific set of pre-prepared input arrays. It handles the organization of the model’s multi-step output into a structured DataFrame.
Code Example:
import numpy as np
import pandas as pd
import tensorflow as tf
# Assuming forecast_multi_step and a dummy model class are available
from fusionlab.nn.utils import forecast_multi_step
# from fusionlab.nn import XTFT  # Replace with your actual model class

# Dummy Model (same as single-step example)
class DummyModel(tf.keras.Model):
    def __init__(self, horizon=1, num_outputs=1):
        super().__init__()
        self.horizon = horizon
        self.num_outputs = num_outputs
        self.dense = tf.keras.layers.Dense(horizon * num_outputs)

    def call(self, inputs):
        batch_size = tf.shape(inputs[1])[0]
        flat_in = tf.keras.layers.Flatten()(inputs[1][:, -1, :])
        out_flat = self.dense(flat_in)
        return tf.reshape(out_flat, (batch_size, self.horizon, self.num_outputs))

# Config & Dummy Data
B, T, H = 4, 12, 6  # Horizon H=6
D_dyn, D_stat, D_fut = 5, 3, 2
output_dim = 1
static_in = tf.random.normal((B, D_stat))
dynamic_in = tf.random.normal((B, T, D_dyn))
future_in = tf.random.normal((B, T + H, D_fut))  # Future for model call
# Dummy target for MULTIPLE steps (H=6)
y_true_multi = tf.random.normal((B, H, output_dim))

# Instantiate Dummy Model (trained for H=6, output O=1)
model = DummyModel(horizon=H, num_outputs=output_dim)
_ = model([static_in, dynamic_in, future_in])  # Build

# Prepare inputs for forecast_multi_step
model_inputs = [static_in, dynamic_in, future_in]

# Generate multi-step forecast
forecast_df_multi = forecast_multi_step(
    xtft_model=model,
    inputs=model_inputs,
    target_col='Value',
    forecast_horizon=H,   # Specify horizon
    mode='point',
    y=y_true_multi,       # Provide multi-step actuals
    evaluate=True,
    spatial_cols=['ID'],  # Assume static_in had ID
    # dt_col='Timestamp'  # Optional
)

print("\n--- Multi Step Forecast (Long Format) ---")
print(forecast_df_multi.head())  # Display long format
generate_forecast_with¶
Purpose: A convenient wrapper function that automatically calls
either forecast_single_step() or forecast_multi_step()
based on the specified forecast_horizon.
Functionality:
Takes all the same arguments as forecast_single_step() and forecast_multi_step().
Checks forecast_horizon:
If forecast_horizon == 1, calls forecast_single_step().
If forecast_horizon > 1, calls forecast_multi_step().
Returns the DataFrame produced by the called function.
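The dispatch rule above is simple enough to sketch with stand-in callables. Everything in this block is hypothetical (the function name, the default of 1, and the error for horizons below 1); it only illustrates the branching, not the library's wrapper.

```python
def dispatch_forecast(single_step_fn, multi_step_fn, forecast_horizon=1, **kwargs):
    """Route to the single-step or multi-step function based on the horizon."""
    if forecast_horizon < 1:
        raise ValueError("forecast_horizon must be >= 1")
    if forecast_horizon == 1:
        return single_step_fn(**kwargs)
    return multi_step_fn(forecast_horizon=forecast_horizon, **kwargs)


# Stand-ins for forecast_single_step() and forecast_multi_step()
def fake_single(**kwargs):
    return "single"


def fake_multi(forecast_horizon, **kwargs):
    return f"multi:{forecast_horizon}"


r1 = dispatch_forecast(fake_single, fake_multi, target_col="Value")
r6 = dispatch_forecast(fake_single, fake_multi, forecast_horizon=6, target_col="Value")
print(r1, r6)
```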
Usage Context: Provides a unified interface for generating forecasts from pre-prepared input arrays, regardless of whether you need one step or multiple steps ahead. Simplifies workflows where the forecast length might be a variable parameter.
Code Example:
import numpy as np
import pandas as pd
import tensorflow as tf
# Assuming generate_forecast_with is available
from fusionlab.nn.utils import generate_forecast_with

# Dummy Model (same as in the single-step and multi-step examples)
class DummyModel(tf.keras.Model):
    def __init__(self, horizon=1, num_outputs=1):
        super().__init__()
        self.horizon = horizon
        self.num_outputs = num_outputs
        self.dense = tf.keras.layers.Dense(horizon * num_outputs)

    def call(self, inputs):
        batch_size = tf.shape(inputs[1])[0]
        out_flat = self.dense(inputs[1][:, -1, :])  # Use last step dynamic
        return tf.reshape(out_flat, (batch_size, self.horizon, self.num_outputs))

# Dummy data, as in the previous examples
B, T, H = 4, 12, 6
D_dyn, D_stat, D_fut = 5, 3, 2
output_dim = 1
static_in = tf.random.normal((B, D_stat))
dynamic_in = tf.random.normal((B, T, D_dyn))
future_in = tf.random.normal((B, T + H, D_fut))
y_true_multi = tf.random.normal((B, H, output_dim))
model = DummyModel(horizon=H, num_outputs=output_dim)
_ = model([static_in, dynamic_in, future_in])  # Build
model_inputs = [static_in, dynamic_in, future_in]

# Example 1: Generate single step (H=1 passed implicitly)
print("--- generate_forecast_with (H=1) ---")
df_single = generate_forecast_with(
    xtft_model=model,
    inputs=model_inputs,
    target_col='Value',
    # forecast_horizon=1  # (Default or set to 1)
    mode='point',
    y=y_true_multi[:, :1, :]  # Provide only first step actuals
)
print(df_single.head())

# Example 2: Generate multi step
print("\n--- generate_forecast_with (H=6) ---")
df_multi = generate_forecast_with(
    xtft_model=model,
    inputs=model_inputs,
    target_col='Value',
    forecast_horizon=H,  # Explicitly set > 1
    mode='point',
    y=y_true_multi
)
print(df_multi.head())
Data Reshaping Utilities¶
These functions assist in transforming data between different formats commonly encountered in multi-step time series forecasting workflows.
step_to_long¶
Purpose: To transform a DataFrame containing multi-step forecast results from a “wide” format into a “long” format. In the wide format, each forecast step typically occupies separate columns (e.g., target_q50_step1, target_q50_step2). The long format reshapes this so that each row represents a single prediction for a specific sample (identified by original index or identifier columns), time step into the future, and possibly quantile level.
Functionality:
Takes a wide-format DataFrame df as input, along with metadata like tname (target variable base name), dt_col (datetime/period column), spatial_cols (identifier columns), and mode (‘quantile’ or ‘point’).
Identifies the columns corresponding to different forecast steps and quantiles based on naming conventions. It typically looks for patterns like _stepX and _qYY appended to the tname.
Uses internal helper functions (likely employing Pandas melting, stacking, or pivoting operations) to unpivot the step-based columns.
Reshapes the data, creating new columns usually named ‘step’ (for the forecast horizon step) and potentially ‘quantile’. It consolidates the prediction values into a single column (e.g., ‘predicted_value’).
Identifier columns (dt_col, spatial_cols), and any actual target value columns present in the wide DataFrame are typically preserved and duplicated appropriately across the newly created long-format rows.
Optionally sorts the final long-format DataFrame based on identifiers and step.
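The wide-to-long reshaping described above can be approximated with pandas.melt. This is a sketch of the general technique for point forecasts only; the column names are illustrative, and the internal helpers used by step_to_long may work differently.

```python
import pandas as pd

wide = pd.DataFrame({
    "ID": [1, 2],
    "Value_pred_step1": [9.8, 19.5],
    "Value_pred_step2": [11.5, 21.8],
})

# Unpivot the step columns, keeping the identifier column on every row.
step_cols = [c for c in wide.columns if c.startswith("Value_pred_step")]
long_df = wide.melt(
    id_vars=["ID"], value_vars=step_cols,
    var_name="step", value_name="Value_pred",
)
# Turn 'Value_pred_step2' into the integer step 2, then sort.
long_df["step"] = (
    long_df["step"].str.extract(r"_step(\d+)$", expand=False).astype(int)
)
long_df = long_df.sort_values(["ID", "step"]).reset_index(drop=True)
print(long_df)
```

Each of the two wide rows becomes two long rows, one per forecast step, with the ID repeated on each.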
Usage Context: This function is primarily used as an internal
helper within forecast_multi_step() to convert the initially
generated wide-format predictions into a more standardized long format,
which is often easier for plotting or subsequent analysis (e.g.,
evaluating performance per step). Users might also find it useful if
they have wide-format forecast data from other sources and want to
reshape it.
Code Example:
import pandas as pd
import numpy as np
from fusionlab.nn.utils import step_to_long

# 1. Create Dummy Wide-Format DataFrame (simulating output)
# (e.g., as might be initially created by forecast_multi_step)
data_wide = {
    'ID': [1, 1, 2, 2],
    'ForecastStartDate': pd.to_datetime(['2023-01-01', '2023-01-02',
                                         '2023-01-01', '2023-01-02']),
    'Actual_step1': [10, 11, 20, 21],
    'Actual_step2': [12, 13, 22, 23],
    # Point predictions
    'Value_pred_step1': [9.8, 11.2, 19.5, 21.3],
    'Value_pred_step2': [11.5, 13.1, 21.8, 23.2],
    # Quantile predictions
    'Value_q10_step1': [8.8, 10.2, 18.5, 20.3],
    'Value_q50_step1': [9.8, 11.2, 19.5, 21.3],  # Same as point
    'Value_q90_step1': [10.8, 12.2, 20.5, 22.3],
    'Value_q10_step2': [10.5, 12.1, 20.8, 22.2],
    'Value_q50_step2': [11.5, 13.1, 21.8, 23.2],  # Same as point
    'Value_q90_step2': [12.5, 14.1, 22.8, 24.2],
}
df_wide = pd.DataFrame(data_wide)
print("--- Original Wide DataFrame ---")
print(df_wide)

# 2. Convert Point Forecast Columns to Long Format
df_long_point = step_to_long(
    df=df_wide.drop(columns=[c for c in df_wide if '_q' in c]),  # Keep only pred cols
    tname='Value',
    dt_col='ForecastStartDate',
    mode='point',
    spatial_cols=['ID'],
    pred_colname='Value_pred'  # Name for the prediction column
)
print("\n--- Long Format DataFrame (Point Mode) ---")
print(df_long_point)

# 3. Convert Quantile Forecast Columns to Long Format
df_long_quantile = step_to_long(
    df=df_wide.drop(columns=[c for c in df_wide if '_pred_' in c]),  # Keep only quantile cols
    tname='Value',
    dt_col='ForecastStartDate',
    mode='quantile',
    spatial_cols=['ID'],
    pred_colname='Value_quantile'  # Name for the prediction column
)
print("\n--- Long Format DataFrame (Quantile Mode) ---")
print(df_long_quantile)