Forecast Data Formatting Utilities
After generating forecasts, the raw model output—often a long-format
DataFrame where each row represents a single time step—needs to be
reshaped and structured for effective analysis and visualization. The
fusionlab.utils.forecast_utils module provides specialized
functions for these data transformation tasks.
This guide covers the two primary utilities: one pivots forecast results from long to wide format, and the other detects a DataFrame's format and pivots it only when needed.
Pivoting from Long to Wide Format (pivot_forecast_dataframe)
- API Reference: pivot_forecast_dataframe()
This is the core utility for transforming time series forecast data. It converts a long-format DataFrame into a wide-format DataFrame.
- Long Format: Each row represents a single observation at a specific time for a specific sample (e.g., one row for the forecast of “site A” at “step 1”, another for “site A” at “step 2”).
- Wide Format: Each row represents a single unique sample, and the time steps are spread across the columns (e.g., one row for “site A” with columns like prediction_step1, prediction_step2, etc.).
This transformation is often necessary for certain types of analysis or for plotting libraries that expect data in a wide structure.
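To make the two layouts concrete, here is a minimal sketch that uses only pandas, not fusionlab. The column names (site, step, prediction) are illustrative assumptions taken from the "site A" example above.

```python
import pandas as pd

# Long format: one row per (site, step) pair.
df_long = pd.DataFrame({
    "site": ["A", "A", "B", "B"],
    "step": [1, 2, 1, 2],
    "prediction": [0.5, 0.7, 1.1, 1.4],
})

# Wide format: one row per site, one column per step.
df_wide = df_long.pivot(index="site", columns="step", values="prediction")
df_wide.columns = [f"prediction_step{step}" for step in df_wide.columns]
df_wide = df_wide.reset_index()

print(df_wide)
```

The result has one row each for sites A and B, with the columns site, prediction_step1, and prediction_step2.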
Key Parameters:
- data: The input long-format DataFrame.
- id_vars: The list of columns that uniquely identify each sample (e.g., ['sample_idx', 'longitude', 'latitude']). These remain as index-like columns in the output.
- time_col: The name of the column whose values will become the new column headers (e.g., forecast_step or year).
- value_prefixes: A list of base names (e.g., ['subsidence', 'GWL']) that the function uses to identify all the value columns that need to be pivoted.
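The roles of these parameters can be approximated with plain pandas. The helper below, pivot_by_prefix, is a hypothetical stand-in written for illustration. It is not the fusionlab implementation, and it assumes value columns are named prefix_suffix (for example subsidence_q50) and that output columns follow a prefix_time_suffix pattern.

```python
import pandas as pd


def pivot_by_prefix(data, id_vars, time_col, value_prefixes):
    """Hypothetical sketch of a prefix-driven pivot (not fusionlab's code)."""
    # Value columns are those whose names start with "<prefix>_".
    value_cols = [
        col for col in data.columns
        if any(col.startswith(f"{p}_") for p in value_prefixes)
    ]
    # Pivoting several value columns yields (value column, time) column pairs.
    wide = data.pivot(index=id_vars, columns=time_col, values=value_cols)
    # Flatten each pair into a single "<prefix>_<time>_<suffix>" name.
    new_names = []
    for col, t in wide.columns:
        prefix = next(p for p in value_prefixes if col.startswith(f"{p}_"))
        suffix = col[len(prefix) + 1:]
        new_names.append(f"{prefix}_{t}_{suffix}")
    wide.columns = new_names
    # Turn the id columns back into ordinary columns.
    return wide.reset_index()


df_long = pd.DataFrame({
    "sample_idx": [0, 0, 1, 1],
    "year": [2018, 2019, 2018, 2019],
    "subsidence_q50": [-8, -9, -13, -14],
    "GWL_q50": [1.2, 1.3, 2.2, 2.3],
})
df_wide = pivot_by_prefix(
    df_long,
    id_vars=["sample_idx"],
    time_col="year",
    value_prefixes=["subsidence", "GWL"],
)
print(df_wide.columns.tolist())
```

Here id_vars determines the rows, time_col supplies the middle part of each new column name, and value_prefixes selects which columns are spread across time. The library function has further options, such as static_actuals_cols, that this sketch does not attempt to reproduce.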
Usage Example:
import pandas as pd
import numpy as np
from fusionlab.utils.forecast_utils import pivot_forecast_dataframe

# 1. Create a sample long-format DataFrame
data = {
    'sample_idx': [0, 0, 1, 1],
    'coord_t': [2018.0, 2019.0, 2018.0, 2019.0],
    'coord_x': [113.5, 113.5, 113.8, 113.8],
    'subsidence_q50': [-8, -9, -13, -14],
    'subsidence_actual': [-8.5, -8.5, -13.2, -13.2],
    'GWL_q50': [1.2, 1.3, 2.2, 2.3],
}
df_long = pd.DataFrame(data)
print("--- Original Long-Format DataFrame ---")
print(df_long)

# 2. Pivot the DataFrame to wide format
df_wide = pivot_forecast_dataframe(
    data=df_long,
    id_vars=['sample_idx', 'coord_x'],
    time_col='coord_t',
    value_prefixes=['subsidence', 'GWL'],
    static_actuals_cols=['subsidence_actual'],
    # Treat the float time column as integer years for naming
    time_col_is_float_year=True
)

# 3. Display the result
print("\n--- Pivoted Wide-Format DataFrame ---")
print(df_wide)
Expected Output:
--- Original Long-Format DataFrame ---
   sample_idx  coord_t  coord_x  subsidence_q50  subsidence_actual  GWL_q50
0           0   2018.0    113.5              -8               -8.5      1.2
1           0   2019.0    113.5              -9               -8.5      1.3
2           1   2018.0    113.8             -13              -13.2      2.2
3           1   2019.0    113.8             -14              -13.2      2.3

--- Pivoted Wide-Format DataFrame ---
   sample_idx  coord_x  subsidence_actual  GWL_2018_q50  GWL_2019_q50  subsidence_2018_q50  subsidence_2019_q50
0           0    113.5               -8.5           1.2           1.3                   -8                   -9
1           1    113.8              -13.2           2.2           2.3                  -13                  -14
Smart Format Detection and Conversion (format_forecast_dataframe)
- API Reference: format_forecast_dataframe()
This function is a high-level “smart wrapper” around the core pivoting utility. Its main purpose is to auto-detect the format of your DataFrame (either ‘long’ or ‘wide’) and then conditionally pivot it only if necessary.
This is useful in data pipelines where you might receive data in either format and want to ensure it conforms to a specific structure (typically ‘wide’) before the next step.
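One simple way to tell the two layouts apart is to check whether the identifier columns repeat. The function guess_layout below is a hypothetical heuristic shown only to illustrate the idea. It is an assumption, and the detection logic inside format_forecast_dataframe may work differently.

```python
import pandas as pd


def guess_layout(df, id_vars):
    """Hypothetical heuristic; fusionlab's detection logic may differ."""
    # In long format the same sample appears on several rows (one per time
    # step), so the id columns contain duplicates. In wide format each
    # sample occupies exactly one row.
    return "long" if df.duplicated(subset=id_vars).any() else "wide"


long_df = pd.DataFrame({
    "sample_idx": [0, 0, 1, 1],
    "coord_t": [2018.0, 2019.0, 2018.0, 2019.0],
    "subsidence_q50": [-8, -9, -13, -14],
})
wide_df = pd.DataFrame({
    "sample_idx": [0, 1],
    "subsidence_2018_q50": [-8, -13],
    "subsidence_2019_q50": [-9, -14],
})

print(guess_layout(long_df, ["sample_idx"]))
print(guess_layout(wide_df, ["sample_idx"]))
```

A check like this misclassifies a long-format frame that holds only one time step per sample, so a production pipeline would likely combine it with other signals such as the presence of a time column.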
Usage Example:
import pandas as pd
from fusionlab.utils.forecast_utils import format_forecast_dataframe

# A smaller long-format DataFrame, reusing columns from the previous example
df_long = pd.DataFrame({
    'sample_idx': [0, 0, 1, 1],
    'coord_t': [2018.0, 2019.0, 2018.0, 2019.0],
    'subsidence_q50': [-8, -9, -13, -14],
})

# --- Case 1: Detect the format without converting ---
detected_format = format_forecast_dataframe(df_long, to_wide=False)
print(f"Detected format: '{detected_format}'")

# --- Case 2: Ensure the output is wide format ---
# Since the input is long, this will call the pivot function internally.
# time_col_is_float_year is not passed here, so the float years
# (2018.0, 2019.0) appear unchanged in the output column names.
df_wide_smart = format_forecast_dataframe(
    df_long,
    to_wide=True,
    id_vars=['sample_idx'],
    time_col='coord_t',
    value_prefixes=['subsidence']
)
print("\n--- DataFrame after ensuring wide format ---")
print(df_wide_smart)
Expected Output:
Detected format: 'long'

--- DataFrame after ensuring wide format ---
   sample_idx  subsidence_2018.0_q50  subsidence_2019.0_q50
0           0                     -8                     -9
1           1                    -13                    -14