fusionlab.utils.widen_temporal_columns
- fusionlab.utils.widen_temporal_columns(data, dt_col, spatial_cols=None, target_name=None, round_dt=True, ignore_cols=None, nan_op=None, nan_thresh=None, savefile=None, verbose=0)
Convert a long PIHALNet prediction table into a wide format where each temporal slice becomes a dedicated column.
The routine pivots columns whose names follow the pattern

  <base>           deterministic forecast
  <base>_qXX       quantile forecast (e.g., subsidence_q10)
  <base>_actual    ground-truth column

and produces columns of the form

  <base>_<year>           point forecast
  <base>_<year>_qXX       quantile forecast
  <base>_<year>_actual    ground-truth value
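The naming convention above can be sketched as a small column-name classifier. This is an illustration only: `classify_column` and `wide_name` are hypothetical helpers, not part of fusionlab, and the sketch assumes that any name ending in `_q` plus digits, or in `_actual`, belongs to the corresponding base.

```python
import re

# Hypothetical pattern for quantile suffixes such as "_q10" or "_q90"
_QUANTILE = re.compile(r"^(?P<base>.+)_(?P<q>q\d+)$")


def classify_column(name: str) -> tuple[str, str]:
    """Split a long-format column name into (base, kind)."""
    if name.endswith("_actual"):
        return name[: -len("_actual")], "actual"
    match = _QUANTILE.match(name)
    if match:
        return match.group("base"), match.group("q")
    return name, "point"  # no suffix: deterministic (point) forecast


def wide_name(base: str, kind: str, year: int) -> str:
    """Build the wide column name <base>_<year>[_qXX | _actual]."""
    if kind == "point":
        return f"{base}_{year}"
    return f"{base}_{year}_{kind}"


for col in ["subsidence", "subsidence_q10", "subsidence_actual", "GWL_q50"]:
    base, kind = classify_column(col)
    print(f"{col:18} -> {wide_name(base, kind, 2020)}")
```

Only forecast columns (narrowed by target_name, if given) should be passed to such a classifier; a coordinate such as coord_x would otherwise be misread as a point forecast.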
If duplicate (spatial, year) pairs are found, their values are averaged (a pandas groupby followed by mean()) before pivoting, which avoids "Index contains duplicate entries" errors.
- Parameters:
data (PathLike object or pandas.DataFrame) – Long-format DataFrame returned by fusionlab.utils.format_pihalnet_predictions.
dt_col (str) – Column holding the temporal coordinate (e.g., 'coord_t'). Must be numeric or datetime-coercible. When round_dt is True, values are rounded to integers.
spatial_cols ((str, str) or None, default None) – Names of the x and y spatial coordinates. These are retained as leading columns in the output. If None, the function falls back to 'sample_idx' or an auto-generated 'row_id'.
target_name (str or None, default None) – Restrict pivoting to a specific base (e.g., 'subsidence'). When None, every base present in data is widened.
round_dt (bool, default True) – Round dt_col to the nearest integer (helpful for fractional years such as 2020.0001).
ignore_cols (list[str] or None, default None) – Additional columns to carry through unchanged. Values are propagated per spatial location using the first non-null entry.
nan_op ({'drop', 'fill', 'both', None}, default None) – Strategy for NaN handling after the pivot:
  - 'fill' – forward-fill, then back-fill missing values.
  - 'drop' – drop rows containing NaNs (see nan_thresh).
  - 'both' – fill, then drop according to nan_thresh.
  - None – leave NaNs untouched.
nan_thresh (float or None, default None) – When nan_op is 'drop' or 'both', a row is dropped if its proportion of missing values exceeds nan_thresh. Set nan_thresh = 0 to require no NaNs, 0.5 to allow at most 50 % missing, and so on:
  \[\text{row kept} \;\Longleftrightarrow\; \frac{\text{NaNs in row}}{\text{row width}} \le \text{nan\_thresh}\]
savefile (str, optional) – If a file path is provided, the final wide-format DataFrame is saved as a CSV file.
verbose (int, default 0) – Diagnostic verbosity from 0 (silent) to 5 (trace every step).
- Returns:
Wide-format frame with spatial identifiers first, followed by year-wise forecast, quantile, and actual columns.
- Return type:
pandas.DataFrame
- Raises:
KeyError – dt_col is missing from data, or a column named in spatial_cols is absent.
ValueError – No columns match target_name, or nan_thresh is outside \([0, 1]\).
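The nan_op and nan_thresh rules can be expressed as a short post-processing step on an already widened frame. The function `apply_nan_policy` below is a hypothetical sketch of the documented behaviour, not fusionlab's code; it assumes that filling runs down each column (pandas' default axis) and that a dropping strategy with nan_thresh=None means no NaNs are allowed.

```python
import numpy as np
import pandas as pd


def apply_nan_policy(wide, nan_op=None, nan_thresh=None):
    """Hypothetical sketch of the documented nan_op / nan_thresh rules."""
    if nan_thresh is not None and not 0 <= nan_thresh <= 1:
        raise ValueError("nan_thresh must lie in [0, 1]")
    out = wide.copy()
    if nan_op in ("fill", "both"):
        # Forward-fill, then back-fill, down each column
        out = out.ffill().bfill()
    if nan_op in ("drop", "both"):
        # Assumption: no threshold given means "no NaNs allowed"
        thresh = 0.0 if nan_thresh is None else nan_thresh
        # Fraction of NaNs per row = NaNs in row / row width
        frac_missing = out.isna().mean(axis=1)
        out = out[frac_missing <= thresh]
    return out


wide = pd.DataFrame(
    {"a_2019": [1.0, np.nan, np.nan], "a_2020": [np.nan, 2.0, np.nan]}
)
# Rows 0 and 1 are 50 % missing, row 2 is 100 % missing
print(apply_nan_policy(wide, nan_op="drop", nan_thresh=0.5).index.tolist())  # -> [0, 1]
```

Whether the real routine counts the spatial identifier columns in the row width is not stated here; the sketch simply uses every column of the frame it receives.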
Notes
Duplicate (spatial, time) pairs are aggregated with the arithmetic mean before pivoting. For a different aggregation, modify the aggregation lambda inside the function.
If ignore_cols is provided, the first non-null value of each such column per spatial location is appended to the output.
Examples
Minimal usage on a tiny synthetic set
>>> import pandas as pd
>>> from fusionlab.utils.data_utils import widen_temporal_columns
>>>
>>> df_long = pd.DataFrame(
...     {
...         "coord_x": [113.15, 113.15, 113.15, 113.15],
...         "coord_y": [22.63, 22.63, 22.63, 22.63],
...         "coord_t": [2019, 2020, 2019, 2020],
...         "subsidence_q50": [0.09, 0.10, 0.12, 0.13],
...         "subsidence_actual": [0.08, 0.11, 0.10, 0.14],
...     }
... )
>>>
>>> wide = widen_temporal_columns(
...     df_long,
...     dt_col="coord_t",
...     spatial_cols=("coord_x", "coord_y"),
...     verbose=2,
... )
[INFO] Initial rows: 4, columns: 2
[INFO] Widening base 'subsidence' (2 columns)
[DONE] Final wide shape: (1, 4)
>>> wide
   coord_x  coord_y  subsidence_2019_actual  subsidence_2020_actual  \
0   113.15    22.63                    0.08                    0.11

   subsidence_2019_q50  subsidence_2020_q50
0                 0.12                 0.13
End‑to‑end example with NaN handling, ignored columns, and two targets
>>> import numpy as np
>>> years = np.arange(2018, 2021)  # 2018, 2019, 2020
>>> n = 5  # five spatial locations
>>>
>>> # build synthetic long DataFrame
>>> df_long = pd.DataFrame(
...     {
...         "sample_idx": np.repeat(np.arange(n), len(years)),
...         "coord_x": np.repeat(np.linspace(113.4, 113.5, n), len(years)),
...         "coord_y": np.repeat(np.linspace(22.1, 22.2, n), len(years)),
...         "coord_t": np.tile(years, n),
...         "region": np.repeat(["A", "B", "A", "B", "A"], len(years)),
...         "subsidence_q10": np.random.rand(n * len(years)),
...         "subsidence_q50": np.random.rand(n * len(years)),
...         "subsidence_q90": np.random.rand(n * len(years)),
...         "subsidence_actual": np.random.rand(n * len(years)),
...         "GWL_q50": np.random.rand(n * len(years)),
...     }
... )
>>>
>>> # introduce NaNs for demonstration
>>> df_long.loc[df_long.sample(frac=0.2).index, "subsidence_q50"] = np.nan
>>>
>>> wide = widen_temporal_columns(
...     df_long,
...     dt_col="coord_t",
...     spatial_cols=("coord_x", "coord_y"),
...     ignore_cols=["region"],
...     target_name=None,  # widen both 'subsidence' and 'GWL'
...     nan_op="both",     # fill, then drop rows with many NaNs
...     nan_thresh=0.4,    # allow at most 40 % missing
...     verbose=3,
... )
[INFO] Initial rows: 15, columns: 7
[INFO] Widening base 'GWL' (1 columns)
  └─ 0 duplicate rows in 'GWL_q50' → aggregated
[INFO] Widening base 'subsidence' (4 columns)
  └─ 0 duplicate rows in 'subsidence_q10' → aggregated
  └─ 0 duplicate rows in 'subsidence_q50' → aggregated
  └─ 0 duplicate rows in 'subsidence_q90' → aggregated
  └─ 0 duplicate rows in 'subsidence_actual' → aggregated
[INFO] Missing values filled (ffill+bfill).
[INFO] Rows with >40% NaN dropped.
[DONE] Final wide shape: (5, 19)
>>> wide.iloc[:2, :8]  # show first 8 columns
   coord_x  coord_y  GWL_2018_q50  GWL_2019_q50  GWL_2020_q50  \
0  113.400      ...           ...           ...           ...
1  113.425      ...           ...           ...           ...

   subsidence_2018_actual  subsidence_2019_actual  subsidence_2020_actual
0                       …                       …                       …
1                       …                       …                       …
See also
pandas.DataFrame.unstack – Core pivoting method used internally.
fusionlab.plot.forecast.forecast_view – Visualisation routine that consumes the resulting wide frame.