fusionlab.utils.pivot_forecast_dataframe

fusionlab.utils.pivot_forecast_dataframe(data, id_vars, time_col, value_prefixes, static_actuals_cols=None, time_col_is_float_year='auto', round_time_col=False, verbose=0, savefile=None, _logger=None, **kws)

Transforms a long-format forecast DataFrame to a wide format.

This utility reshapes time series prediction data from a “long” format, where each row represents a single time step for a given sample, to a “wide” format, where each row represents a single sample and columns correspond to values at different time steps.

Parameters:
  • data (pd.DataFrame) – The input long-format DataFrame. It must contain the columns specified in id_vars and time_col, as well as value columns that start with the strings in value_prefixes.

  • id_vars (list of str) – A list of column names that uniquely identify each sample or group. These columns will be preserved in the wide-format output. For example: ['sample_idx', 'coord_x', 'coord_y'].

  • time_col (str) – The name of the column that represents the time step or year of the forecast (e.g., ‘coord_t’ or ‘forecast_step’). This column’s values will become part of the new column names.

  • value_prefixes (list of str) – A list of prefixes for the value columns that need to be pivoted. The function identifies columns starting with these prefixes. For instance, ['subsidence', 'GWL'] would match ‘subsidence_q10’, ‘GWL_q50’, etc.

  • static_actuals_cols (list of str, optional) – A list of columns containing static “actual” or ground truth values for each sample. These values are assumed to be constant for each unique sample_idx and are merged back into the wide DataFrame after pivoting. Example: ['subsidence_actual'].

  • time_col_is_float_year (bool or 'auto', default 'auto') –

    Controls how the time_col values are formatted into new column names.

    • If 'auto', automatically detects whether time_col has a float dtype.

    • If True, treats time_col values (e.g., 2018.0) as years and converts them to integer strings (‘2018’).

    • If False, uses the string representation of the value as is.

  • round_time_col (bool, default False) – If True and time_col is a float type, its values will be rounded to the nearest integer before being used in column names. This is useful for cleaning up float years (e.g., 2018.0001 -> 2018).

  • verbose (int, default 0) – Controls the verbosity of logging messages. 0 is silent. Higher values print more details about the process.

  • savefile (str, optional) – If a file path is provided, the final wide-format DataFrame will be saved as a CSV file to that location.

  • _logger (Logger | Callable[[str], None] | None)
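The interaction between time_col_is_float_year and round_time_col can be pictured with a small sketch. The helper format_time_label below is hypothetical and written only for illustration; it is not part of fusionlab, and the library's internal logic may differ. It assumes that 'auto' amounts to a float-dtype check and that rounding happens before the label is built.

```python
import pandas as pd


def format_time_label(value, is_float_year, round_value=False):
    """Hypothetical helper: turn one time value into a column-name fragment."""
    # Optional rounding, mirroring the described round_time_col behaviour.
    if round_value and isinstance(value, float):
        value = round(value)
    # Float years such as 2018.0 become integer strings such as '2018'.
    if is_float_year:
        return str(int(value))
    # Otherwise the plain string representation is used unchanged.
    return str(value)


times = pd.Series([2018.0, 2019.0])

# Assumption: 'auto' is equivalent to checking for a float dtype.
is_float_year = pd.api.types.is_float_dtype(times)

labels = [format_time_label(t, is_float_year) for t in times]
print(labels)
```

Under these assumptions, rounding only matters for values that are not already whole numbers: 2017.9999 truncates to '2017' without rounding but becomes '2018' with it.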

Returns:

A wide-format DataFrame with one row per unique combination of id_vars. New columns are created in the format {prefix}_{time_str}{_suffix} (e.g., ‘subsidence_2018_q10’).

Return type:

pd.DataFrame
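To make the {prefix}_{time_str}{_suffix} naming pattern concrete, here is a minimal sketch. The helper wide_column_name is hypothetical and only illustrates the pattern described above; how the library actually splits and rebuilds names is not shown in this page.

```python
def wide_column_name(column, prefix, time_str):
    """Hypothetical helper: build a wide-format column name.

    'subsidence_q10' with prefix 'subsidence' and time '2018'
    becomes 'subsidence_2018_q10'.
    """
    # Everything after the prefix is treated as the suffix,
    # e.g. '_q10'. It is empty if the column equals the prefix.
    suffix = column[len(prefix):]
    return f"{prefix}_{time_str}{suffix}"


print(wide_column_name("subsidence_q10", "subsidence", "2018"))
print(wide_column_name("GWL_q50", "GWL", "2019"))
```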

See also

pandas.pivot_table

The core function used for reshaping data.

pandas.merge

Used to re-join static columns after pivoting.
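The pivot-then-merge pattern named above can be sketched with plain pandas. This is an illustration of the general technique on a toy frame, not the function's actual implementation; the column names are borrowed from the example on this page.

```python
import pandas as pd

df_long = pd.DataFrame({
    "sample_idx": [0, 0, 1, 1],
    "coord_t": [2018, 2019, 2018, 2019],
    "subsidence_q50": [-8, -9, -13, -14],
    "subsidence_actual": [-8.5, -8.5, -13.2, -13.2],
})

# Pivot: one row per sample, one column per time step.
wide = df_long.pivot_table(
    index="sample_idx", columns="coord_t", values="subsidence_q50"
)
# Rename the time-step columns to the {prefix}_{time}_{suffix} pattern.
wide.columns = [f"subsidence_{t}_q50" for t in wide.columns]
wide = wide.reset_index()

# Static values are constant per sample, so keep one row per sample_idx.
static = df_long[["sample_idx", "subsidence_actual"]].drop_duplicates(
    subset="sample_idx"
)

# Re-join the static column onto the pivoted frame.
result = pd.merge(wide, static, on="sample_idx", how="left")
print(result.columns.tolist())
```

Note that pivot_table aggregates duplicate keys (by mean, by default) rather than raising, which is why unique keys matter for a lossless reshape.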

Notes

  • The combination of columns in id_vars and time_col must uniquely identify each row in data for the pivot to succeed without data loss.

  • When static_actuals_cols is used, the id_vars list must contain ‘sample_idx’ so that the static data can be merged back correctly.
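The uniqueness requirement can be checked before pivoting. The sketch below uses only standard pandas; the helper find_duplicate_keys is hypothetical and not provided by fusionlab.

```python
import pandas as pd


def find_duplicate_keys(data, id_vars, time_col):
    """Hypothetical helper: return rows whose (id_vars, time_col) key repeats."""
    keys = list(id_vars) + [time_col]
    # keep=False marks every member of a duplicated group, not just the extras.
    return data[data.duplicated(subset=keys, keep=False)]


df = pd.DataFrame({
    "sample_idx": [0, 0, 1, 1],
    "coord_t": [2018, 2018, 2018, 2019],  # sample 0 repeats year 2018
    "subsidence_q50": [-8, -9, -13, -14],
})

dupes = find_duplicate_keys(df, ["sample_idx"], "coord_t")
print(dupes)
```

An empty result means each key appears exactly once; any returned rows are candidates for silent aggregation or loss during the pivot.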

Examples

>>> import pandas as pd
>>> from fusionlab.utils.data_utils import pivot_forecast_dataframe
>>> data = {
...     'sample_idx':      [0, 0, 1, 1],
...     'coord_t':         [2018.0, 2019.0, 2018.0, 2019.0],
...     'coord_x':         [0.1, 0.1, 0.5, 0.5],
...     'coord_y':         [0.2, 0.2, 0.6, 0.6],
...     'subsidence_q50':  [-8, -9, -13, -14],
...     'subsidence_actual': [-8.5, -8.5, -13.2, -13.2],
...     'GWL_q50':         [1.2, 1.3, 2.2, 2.3],
... }
>>> df_long_example = pd.DataFrame(data)
>>> df_wide = pivot_forecast_dataframe(
...     data=df_long_example,
...     id_vars=['sample_idx', 'coord_x', 'coord_y'],
...     time_col='coord_t',
...     value_prefixes=['subsidence', 'GWL'],
...     static_actuals_cols=['subsidence_actual'],
...     verbose=0
... )
>>> print(df_wide.columns)
Index(['sample_idx', 'coord_x', 'coord_y', 'subsidence_actual',
       'GWL_2018_q50', 'GWL_2019_q50', 'subsidence_2018_q50',
       'subsidence_2019_q50'],
      dtype='object')