fusionlab.utils.pivot_forecast_dataframe
- fusionlab.utils.pivot_forecast_dataframe(data, id_vars, time_col, value_prefixes, static_actuals_cols=None, time_col_is_float_year='auto', round_time_col=False, verbose=0, savefile=None, _logger=None, **kws)
Transforms a long-format forecast DataFrame to a wide format.
This utility reshapes time series prediction data from a “long” format, where each row represents a single time step for a given sample, to a “wide” format, where each row represents a single sample and columns correspond to values at different time steps.
- Parameters:
data (pd.DataFrame) – The input long-format DataFrame. It must contain the columns specified in id_vars and time_col, as well as value columns that start with the strings in value_prefixes.
id_vars (list of str) – A list of column names that uniquely identify each sample or group. These columns will be preserved in the wide-format output. For example: ['sample_idx', 'coord_x', 'coord_y'].
time_col (str) – The name of the column that represents the time step or year of the forecast (e.g., ‘coord_t’ or ‘forecast_step’). This column’s values will become part of the new column names.
value_prefixes (list of str) – A list of prefixes for the value columns that need to be pivoted. The function identifies columns starting with these prefixes. For instance, ['subsidence', 'GWL'] would match ‘subsidence_q10’, ‘GWL_q50’, etc.
static_actuals_cols (list of str, optional) – A list of columns containing static “actual” or ground truth values for each sample. These values are assumed to be constant for each unique sample_idx and are merged back into the wide DataFrame after pivoting. Example: ['subsidence_actual'].
time_col_is_float_year (bool or 'auto', default 'auto') – Controls how the time_col values are formatted into new column names.
- If 'auto', automatically detects whether time_col has a float dtype.
- If True, treats time_col values (e.g., 2018.0) as years and converts them to integer strings (‘2018’).
- If False, uses the string representation of the value as is.
round_time_col (bool, default False) – If True and time_col is a float type, its values will be rounded to the nearest integer before being used in column names. This is useful for cleaning up float years (e.g., 2018.0001 -> 2018).
verbose (int, default 0) – Controls the verbosity of logging messages. 0 is silent. Higher values print more details about the process.
savefile (str, optional) – If a file path is provided, the final wide-format DataFrame will be saved as a CSV file to that location.
_logger (Logger | Callable[[str], None] | None)
- Returns:
A wide-format DataFrame with one row per unique combination of id_vars. New columns are created in the format {prefix}_{time_str}{_suffix} (e.g., ‘subsidence_2018_q10’).
- Return type:
pd.DataFrame
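The naming rule {prefix}_{time_str}{_suffix} and the two time-formatting options can be illustrated with a small sketch. The helper make_wide_column_name below is hypothetical and is not part of fusionlab; it only mirrors the behaviour described for time_col_is_float_year and round_time_col, and the library's internal logic may differ.

```python
import pandas as pd


def make_wide_column_name(prefix, time_value, suffix="",
                          float_year=True, round_time=False):
    """Hypothetical helper: build '{prefix}_{time_str}{_suffix}' for one value."""
    if round_time and isinstance(time_value, float):
        # Clean up float years such as 2018.0001 before formatting.
        time_value = round(time_value)
    # Float years become integer strings; otherwise keep the plain str() form.
    time_str = str(int(time_value)) if float_year else str(time_value)
    tail = f"_{suffix}" if suffix else ""
    return f"{prefix}_{time_str}{tail}"


# One way to emulate the 'auto' setting: check whether the time column is float.
times = pd.Series([2018.0, 2019.0])
auto_float_year = pd.api.types.is_float_dtype(times)

# Split a long-format column such as 'subsidence_q10' into prefix and suffix.
long_col = "subsidence_q10"
prefix = "subsidence"
suffix = long_col[len(prefix):].lstrip("_")

name_2018 = make_wide_column_name(prefix, 2018.0, suffix, float_year=auto_float_year)
name_rounded = make_wide_column_name(prefix, 2018.0001, suffix, round_time=True)
name_raw = make_wide_column_name(prefix, 2018.0, suffix, float_year=False)
```

Under these assumptions, name_2018 and name_rounded both come out as 'subsidence_2018_q10', while name_raw keeps the float representation in the label.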
See also
pandas.pivot_table – The core function used for reshaping data.
pandas.merge – Used to re-join static columns after pivoting.
Notes
The combination of columns in id_vars and time_col must uniquely identify each row in data for the pivot to succeed without data loss.
If using static_actuals_cols, the id_vars list must contain ‘sample_idx’ to correctly merge the static data back.
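Both conditions can be checked before calling the function. The following is a minimal sketch using only pandas; the frame df_long and its values are made up for illustration, and the check is a suggestion rather than something the library is known to perform.

```python
import pandas as pd

# Illustrative long-format frame; the last row repeats (sample_idx=1, coord_t=2019.0).
df_long = pd.DataFrame({
    "sample_idx": [0, 0, 1, 1, 1],
    "coord_t": [2018.0, 2019.0, 2018.0, 2019.0, 2019.0],
    "subsidence_q50": [-8, -9, -13, -14, -15],
})

id_vars = ["sample_idx"]
time_col = "coord_t"

# Rows sharing the same (id_vars, time_col) key would collide when pivoted.
dup_mask = df_long.duplicated(subset=id_vars + [time_col], keep=False)
n_conflicts = int(dup_mask.sum())
conflicting_rows = df_long[dup_mask]

# Static actuals are merged back on 'sample_idx', so it has to be an id column.
has_sample_idx = "sample_idx" in id_vars
```

If n_conflicts is non-zero, the duplicated rows in conflicting_rows should be resolved (for example by dropping or aggregating them) before pivoting.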
Examples
>>> import pandas as pd
>>> from fusionlab.utils.data_utils import pivot_forecast_dataframe
>>> data = {
...     'sample_idx': [0, 0, 1, 1],
...     'coord_t': [2018.0, 2019.0, 2018.0, 2019.0],
...     'coord_x': [0.1, 0.1, 0.5, 0.5],
...     'coord_y': [0.2, 0.2, 0.6, 0.6],
...     'subsidence_q50': [-8, -9, -13, -14],
...     'subsidence_actual': [-8.5, -8.5, -13.2, -13.2],
...     'GWL_q50': [1.2, 1.3, 2.2, 2.3],
... }
>>> df_long_example = pd.DataFrame(data)
>>> df_wide = pivot_forecast_dataframe(
...     data=df_long_example,
...     id_vars=['sample_idx', 'coord_x', 'coord_y'],
...     time_col='coord_t',
...     value_prefixes=['subsidence', 'GWL'],
...     static_actuals_cols=['subsidence_actual'],
...     verbose=0
... )
>>> print(df_wide.columns)
Index(['sample_idx', 'coord_x', 'coord_y', 'subsidence_actual',
       'GWL_2018_q50', 'GWL_2019_q50', 'subsidence_2018_q50',
       'subsidence_2019_q50'],
      dtype='object')
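For readers who want to see the kind of reshaping involved, here is a rough pandas-only sketch of a pivot followed by a merge of the static column. It is an approximation written for this page, not the fusionlab implementation: the flatten helper, the split on the first underscore, and the resulting column order are all assumptions.

```python
import pandas as pd

df_long = pd.DataFrame({
    "sample_idx": [0, 0, 1, 1],
    "coord_t": [2018.0, 2019.0, 2018.0, 2019.0],
    "subsidence_q50": [-8, -9, -13, -14],
    "subsidence_actual": [-8.5, -8.5, -13.2, -13.2],
})
id_vars = ["sample_idx"]
value_cols = ["subsidence_q50"]

# Pivot: one row per sample, one column per (value column, time step).
wide = df_long.pivot_table(index=id_vars, columns="coord_t", values=value_cols)


def flatten(col, year):
    """Hypothetical helper: ('subsidence_q50', 2018.0) -> 'subsidence_2018_q50'."""
    prefix, suffix = col.split("_", 1)
    return f"{prefix}_{int(year)}_{suffix}"


# Flatten the (value column, time) MultiIndex into single string labels.
wide.columns = [flatten(col, year) for col, year in wide.columns]
wide = wide.reset_index()

# Static actuals: take the first value per sample and merge it back.
static = df_long.groupby("sample_idx", as_index=False)["subsidence_actual"].first()
wide = wide.merge(static, on="sample_idx", how="left")
```

Note that pivot_table aggregates with the mean by default, so duplicated (sample, time) keys would be averaged silently in this sketch; that is one reason the uniqueness requirement in the Notes matters.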