fusionlab.nn.utils.format_predictions

fusionlab.nn.utils.format_predictions(predictions=None, model=None, inputs=None, y_true_sequences=None, target_name='target', quantiles=None, forecast_horizon=None, output_dim=None, spatial_data_array=None, spatial_cols=None, spatial_cols_indices=None, evaluate_coverage=False, scaler=None, scaler_feature_names=None, target_idx_in_scaler=None, verbose=0, _logger=None, **kwargs)

Formats model predictions into a structured pandas DataFrame.

This utility function takes raw model predictions (either directly as an array/tensor or generated by a provided model and its inputs) and transforms them into a long-format pandas DataFrame. It can handle point forecasts, quantile forecasts, single or multi-output predictions, and optionally include actual target values, spatial identifiers, and perform coverage score evaluation for quantile forecasts.

The output DataFrame is structured with ‘sample_idx’ (identifying the original input sequence) and ‘forecast_step’ (from 1 to H, where H is the forecast horizon).
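
The layout described above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the variable names are chosen for illustration, and a point forecast of shape (num_samples, forecast_horizon, 1) is assumed.

```python
import numpy as np

# Illustrative data only: 2 samples, horizon H = 3, one output.
preds = np.arange(6, dtype=float).reshape(2, 3, 1)
num_samples, horizon, _ = preds.shape

# Each sample index is repeated H times; steps run 1..H within each sample.
sample_idx = np.repeat(np.arange(num_samples), horizon)
forecast_step = np.tile(np.arange(1, horizon + 1), num_samples)

# Flattening in C order keeps the values aligned with the two index columns.
long_format = {
    "sample_idx": sample_idx,
    "forecast_step": forecast_step,
    "target_pred": preds.reshape(-1),
}
```

Each key of long_format corresponds to one column of the long-format table, with num_samples * forecast_horizon rows in total.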

Parameters:
  • predictions (np.ndarray or tf.Tensor, optional) –

    The raw prediction tensor or array.

    • For point forecasts, expected shapes:
      • (num_samples, forecast_horizon, output_dim)

      • (num_samples, forecast_horizon) if output_dim=1 (will be reshaped)

      • (num_samples, output_dim) if forecast_horizon=1 (will be reshaped)

    • For quantile forecasts, expected shapes:
      • (num_samples, forecast_horizon, num_quantiles * output_dim)

      • (num_samples, forecast_horizon, num_quantiles, output_dim)

      • (num_samples, num_quantiles * output_dim) if forecast_horizon=1

    If None, model and inputs must be provided. Default is None.

  • model (tf.keras.Model, optional) – A trained Keras model to generate predictions if predictions is not provided. Used in conjunction with inputs. Default is None.

  • inputs (List[Optional[Union[np.ndarray, tf.Tensor]]], optional) – A list of input tensors (e.g., [static, dynamic, future]) required by the model to generate predictions. Required if predictions is None and model is provided. Default is None.

  • y_true_sequences (np.ndarray or tf.Tensor, optional) – The true target values corresponding to the predictions, used for including actuals in the output DataFrame and for evaluation. Expected shape: (num_samples, forecast_horizon, output_dim). Default is None.

  • target_name (str, optional) – Base name for the target variable. Used to prefix prediction and actual column names (e.g., “sales_pred”, “sales_q50”, “sales_actual”). Default is “target”.

  • quantiles (List[float], optional) – A list of quantiles that were predicted by the model (e.g., [0.1, 0.5, 0.9]). Required if the predictions are quantile forecasts. If provided, prediction columns will be named like {target_name}_q10, {target_name}_q50, etc. Default is None (for point forecasts).

  • forecast_horizon (int, optional) – The number of time steps into the future that the model predicts. If not provided, it’s inferred from predictions.shape[1] or y_true_sequences.shape[1]. Default is None.

  • output_dim (int, optional) – The number of target variables predicted at each time step (e.g., 1 for univariate, >1 for multivariate target). If not provided, it’s inferred from the shape of predictions or y_true_sequences. Default is None.

  • spatial_data_array (np.ndarray or tf.Tensor or pd.DataFrame or pd.Series, optional) –

    An array or DataFrame containing static spatial/identifier features for each of the num_samples sequences.

    • If NumPy/Tensor: Expected shape (num_samples, num_spatial_features). spatial_cols_indices must be provided.

    • If DataFrame/Series: Must have num_samples rows. spatial_cols must be provided.

    These features will be repeated for each forecast step in the output DataFrame. Default is None.

  • spatial_cols (List[str], optional) – List of column names to select from spatial_data_array if it’s a DataFrame/Series, or names to assign to columns if spatial_data_array is NumPy/Tensor and spatial_cols_indices are provided. Default is None.

  • spatial_cols_indices (List[int], optional) – List of column indices to select from spatial_data_array if it’s a NumPy/Tensor. Length must match spatial_cols if provided. Default is None.

  • evaluate_coverage (bool, default False) – If True, and both quantiles (at least two) and y_true_sequences are provided, calculates the coverage score using the first and last quantiles as the interval bounds. Requires fusionlab.metrics.coverage_score.

  • scaler (Any, optional) – A fitted scikit-learn-like scaler object (must have an inverse_transform method) used to scale the target variable and potentially other features. If provided along with scaler_feature_names and target_idx_in_scaler, predictions and actuals for the target will be inverse-transformed. Default is None.

  • scaler_feature_names (List[str], optional) – A list of all feature names (in order) that the scaler was originally fit on. Required if scaler is provided and targeted inverse transformation is needed. Default is None.

  • target_idx_in_scaler (int, optional) – The index of the target_name within the scaler_feature_names list. Required if scaler is provided and targeted inverse transformation is needed. Default is None.

  • verbose (int, default 0) –

    Verbosity level for logging during processing.

    • 0: Silent.

    • 1: Basic info.

    • 3: More detailed steps.

    • 5: Very detailed shape information.

  • **kwargs (Any) – Additional keyword arguments (currently not used but included for future extensibility).

  • _logger (Logger | Callable[[str], None] | None)

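
The shape conventions listed for predictions can be sketched as follows. Both helpers are hypothetical and are not part of fusionlab. The second one additionally assumes that a flattened last axis of size num_quantiles * output_dim is stored quantile-major, which the documentation above does not state.

```python
import numpy as np

def to_3d_point(preds, forecast_horizon=None):
    """Hypothetical helper: expand a 2D point forecast to (N, H, O)."""
    preds = np.asarray(preds)
    if preds.ndim == 3:
        return preds
    if preds.ndim != 2:
        raise ValueError(f"expected a 2D or 3D array, got {preds.ndim}D")
    if forecast_horizon == 1:
        # (N, O) -> (N, 1, O)
        return preds[:, np.newaxis, :]
    # (N, H) -> (N, H, 1); assumed when the horizon is not known to be 1
    return preds[:, :, np.newaxis]

def split_quantiles(preds, num_quantiles):
    """Hypothetical helper: (N, H, Q*O) -> (N, H, Q, O), assuming quantile-major order."""
    preds = np.asarray(preds)
    n, h, flat = preds.shape
    if flat % num_quantiles != 0:
        raise ValueError("last axis is not divisible by the number of quantiles")
    return preds.reshape(n, h, num_quantiles, flat // num_quantiles)
```

A 2D array is ambiguous on its own, since (num_samples, forecast_horizon) and (num_samples, output_dim) look identical. Passing forecast_horizon and output_dim explicitly removes the need to guess.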

Returns:

A long-format DataFrame with columns including:

  • sample_idx: Identifier for the original input sequence.

  • forecast_step: Time step within the forecast horizon (1 to H).

  • Spatial columns (if spatial_data_array provided).

  • Prediction columns:
    • Point forecast: {target_name}_pred (or {target_name}_{output_idx}_pred for multi-output).

    • Quantile forecast: {target_name}_qXX (or {target_name}_{output_idx}_qXX for multi-output).

  • Actual value columns (if y_true_sequences provided):
    • {target_name}_actual (or {target_name}_{output_idx}_actual).

Values in prediction and actual columns will be inverse-transformed if a valid scaler and related parameters are given.
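
The naming patterns above can be sketched with a small hypothetical helper. Two details are assumptions rather than documented behavior: that output_idx starts at 0, and that the qXX suffix is the quantile expressed as a rounded integer percentage.

```python
def prediction_columns(target_name="target", quantiles=None, output_dim=1):
    """Hypothetical helper mirroring the documented column-name patterns."""
    # Multi-output columns carry an output index (assumed 0-based here);
    # single-output columns use the bare target name.
    prefixes = (
        [target_name]
        if output_dim == 1
        else [f"{target_name}_{i}" for i in range(output_dim)]
    )
    if quantiles is None:
        return [f"{p}_pred" for p in prefixes]
    # Assumption: 0.1 -> "q10", 0.5 -> "q50", 0.9 -> "q90".
    return [f"{p}_q{int(round(q * 100))}" for p in prefixes for q in quantiles]
```

For example, prediction_columns("sales", [0.1, 0.5, 0.9]) yields the sales_q10, sales_q50, and sales_q90 names used in the target_name description.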

Return type:

pandas.DataFrame

Raises:
  • ValueError –

    • If predictions is None and model or inputs is also None.

    • If predictions shape is invalid (not 2D, 3D, or 4D).

    • If quantiles are provided but prediction shape is incompatible for inferring output_dim.

    • If spatial_data_array is provided without necessary name/index parameters.

  • TypeError – If predictions or other inputs cannot be converted to the expected tensor/array types.

See also

fusionlab.nn.utils.forecast_multi_step

Higher-level forecasting utility.

fusionlab.metrics.coverage_score

For evaluating quantile forecast intervals.
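
As a rough illustration of what evaluate_coverage measures, the sketch below computes the fraction of actual values that fall inside the interval formed by the lowest and highest quantile. It uses the generic definition of interval coverage and a hypothetical function name. The exact behavior of fusionlab.metrics.coverage_score (for instance, how boundary values are treated) is not confirmed here.

```python
import numpy as np

def interval_coverage(y_true, lower, upper):
    """Fraction of actual values inside [lower, upper] (generic definition)."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside.mean())

# Illustrative values: two of the four actuals fall inside their interval.
y = np.array([1.0, 2.0, 3.0, 4.0])
q10 = np.array([0.5, 2.5, 2.0, 3.0])
q90 = np.array([1.5, 3.5, 4.0, 3.5])
coverage = interval_coverage(y, q10, q90)
```

For a well-calibrated model with quantiles 0.1 and 0.9, the coverage would be expected to lie near 0.8 on sufficiently large data.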

Examples

>>> import tensorflow as tf
>>> import numpy as np
>>> from fusionlab.nn.utils import format_predictions_to_dataframe
>>> B, H, O = 4, 3, 1 # Batch, Horizon, OutputDim
>>> Q = [0.1, 0.5, 0.9]
>>> preds_point = tf.random.normal((B, H, O))
>>> preds_quant = tf.random.normal((B, H, len(Q))) # For O=1
>>> y_true = tf.random.normal((B, H, O))
>>> # Point forecast
>>> df_point = format_predictions_to_dataframe(
...     predictions=preds_point, y_true_sequences=y_true,
...     target_name="value", forecast_horizon=H, output_dim=O
... )
>>> print(df_point.head(H)) # Show first sample's horizon
   sample_idx  forecast_step  value_pred  value_actual
0           0              1   -0.576731     -0.647362
1           0              2    0.183931      1.198977
2           0              3   -0.766871      0.534040
>>> # Quantile forecast
>>> df_quant = format_predictions_to_dataframe(
...     predictions=preds_quant, y_true_sequences=y_true,
...     target_name="value", quantiles=Q,
...     forecast_horizon=H, output_dim=O
... )
>>> print(df_quant.head(H))
   sample_idx  forecast_step  value_q10  value_q50  value_q90  value_actual
0           0              1  -0.209947   0.263107  -0.308929     -0.647362
1           0              2   0.303091   0.594701  -0.225007      1.198977
2           0              3   0.136699  -1.237739   0.002834      0.534040
>>> # With spatial data (NumPy array)
>>> spatial_np = np.array([[101, 201], [102, 202], [103, 203], [104, 204]])
>>> df_spatial = format_predictions_to_dataframe(
...     predictions=preds_point,
...     spatial_data_array=spatial_np,
...     spatial_cols=['store_id', 'region_id'],
...     spatial_cols_indices=[0, 1]
... )
>>> print(df_spatial[['sample_idx', 'forecast_step', 'store_id']].head(H))
   sample_idx  forecast_step  store_id
0           0              1     101.0
1           0              2     101.0
2           0              3     101.0
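
The scaler, scaler_feature_names, and target_idx_in_scaler parameters exist because a scaler fitted on several features cannot invert a single column directly. A common pattern, sketched below, pads the target into a dummy array of the full feature width and keeps only the target column after inversion. Whether the function does exactly this internally is an assumption. MinMaxLikeScaler and inverse_transform_target are hypothetical stand-ins written for this example.

```python
import numpy as np

class MinMaxLikeScaler:
    """Stand-in for a fitted scaler; only inverse_transform is needed here."""

    def __init__(self, data_min, data_max):
        self.data_min = np.asarray(data_min, dtype=float)
        self.data_max = np.asarray(data_max, dtype=float)

    def inverse_transform(self, X):
        return X * (self.data_max - self.data_min) + self.data_min

def inverse_transform_target(values, scaler, n_features, target_idx):
    """Hypothetical helper: undo scaling for one column of a multi-feature scaler."""
    values = np.asarray(values, dtype=float)
    # Place the target in its original column; the other columns are padding.
    dummy = np.zeros((values.size, n_features))
    dummy[:, target_idx] = values.reshape(-1)
    restored = scaler.inverse_transform(dummy)[:, target_idx]
    return restored.reshape(values.shape)

scaler_feature_names = ["price", "sales", "temp"]
scaler = MinMaxLikeScaler(data_min=[0, 100, -10], data_max=[10, 300, 30])
target_idx = scaler_feature_names.index("sales")

scaled_preds = np.array([[0.0, 0.5], [1.0, 0.25]])
restored = inverse_transform_target(
    scaled_preds, scaler, len(scaler_feature_names), target_idx
)
```

Here the scaled values 0.0, 0.5, 1.0, and 0.25 map back onto the original sales range of 100 to 300.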