fusionlab.nn.utils.format_predictions
- fusionlab.nn.utils.format_predictions(predictions=None, model=None, inputs=None, y_true_sequences=None, target_name='target', quantiles=None, forecast_horizon=None, output_dim=None, spatial_data_array=None, spatial_cols=None, spatial_cols_indices=None, evaluate_coverage=False, scaler=None, scaler_feature_names=None, target_idx_in_scaler=None, verbose=0, _logger=None, **kwargs)
Formats model predictions into a structured pandas DataFrame.
This utility function takes raw model predictions (either passed directly as an array/tensor or generated from a provided model and its inputs) and transforms them into a long-format pandas DataFrame. It handles point and quantile forecasts with single or multiple outputs, and can optionally include actual target values and spatial identifiers and evaluate the coverage score for quantile forecasts.
The output DataFrame is structured with ‘sample_idx’ (identifying the original input sequence) and ‘forecast_step’ (from 1 to H, where H is the forecast horizon).
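The long-format layout described above can be pictured with plain NumPy and pandas. This is a minimal sketch of the reshaping idea only, not fusionlab's implementation; the helper name `to_long_frame` is hypothetical and the sketch assumes a single output (output_dim=1).

```python
import numpy as np
import pandas as pd


def to_long_frame(preds, target_name="target"):
    """Flatten a (num_samples, horizon, 1) array into one row per (sample, step)."""
    preds = np.asarray(preds)
    n_samples, horizon, out_dim = preds.shape
    if out_dim != 1:
        raise ValueError("this sketch only handles output_dim=1")
    return pd.DataFrame({
        # Each sample index is repeated once per forecast step.
        "sample_idx": np.repeat(np.arange(n_samples), horizon),
        # Steps run from 1 to H and restart for every sample.
        "forecast_step": np.tile(np.arange(1, horizon + 1), n_samples),
        # C-order flattening keeps the steps of one sample together.
        f"{target_name}_pred": preds.reshape(-1),
    })


demo = to_long_frame(np.arange(6, dtype=float).reshape(2, 3, 1), "value")
print(demo)
```

With 2 samples and a horizon of 3, the resulting frame has 6 rows: three rows for sample 0 followed by three rows for sample 1.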
- Parameters:
predictions (np.ndarray or tf.Tensor, optional) – The raw prediction tensor or array.
- For point forecasts, expected shapes:
(num_samples, forecast_horizon, output_dim)
(num_samples, forecast_horizon) if output_dim=1 (will be reshaped)
(num_samples, output_dim) if forecast_horizon=1 (will be reshaped)
- For quantile forecasts, expected shapes:
(num_samples, forecast_horizon, num_quantiles * output_dim)
(num_samples, forecast_horizon, num_quantiles, output_dim)
(num_samples, num_quantiles * output_dim) if forecast_horizon=1
If None, model and inputs must be provided. Default is None.
model (tf.keras.Model, optional) – A trained Keras model used to generate predictions if predictions is not provided. Used in conjunction with inputs. Default is None.
inputs (List[Optional[Union[np.ndarray, tf.Tensor]]], optional) – A list of input tensors (e.g., [static, dynamic, future]) required by the model to generate predictions. Required if predictions is None and model is provided. Default is None.
y_true_sequences (np.ndarray or tf.Tensor, optional) – The true target values corresponding to the predictions, used for including actuals in the output DataFrame and for evaluation. Expected shape: (num_samples, forecast_horizon, output_dim). Default is None.
target_name (str, optional) – Base name for the target variable. Used to prefix prediction and actual column names (e.g., “sales_pred”, “sales_q50”, “sales_actual”). Default is “target”.
quantiles (List[float], optional) – A list of quantiles that were predicted by the model (e.g., [0.1, 0.5, 0.9]). Required if the predictions are quantile forecasts. If provided, prediction columns will be named like {target_name}_q10, {target_name}_q50, etc. Default is None (for point forecasts).
forecast_horizon (int, optional) – The number of time steps into the future that the model predicts. If not provided, it is inferred from predictions.shape[1] or y_true_sequences.shape[1]. Default is None.
output_dim (int, optional) – The number of target variables predicted at each time step (e.g., 1 for a univariate target, >1 for a multivariate target). If not provided, it is inferred from the shape of predictions or y_true_sequences. Default is None.
spatial_data_array (np.ndarray or tf.Tensor or pd.DataFrame or pd.Series, optional) – An array or DataFrame containing static spatial/identifier features for each of the num_samples sequences.
- If NumPy/Tensor: Expected shape (num_samples, num_spatial_features). spatial_cols_indices must be provided.
- If DataFrame/Series: Must have num_samples rows. spatial_cols must be provided.
These features will be repeated for each forecast step in the output DataFrame. Default is None.
spatial_cols (List[str], optional) – List of column names to select from spatial_data_array if it is a DataFrame/Series, or names to assign to columns if spatial_data_array is NumPy/Tensor and spatial_cols_indices are provided. Default is None.
spatial_cols_indices (List[int], optional) – List of column indices to select from spatial_data_array if it is a NumPy/Tensor. Length must match spatial_cols if provided. Default is None.
evaluate_coverage (bool, default False) – If True, and at least two quantiles and y_true_sequences are provided, calculates the coverage score using the first and last quantiles as interval bounds. Requires fusionlab.metrics.coverage_score.
scaler (Any, optional) – A fitted scikit-learn-like scaler object (must have an inverse_transform method) used to scale the target variable and potentially other features. If provided along with scaler_feature_names and target_idx_in_scaler, predictions and actuals for the target will be inverse-transformed. Default is None.
scaler_feature_names (List[str], optional) – A list of all feature names (in order) that the scaler was originally fit on. Required if scaler is provided and targeted inverse transformation is needed. Default is None.
target_idx_in_scaler (int, optional) – The index of the target_name within the scaler_feature_names list. Required if scaler is provided and targeted inverse transformation is needed. Default is None.
verbose (int, default 0) – Verbosity level for logging during processing.
- 0: Silent.
- 1: Basic info.
- 3: More detailed steps.
- 5: Very detailed shape information.
**kwargs (Any) – Additional keyword arguments (currently not used but included for future extensibility).
_logger (Logger | Callable[[str], None] | None)
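The scaler, scaler_feature_names and target_idx_in_scaler parameters describe a targeted inverse transformation: only the target column is restored, even though the scaler was fit on several features. One common way to do this is sketched below, under stated assumptions; it is not taken from fusionlab's source. `TinyScaler` is a hypothetical stand-in for a fitted scikit-learn-like scaler, `inverse_target` is a hypothetical helper, and the feature names are invented for illustration.

```python
import numpy as np


class TinyScaler:
    """Hypothetical stand-in for a fitted scikit-learn-like scaler."""

    def __init__(self, mean, scale):
        self.mean_ = np.asarray(mean, dtype=float)
        self.scale_ = np.asarray(scale, dtype=float)

    def inverse_transform(self, X):
        # Undo standardization column by column.
        return np.asarray(X, dtype=float) * self.scale_ + self.mean_


def inverse_target(values, scaler, n_features, target_idx):
    """Inverse-transform one column of a scaler that was fit on n_features columns."""
    flat = np.asarray(values, dtype=float).reshape(-1)
    # Pad with zeros so the array has the width the scaler expects.
    padded = np.zeros((flat.size, n_features))
    padded[:, target_idx] = flat
    restored = scaler.inverse_transform(padded)[:, target_idx]
    return restored.reshape(np.shape(values))


feature_names = ["rainfall", "sales", "price"]  # assumed fit order, illustrative
scaler = TinyScaler(mean=[5.0, 100.0, 2.0], scale=[1.0, 10.0, 0.5])
scaled_preds = np.array([[0.0, 1.0], [-1.0, 2.0]])
restored = inverse_target(
    scaled_preds, scaler, len(feature_names), feature_names.index("sales")
)
print(restored)
```

The zero padding works because a scaler of this kind transforms each column independently, so the placeholder values in the other columns never affect the target column.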
- Returns:
A long-format DataFrame with columns including:
- sample_idx: Identifier for the original input sequence.
- forecast_step: Time step within the forecast horizon (1 to H).
- Spatial columns (if spatial_data_array provided).
- Prediction columns:
Point forecast: {target_name}_pred (or {target_name}_{output_idx}_pred for multi-output).
Quantile forecast: {target_name}_qXX (or {target_name}_{output_idx}_qXX for multi-output).
- Actual value columns (if y_true_sequences provided):
{target_name}_actual (or {target_name}_{output_idx}_actual).
Values in prediction and actual columns will be inverse-transformed if a valid scaler and related parameters are given.
- Return type:
pandas.DataFrame
- Raises:
ValueError – Raised in any of the following cases:
- predictions is None and model or inputs is also None.
- The predictions shape is invalid (not 2D, 3D, or 4D).
- quantiles are provided but the prediction shape is incompatible for inferring output_dim.
- spatial_data_array is provided without the necessary name/index parameters.
TypeError – If predictions or other inputs cannot be converted to the expected tensor/array types.
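The column naming scheme listed under Returns can be sketched as a small helper. This is an illustration rather than the library's code: `prediction_columns` is a hypothetical name, and two details are assumptions because the text above does not state them, namely that output indices start at 0 and that the quantile suffix is the quantile times 100, rounded to an integer without zero padding.

```python
def prediction_columns(target_name, quantiles=None, output_dim=1):
    """Build prediction column names for point or quantile forecasts."""
    names = []
    for o in range(output_dim):
        # Assumption: the output index is only added for multi-output targets
        # and is zero-based.
        base = target_name if output_dim == 1 else f"{target_name}_{o}"
        if quantiles is None:
            names.append(f"{base}_pred")
        else:
            # Assumption: 0.1 -> "q10", 0.5 -> "q50", 0.9 -> "q90".
            names.extend(f"{base}_q{int(round(q * 100))}" for q in quantiles)
    return names


print(prediction_columns("value"))
print(prediction_columns("value", quantiles=[0.1, 0.5, 0.9]))
print(prediction_columns("value", output_dim=2))
```

Rounding before converting to int avoids floating-point surprises such as 0.29 * 100 evaluating to slightly less than 29.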
See also
fusionlab.nn.utils.forecast_multi_step: Higher-level forecasting utility.
fusionlab.metrics.coverage_score: For evaluating quantile forecast intervals.
Examples
>>> import tensorflow as tf
>>> import numpy as np
>>> from fusionlab.nn.utils import format_predictions_to_dataframe
>>> B, H, O = 4, 3, 1  # Batch, Horizon, OutputDim
>>> Q = [0.1, 0.5, 0.9]
>>> preds_point = tf.random.normal((B, H, O))
>>> preds_quant = tf.random.normal((B, H, len(Q)))  # For O=1
>>> y_true = tf.random.normal((B, H, O))
>>> # Point forecast
>>> df_point = format_predictions_to_dataframe(
...     predictions=preds_point, y_true_sequences=y_true,
...     target_name="value", forecast_horizon=H, output_dim=O
... )
>>> print(df_point.head(H))  # Show first sample's horizon
   sample_idx  forecast_step  value_pred  value_actual
0           0              1   -0.576731     -0.647362
1           0              2    0.183931      1.198977
2           0              3   -0.766871      0.534040
>>> # Quantile forecast
>>> df_quant = format_predictions_to_dataframe(
...     predictions=preds_quant, y_true_sequences=y_true,
...     target_name="value", quantiles=Q,
...     forecast_horizon=H, output_dim=O
... )
>>> print(df_quant.head(H))
   sample_idx  forecast_step  value_q10  value_q50  value_q90  value_actual
0           0              1  -0.209947   0.263107  -0.308929     -0.647362
1           0              2   0.303091   0.594701  -0.225007      1.198977
2           0              3   0.136699  -1.237739   0.002834      0.534040
>>> # With spatial data (NumPy array)
>>> spatial_np = np.array([[101, 201], [102, 202], [103, 203], [104, 204]])
>>> df_spatial = format_predictions_to_dataframe(
...     predictions=preds_point,
...     spatial_data_array=spatial_np,
...     spatial_cols=['store_id', 'region_id'],
...     spatial_cols_indices=[0, 1]
... )
>>> print(df_spatial[['sample_idx', 'forecast_step', 'store_id']].head(H))
   sample_idx  forecast_step  store_id
0           0              1     101.0
1           0              2     101.0
2           0              3     101.0
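The evaluate_coverage option is described as scoring the interval formed by the first and last quantiles. As a rough illustration of what such a score measures, the sketch below computes the fraction of actual values that fall inside a lower and upper bound. It does not call fusionlab.metrics.coverage_score, whose signature is not shown on this page; `interval_coverage` is a hypothetical helper, and the numbers are invented rather than model output.

```python
import numpy as np


def interval_coverage(y_true, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    y_true, lower, upper = (
        np.asarray(a, dtype=float) for a in (y_true, lower, upper)
    )
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside.mean())


# Illustrative numbers only, not output from a real model.
actual = [1.0, 2.0, 3.0, 4.0]
q10 = [0.5, 2.5, 2.0, 3.0]  # lower bound (first quantile)
q90 = [1.5, 3.5, 4.0, 3.5]  # upper bound (last quantile)
coverage = interval_coverage(actual, q10, q90)
print(coverage)
```

Here two of the four actual values lie inside their intervals, so the coverage is 0.5. For a well-calibrated 10% to 90% interval one would expect a value near 0.8 on a large sample.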