fusionlab.nn.utils.format_pihalnet_predictions

fusionlab.nn.utils.format_pihalnet_predictions(pihalnet_outputs=None, model=None, model_inputs=None, y_true_dict=None, target_mapping=None, include_gwl=True, include_coords=True, quantiles=None, forecast_horizon=None, output_dims=None, ids_data_array=None, ids_cols=None, ids_cols_indices=None, scaler_info=None, coord_scaler=None, evaluate_coverage=False, coverage_quantile_indices=(0, -1), savefile=None, name=None, model_name=None, apply_mask=False, mask_values=None, mask_fill_value=None, verbose=0, _logger=None, stop_check=None, **kwargs)

Formats PIHALNet/GeoPriorSubsNet predictions into a structured pandas DataFrame, handling inversion, quantiles, and coordinates.

This function is the core formatter. It:

  1. Gets model outputs (or uses provided ones).
  2. Unpacks ‘data_final’ if model_name is ‘geoprior’ or ‘geopriorsubsnet’.
  3. Inverse-transforms all prediction and actual arrays using scaler_info.
  4. Builds a long-format DataFrame with sample_idx and forecast_step.
  5. Appends inverted quantile/point predictions.
  6. Appends inverted actual values.
  7. Appends inverted coordinates.
  8. Appends static/ID columns.
  9. Evaluates coverage on the inverted data if evaluate_coverage is True.
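The long-format layout from steps 4 and 5 can be illustrated with a minimal sketch. It is not the library's implementation: plain nested lists stand in for a prediction array of shape (B, H, Q) to keep it dependency-free, and the helper name to_long_rows, the 1-based forecast_step, and the ‘_pred’ suffix for point forecasts are all assumptions made for illustration. Only the sample_idx and forecast_step columns and the ‘subsidence_q10’ naming pattern come from the description above.

```python
def to_long_rows(preds, base_name, quantiles=None):
    """Flatten a nested [B][H][Q] list into long-format rows (hypothetical helper).

    Each row is one (sample, step) pair; quantile values become columns.
    """
    rows = []
    for sample_idx, sample in enumerate(preds):
        for step_idx, values in enumerate(sample):
            # Assumption: forecast_step is 1-based in this sketch.
            row = {"sample_idx": sample_idx, "forecast_step": step_idx + 1}
            if quantiles:
                for q, value in zip(quantiles, values):
                    # 0.1 -> "subsidence_q10", 0.5 -> "subsidence_q50", ...
                    row[f"{base_name}_q{int(round(q * 100))}"] = value
            else:
                # Assumption: point forecasts use a "_pred" suffix.
                row[f"{base_name}_pred"] = values[0]
            rows.append(row)
    return rows


# Two samples (B=2), three steps (H=3), three quantiles (Q=3).
preds = [
    [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4], [0.3, 0.4, 0.5]],
    [[1.1, 1.2, 1.3], [1.2, 1.3, 1.4], [1.3, 1.4, 1.5]],
]
rows = to_long_rows(preds, "subsidence", quantiles=[0.1, 0.5, 0.9])
```

A list of dictionaries like rows can be passed directly to pandas.DataFrame to obtain a table with one row per sample and forecast step.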

Parameters:
  • pihalnet_outputs (dict, optional) – Raw output from model.predict(). If None, model and model_inputs must be provided.

  • model (tf.keras.Model, optional) – Trained model instance (if pihalnet_outputs is None).

  • model_inputs (dict, optional) – Inputs for the model to generate predictions (if pihalnet_outputs is None).

  • y_true_dict (dict, optional) – Dictionary of true target arrays (e.g., {‘subs_pred’: y_true_s}). Required for including actuals and evaluating coverage.

  • target_mapping (dict, optional) – Maps prediction keys to base names for DataFrame columns. Default: {‘subs_pred’: ‘subsidence’, ‘gwl_pred’: ‘gwl’}.

  • include_gwl (bool, default True) – Whether to include ‘gwl_pred’ in the final DataFrame.

  • include_coords (bool, default True) – Whether to include ‘coord_t’, ‘coord_x’, ‘coord_y’ columns.

  • quantiles (list[float], optional) – List of quantiles (e.g., [0.1, 0.5, 0.9]). If provided, quantile columns (e.g., ‘subsidence_q10’) are created.

  • forecast_horizon (int, optional) – The forecast horizon length (H). If not provided, it’s inferred from the prediction array’s shape.

  • output_dims (dict, optional) – Maps prediction keys to their output dimension (O). E.g., {‘subs_pred’: 1, ‘gwl_pred’: 1}. Crucial for correctly splitting GeoPrior outputs and reshaping.

  • ids_data_array (np.ndarray or pd.DataFrame, optional) – Static/ID data (e.g., original coordinates) to merge. Must have the same number of samples (B) as predictions.

  • ids_cols (list[str], optional) – Column names if ids_data_array is a DataFrame.

  • ids_cols_indices (list[int], optional) – Column indices if ids_data_array is a NumPy array.

  • scaler_info (dict, optional) – Dictionary for inverse scaling, structured as:

        {
            ‘subsidence’: {‘scaler’: scaler_obj, ‘idx’: 0, ‘all_features’: […]},
            ‘gwl’: {‘scaler’: scaler_obj, ‘idx’: 1, ‘all_features’: […]}
        }

  • coord_scaler (scikit-learn scaler object, optional) – A fitted scaler (e.g., a MinMaxScaler or StandardScaler instance) used to inverse-transform the ‘coords’ tensor.

  • evaluate_coverage (bool, default False) – If True, calculates coverage percentage for quantiles.

  • coverage_quantile_indices (tuple[int, int], default (0, -1)) – Indices into the quantiles list that select the lower and upper bounds of the coverage interval (e.g., with quantiles [0.1, 0.5, 0.9], 0 and -1 select the 10th and 90th percentiles).

  • savefile (str, optional) – If provided, saves the final DataFrame to this path.

  • model_name (str, optional) – Specifies the model type. If ‘geoprior’ or ‘geopriorsubsnet’, triggers unpacking of the ‘data_final’ output.

  • apply_mask (bool, default False) – If True, masks predictions in rows where the first target’s _actual column matches mask_values.

  • mask_values (float or int, optional) – The value in the _actual column to trigger masking.

  • mask_fill_value (float, optional) – The value to replace masked predictions with (e.g., np.nan).

  • verbose (int, default 0) – Logging verbosity.

  • _logger (logging.Logger or callable, optional) – Logger object.

  • stop_check (callable, optional) – Function to check for early stopping.

  • name (str | None)
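
The scaler_info entries pair a scaler with a column index (‘idx’) and the feature list it was fitted on (‘all_features’). One common way to use such an entry, sketched below, is to pad the scaled target values into a full-width array, call inverse_transform, and read back the column at ‘idx’. This is an assumption about how the entries are meant to be consumed, not a description of the library's internals. ToyMinMaxScaler and invert_column are hypothetical names; the toy class only mimics the inverse_transform method of a fitted scikit-learn scaler so the sketch runs without extra dependencies.

```python
class ToyMinMaxScaler:
    """Hypothetical stand-in exposing inverse_transform like a fitted scaler."""

    def __init__(self, mins, maxs):
        self.mins = mins
        self.maxs = maxs

    def inverse_transform(self, rows):
        # Map each column from the [0, 1] range back to [min, max].
        return [
            [v * (hi - lo) + lo for v, lo, hi in zip(row, self.mins, self.maxs)]
            for row in rows
        ]


def invert_column(values, info):
    """Inverse-scale one target column with a scaler fitted on many features."""
    n_features = len(info["all_features"])
    idx = info["idx"]
    # Pad each value into a full-width row so the scaler accepts it.
    padded = [[0.0] * n_features for _ in values]
    for row, value in zip(padded, values):
        row[idx] = value
    restored = info["scaler"].inverse_transform(padded)
    # Keep only the column that belongs to this target.
    return [row[idx] for row in restored]


features = ["subsidence", "gwl"]
scaler = ToyMinMaxScaler(mins=[0.0, 10.0], maxs=[100.0, 20.0])
scaler_info = {
    "subsidence": {"scaler": scaler, "idx": 0, "all_features": features},
    "gwl": {"scaler": scaler, "idx": 1, "all_features": features},
}
subs = invert_column([0.0, 0.5, 1.0], scaler_info["subsidence"])
gwl = invert_column([0.0, 0.5, 1.0], scaler_info["gwl"])
```

With a real scikit-learn scaler the same padding idea applies, with NumPy arrays in place of the nested lists.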

Returns:

A long-format DataFrame with predictions, actuals, and coordinates.

Return type:

pd.DataFrame
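
For orientation, the coverage check controlled by evaluate_coverage and coverage_quantile_indices can be sketched on rows shaped like the returned frame. This is an illustrative assumption rather than the library's code: interval_coverage is a hypothetical helper, the rows are plain dictionaries instead of a DataFrame, and the result is a fraction between 0 and 1 rather than a percentage. The ‘_q10’ and ‘_actual’ column suffixes follow the naming mentioned in the parameter descriptions.

```python
def interval_coverage(rows, base_name, quantiles, indices=(0, -1)):
    """Fraction of actual values inside the [low, high] quantile interval."""
    low_q = quantiles[indices[0]]
    high_q = quantiles[indices[1]]
    low_col = f"{base_name}_q{int(round(low_q * 100))}"
    high_col = f"{base_name}_q{int(round(high_q * 100))}"
    actual_col = f"{base_name}_actual"
    if not rows:
        # No rows means coverage is undefined.
        return float("nan")
    hits = sum(1 for r in rows if r[low_col] <= r[actual_col] <= r[high_col])
    return hits / len(rows)


# Four (sample, step) rows; the last actual value falls above the interval.
rows = [
    {"subsidence_q10": 1.0, "subsidence_q90": 3.0, "subsidence_actual": 2.0},
    {"subsidence_q10": 1.5, "subsidence_q90": 3.5, "subsidence_actual": 1.5},
    {"subsidence_q10": 2.0, "subsidence_q90": 4.0, "subsidence_actual": 3.9},
    {"subsidence_q10": 2.5, "subsidence_q90": 4.5, "subsidence_actual": 5.0},
]
coverage = interval_coverage(rows, "subsidence", quantiles=[0.1, 0.5, 0.9])
```

Because the interval spans the 10th to 90th percentiles, a well-calibrated model would be expected to land near 0.8 on a large sample.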