fusionlab.metrics.quantile_calibration_error

fusionlab.metrics.quantile_calibration_error(y_true, y_pred, quantiles, sample_weight=None, nan_policy='propagate', multioutput='uniform_average', eps=1e-08, verbose=0)

Compute Quantile Calibration Error (QCE).

Assesses the calibration of probabilistic forecasts by comparing the empirical frequency of observations falling below a predicted quantile to the nominal quantile level.

For a single output and quantile level \(q\), the QCE is:

\[
\mathrm{QCE}(q) = \left| \frac{1}{N_{valid}} \sum_{i=1}^{N_{valid}} \mathbf{1}\{y_i \le \hat Q_i(q)\} - q \right|,
\]

where \(\hat Q_i(q)\) is the predicted quantile at level \(q\) for sample \(i\), \(y_i\) is the observed value, and \(N_{valid}\) is the number of valid samples after NaN handling. The function returns the QCE averaged over all provided quantile levels and, for multi-output inputs with multioutput='uniform_average', over outputs as well.
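The definition above can be sketched in plain NumPy. This is only an illustration of the formula for a single output with equal weights and no NaNs, not fusionlab's implementation; the helper name `qce_sketch` and the toy data are hypothetical.

```python
import numpy as np

def qce_sketch(y_true, y_pred, quantiles):
    """Mean absolute gap between empirical coverage and nominal level.

    Hypothetical helper: single output, equal weights, no NaN handling.
    """
    y_true = np.asarray(y_true, dtype=float)        # (n_samples,)
    y_pred = np.asarray(y_pred, dtype=float)        # (n_samples, n_quantiles)
    quantiles = np.asarray(quantiles, dtype=float)  # (n_quantiles,)
    # Indicator 1{y_i <= Q_i(q)} for every sample and quantile level.
    below = y_true[:, None] <= y_pred
    # Empirical frequency of "observation at or below the quantile", per level.
    coverage = below.mean(axis=0)
    # Absolute deviation from the nominal level, averaged over levels.
    return float(np.mean(np.abs(coverage - quantiles)))

y = np.array([1.0, 2.0, 3.0, 4.0])
preds = np.array([   # columns: q=0.25, q=0.75
    [0.5, 1.5],
    [2.5, 3.0],
    [2.0, 3.5],
    [3.0, 4.5],
])
score = qce_sketch(y, preds, [0.25, 0.75])
print(score)  # 0.125
```

Here the 0.25 level is matched exactly (one of four observations falls at or below it), while the 0.75 level covers all four observations, giving a gap of 0.25 and a mean of 0.125.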

Parameters:

y_true (array-like) – Observed true values. Expected shapes:
- (n_samples,) for single output.
- (n_samples, n_outputs) for multi-output.
y_pred (array-like) – Predicted quantiles. Expected shapes:
- If y_true is 1D: (n_samples, n_quantiles)
- If y_true is 2D: (n_samples, n_outputs, n_quantiles)
quantiles (array-like of shape (n_quantiles,)) – Nominal quantile levels (e.g., [0.1, 0.5, 0.9]). Each value must be strictly between 0 and 1.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights. If None, all samples are weighted equally when computing empirical frequencies. The sum of the weights must be greater than eps.
nan_policy ({'omit', 'propagate', 'raise'}, default 'propagate') –
How to handle NaNs in y_true or y_pred:
- 'raise': Raise an error on any NaN.
- 'omit': Drop samples (rows) containing NaNs.
- 'propagate': The QCE for any output or quantile level affected by NaNs is NaN.
multioutput ({'raw_values', 'uniform_average'}, default 'uniform_average') –
Defines aggregation if y_true and y_pred are multi-output.
- 'raw_values': Returns an array of QCE scores, one for each output (averaged over quantiles).
- 'uniform_average': Scores of all outputs are averaged.
eps (float, default 1e-8) – Small epsilon value that prevents division by zero when the sum of weights is zero or very close to zero.
verbose (int, default 0) – Verbosity level: 0 (silent), 1 (summary), >=2 (debug details, including per-quantile QCE).
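As a rough illustration of how sample_weight, nan_policy='omit', and eps could interact for one quantile level, the sketch below computes a weighted empirical frequency after dropping NaN rows. The helper `weighted_coverage` is hypothetical, and returning NaN when the weight sum is not above eps is an assumption made for this sketch; the library's actual behavior in that case is not stated here.

```python
import numpy as np

def weighted_coverage(y_true, y_pred_q, sample_weight=None, eps=1e-8):
    """Weighted share of observations at or below one predicted quantile.

    Hypothetical helper: rows with NaN are dropped, mirroring the
    description of nan_policy='omit'.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred_q = np.asarray(y_pred_q, dtype=float)
    if sample_weight is None:
        w = np.ones_like(y_true)          # equal weights
    else:
        w = np.asarray(sample_weight, dtype=float)
    # Keep only rows where both the observation and the prediction are valid.
    keep = ~(np.isnan(y_true) | np.isnan(y_pred_q))
    y_true, y_pred_q, w = y_true[keep], y_pred_q[keep], w[keep]
    total = w.sum()
    if total <= eps:
        # Assumption for this sketch only: signal an undefined frequency.
        return float("nan")
    return float(np.sum(w * (y_true <= y_pred_q)) / total)

y = np.array([1.0, 2.0, np.nan, 4.0])
q_pred = np.array([1.5, 1.0, 3.0, 5.0])
cov = weighted_coverage(y, q_pred, sample_weight=[1.0, 1.0, 1.0, 2.0])
print(cov)  # 0.75
```

The third row is dropped because y_true is NaN; of the remaining weight of 4.0, a weight of 3.0 belongs to observations at or below the predicted quantile.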

Returns:

score – Mean QCE. Scalar if multioutput='uniform_average' or if inputs represent a single output. Array of shape (n_outputs,) if multioutput='raw_values' and inputs are multi-output. Lower values (closer to 0) indicate better calibration.

Return type:

float or ndarray of floats

Examples

>>> import numpy as np
>>> from fusionlab.metrics import quantile_calibration_error

>>> y_t = np.array([1, 2, 3, 4, 5])
>>> q_levels = np.array([0.1, 0.5, 0.9])
>>> y_p = np.array([ # (5 samples, 3 quantiles)
...     [0.5, 1.0, 1.5], # y_true=1
...     [1.0, 2.0, 3.0], # y_true=2
...     [2.5, 3.0, 3.5], # y_true=3
...     [3.0, 4.0, 5.0], # y_true=4
...     [4.5, 5.0, 5.5]  # y_true=5
... ])
>>> qce = quantile_calibration_error(y_t, y_p, q_levels, verbose=0)
>>> print(f"QCE: {qce:.4f}")
QCE: 0.2333

>>> # Multi-output example
>>> y_t_mo = np.array([[1,10],[2,20],[3,30]]) # (3s, 2o)
>>> y_p_mo = np.array([ # (3s, 2o, 2q)
...   [[0.5,1.5], [9,11]], # s0, (o0,o1), (q0,q1)
...   [[1.5,2.5], [19,21]], # s1
...   [[2.5,3.5], [29,31]]  # s2
... ])
>>> q_mo = np.array([0.25, 0.75])
>>> qce_mo_raw = quantile_calibration_error(
...     y_t_mo, y_p_mo, q_mo, multioutput='raw_values', verbose=0
... )
>>> print(f"QCE (multi-output, raw): {qce_mo_raw}")
QCE (multi-output, raw): [0.25 0.25]
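The relationship between the two multioutput modes can be sketched with NumPy broadcasting: compute one score per output (averaged over quantile levels), then average those scores. This is a simplified illustration with equal weights and no NaNs; `qce_per_output` and the toy data are hypothetical and are not taken from fusionlab.

```python
import numpy as np

def qce_per_output(y_true, y_pred, quantiles):
    """Per-output QCE, averaged over quantile levels (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)   # (n_samples, n_outputs)
    y_pred = np.asarray(y_pred, dtype=float)   # (n_samples, n_outputs, n_quantiles)
    q = np.asarray(quantiles, dtype=float)     # (n_quantiles,)
    # Empirical frequency per output and quantile level.
    coverage = (y_true[:, :, None] <= y_pred).mean(axis=0)  # (n_outputs, n_quantiles)
    # Average the absolute gaps over quantile levels only.
    return np.abs(coverage - q).mean(axis=1)                # (n_outputs,)

y = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])
preds = np.array([   # (4 samples, 2 outputs, 2 quantiles)
    [[1.5, 2.0], [9, 11]],
    [[1.5, 3.0], [19, 21]],
    [[2.5, 2.5], [29, 31]],
    [[3.5, 5.0], [39, 41]],
])
raw = qce_per_output(y, preds, [0.25, 0.75])  # analogous to 'raw_values'
uniform = float(raw.mean())                   # analogous to 'uniform_average'
print(raw, uniform)  # [0.   0.25] 0.125
```

The first output is perfectly calibrated at both levels, while the second output's quantiles never (q=0.25) or always (q=0.75) cover the observation, so each level is off by 0.25.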