fusionlab.metrics.quantile_calibration_error
- fusionlab.metrics.quantile_calibration_error(y_true, y_pred, quantiles, sample_weight=None, nan_policy='propagate', multioutput='uniform_average', eps=1e-08, verbose=0)
Compute Quantile Calibration Error (QCE).
Assesses the calibration of probabilistic forecasts by comparing the empirical frequency of observations falling below a predicted quantile to the nominal quantile level.
For a single output and quantile level \(q\), the QCE is:
\[
\mathrm{QCE}(q) = \left| \frac{1}{N_{valid}} \sum_{i=1}^{N_{valid}} \mathbf{1}\{y_i \le \hat Q_i(q)\} - q \right|,
\]
where \(\hat Q_i(q)\) is the predicted q-th quantile for sample \(i\), \(y_i\) is the observed value, and \(N_{valid}\) is the number of valid samples after NaN handling. The function averages the QCE over all provided quantile levels and, for multi-output inputs with multioutput='uniform_average', over outputs as well.
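The definition above can be sketched in a few lines of NumPy. This is a minimal illustration of the formula for a single output with equal weights and no NaNs, not the library's implementation; the helper name `qce_sketch` and the toy data are hypothetical.

```python
import numpy as np

def qce_sketch(y_true, y_pred, quantiles):
    """Unweighted, single-output QCE computed directly from the formula."""
    y_true = np.asarray(y_true, dtype=float)        # (n_samples,)
    y_pred = np.asarray(y_pred, dtype=float)        # (n_samples, n_quantiles)
    quantiles = np.asarray(quantiles, dtype=float)  # (n_quantiles,)
    # Indicator 1{y_i <= Q_i(q)} for every sample and quantile level.
    below = y_true[:, None] <= y_pred
    # Empirical frequency of observations at or below each predicted quantile.
    freq = below.mean(axis=0)
    per_quantile = np.abs(freq - quantiles)
    return per_quantile, float(per_quantile.mean())

y = np.array([1.0, 2.0, 3.0, 4.0])
preds = np.array([   # columns: predicted 0.5 and 0.75 quantiles
    [0.5, 1.5],
    [2.5, 3.0],
    [2.0, 2.5],
    [3.0, 5.0],
])
per_q, score = qce_sketch(y, preds, [0.5, 0.75])
print(per_q, score)  # [0.25 0.  ] 0.125
```

Here only one of four observations falls at or below the predicted median (frequency 0.25 against a nominal 0.5), while three of four fall at or below the predicted 0.75 quantile, so the per-quantile errors are 0.25 and 0.0.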
- Parameters:
y_true (array-like) – Observed true values. Expected shapes:
  - (n_samples,) for single output.
  - (n_samples, n_outputs) for multi-output.
y_pred (array-like) – Predicted quantiles. Expected shapes:
  - If y_true is 1D: (n_samples, n_quantiles)
  - If y_true is 2D: (n_samples, n_outputs, n_quantiles)
quantiles (array-like of shape (n_quantiles,)) – Nominal quantile levels (e.g., [0.1, 0.5, 0.9]). Each value must be strictly between 0 and 1.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights. If None, samples are equally weighted when calculating empirical frequencies. Sum of weights must be > eps.
nan_policy ({'omit', 'propagate', 'raise'}, default 'propagate') – How to handle NaNs in y_true or y_pred:
  - 'raise': Raise an error on any NaN.
  - 'omit': Drop samples (rows) containing NaNs.
  - 'propagate': The QCE for samples/outputs/quantiles affected by NaNs will be NaN.
multioutput ({'raw_values', 'uniform_average'}, default 'uniform_average') – Defines aggregation if y_true and y_pred are multi-output:
  - 'raw_values': Returns an array of QCE scores, one for each output (averaged over quantiles).
  - 'uniform_average': Scores of all outputs are averaged.
eps (float, default 1e-8) – Small epsilon value to prevent division by zero when the sum of weights is zero or very close to zero.
verbose (int, default 0) – Verbosity level: 0 (silent), 1 (summary), >=2 (debug details, including per-quantile QCE).
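To make the roles of sample_weight, eps, and nan_policy='omit' more concrete, the sketch below computes a weighted empirical frequency for one quantile level after dropping rows that contain NaNs. It is an assumption-laden illustration: the helper `weighted_frequency` is hypothetical, and returning NaN when the weight sum does not exceed eps is a choice made here, since the documentation only states that the sum must be greater than eps.

```python
import numpy as np

def weighted_frequency(y_true, q_pred, sample_weight=None, eps=1e-8):
    """Weighted share of observations at or below one predicted quantile."""
    y_true = np.asarray(y_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    # 'omit'-style handling: keep only rows where neither value is NaN.
    valid = ~(np.isnan(y_true) | np.isnan(q_pred))
    if sample_weight is None:
        weights = np.ones_like(y_true)
    else:
        weights = np.asarray(sample_weight, dtype=float)
    weights = weights[valid]
    total = weights.sum()
    if total <= eps:
        # Assumption: signal an undefined frequency rather than divide by ~0.
        return float("nan")
    below = (y_true[valid] <= q_pred[valid]).astype(float)
    return float((below * weights).sum() / total)

y = np.array([1.0, 2.0, np.nan, 4.0])
q50 = np.array([1.5, 1.0, 3.0, 5.0])
w = np.array([1.0, 1.0, 1.0, 2.0])
freq = weighted_frequency(y, q50, sample_weight=w)
print(freq)  # 0.75
```

The third row is dropped because y_true is NaN there. Of the remaining weight (4.0 in total), 3.0 belongs to observations at or below the prediction, giving 0.75; the QCE for a nominal level of 0.5 would then be |0.75 - 0.5| = 0.25.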
- Returns:
score – Mean QCE. Scalar if multioutput='uniform_average' or if inputs represent a single output. Array of shape (n_outputs,) if multioutput='raw_values' and inputs are multi-output. Lower values (closer to 0) indicate better calibration.
- Return type:
float or ndarray of floats
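The two multioutput modes can be illustrated with a small aggregation sketch. It assumes the per-quantile errors have already been computed and arranged as an array of shape (n_outputs, n_quantiles); the helper `aggregate_qce` and that layout are hypothetical and only mirror the behavior described above.

```python
import numpy as np

def aggregate_qce(per_quantile_qce, multioutput="uniform_average"):
    """Reduce an (n_outputs, n_quantiles) array of QCE values."""
    errors = np.asarray(per_quantile_qce, dtype=float)
    # First average over quantile levels: one score per output.
    per_output = errors.mean(axis=1)
    if multioutput == "raw_values":
        return per_output
    if multioutput == "uniform_average":
        # Then average the per-output scores with equal weight.
        return float(per_output.mean())
    raise ValueError(f"Unknown multioutput option: {multioutput!r}")

errors = np.array([
    [0.25, 0.50],   # output 0, two quantile levels
    [0.00, 0.25],   # output 1
])
raw = aggregate_qce(errors, multioutput="raw_values")
avg = aggregate_qce(errors, multioutput="uniform_average")
print(raw, avg)  # [0.375 0.125] 0.25
```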
Examples
>>> import numpy as np
>>> from fusionlab.metrics import quantile_calibration_error
>>> y_t = np.array([1, 2, 3, 4, 5])
>>> q_levels = np.array([0.1, 0.5, 0.9])
>>> y_p = np.array([  # (5 samples, 3 quantiles)
...     [0.5, 1.0, 1.5],  # y_true=1
...     [1.0, 2.0, 3.0],  # y_true=2
...     [2.5, 3.0, 3.5],  # y_true=3
...     [3.0, 4.0, 5.0],  # y_true=4
...     [4.5, 5.0, 5.5],  # y_true=5
... ])
>>> # Empirical frequencies are 0.0, 1.0 and 1.0 for q = 0.1, 0.5 and 0.9,
>>> # so by the formula above QCE = (0.1 + 0.5 + 0.1) / 3.
>>> qce = quantile_calibration_error(y_t, y_p, q_levels, verbose=0)
>>> print(f"QCE: {qce:.4f}")
QCE: 0.2333
>>> # Multi-output example
>>> y_t_mo = np.array([[1, 10], [2, 20], [3, 30]])  # (3 samples, 2 outputs)
>>> y_p_mo = np.array([  # (3 samples, 2 outputs, 2 quantiles)
...     [[0.5, 1.5], [9, 11]],   # sample 0: (output 0, output 1) x (q0, q1)
...     [[1.5, 2.5], [19, 21]],  # sample 1
...     [[2.5, 3.5], [29, 31]],  # sample 2
... ])
>>> q_mo = np.array([0.25, 0.75])
>>> # For each output the empirical frequencies are 0.0 and 1.0,
>>> # so by the formula above QCE = (0.25 + 0.25) / 2.
>>> qce_mo_raw = quantile_calibration_error(
...     y_t_mo, y_p_mo, q_mo, multioutput='raw_values', verbose=0
... )
>>> print(f"QCE (multi-output, raw): {qce_mo_raw}")
QCE (multi-output, raw): [0.25 0.25]
See also
coverage_score: Metric for prediction interval coverage.
pinball_loss: Loss function for quantile regression.
References