fusionlab.metrics.weighted_interval_score
- fusionlab.metrics.weighted_interval_score(y_true, y_lower, y_upper, y_median, alphas, sample_weight=None, nan_policy='propagate', multioutput='uniform_average', warn_invalid_bounds=True, verbose=0)
Compute the Weighted Interval Score (WIS).
The WIS is a proper scoring rule that evaluates probabilistic forecasts given as a set of central prediction intervals and a median forecast [1]. It generalizes the absolute error and considers multiple quantile levels.
The score for a single interval at level \(\alpha_k\) is:
\[
\mathrm{IS}_{\alpha_k}(y, l_k, u_k) = (u_k - l_k) + \frac{2}{\alpha_k}(l_k - y)\mathbf{1}\{y < l_k\} + \frac{2}{\alpha_k}(y - u_k)\mathbf{1}\{y > u_k\}
\]
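The interval score can be checked numerically with a few lines of NumPy. The snippet below is a minimal sketch of the formula only; `interval_score` is a hypothetical helper written for illustration and is not part of fusionlab.

```python
import numpy as np

def interval_score(y, lower, upper, alpha):
    """Interval score for one central interval at level alpha.

    Hypothetical helper for illustration; not a fusionlab function.
    """
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    width = upper - lower                                # sharpness term
    below = (2.0 / alpha) * (lower - y) * (y < lower)    # penalty if y < lower
    above = (2.0 / alpha) * (y - upper) * (y > upper)    # penalty if y > upper
    return width + below + above

# Observation inside, below, and above the interval [9, 11] with alpha = 0.2
scores = interval_score([10.0, 8.0, 12.5], 9.0, 11.0, alpha=0.2)
print(scores)  # values: 2.0, 12.0, 17.0
```

An observation inside the interval pays only for the width; a miss adds the distance to the nearest bound scaled by \(2/\alpha_k\).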
The WIS is then defined as a weighted average of the absolute error of the median forecast and the interval scores for \(K\) central prediction intervals:
\[
\mathrm{WIS}(y, m, \{(l_k, u_k, \alpha_k)\}_{k=1}^K) = \frac{1}{K + 0.5} \left( \frac{1}{2}|y - m| + \sum_{k=1}^K \frac{\alpha_k}{2} \mathrm{IS}_{\alpha_k} \right)
\]
Alternatively, a common formulation (the one implemented here, following the reference's R script and common implementations such as the scoringutils R package) builds the score interval by interval. For each interval \(k\) with level \(\alpha_k\), the contribution to the score for a single observation \(y\) is:
\[
S_k = (u_k - l_k) + \frac{2}{\alpha_k}(l_k - y)\mathbf{1}\{y < l_k\} + \frac{2}{\alpha_k}(y - u_k)\mathbf{1}\{y > u_k\}
\]
The total score for observation \(y\) is:
\[
\mathrm{Score}_y = \frac{1}{K+1} \left( |y-m| + \sum_{k=1}^K \frac{\alpha_k}{2} S_k \right)
\]
This can be simplified by directly using the per-interval WIS contribution, which is \(\frac{\alpha_k}{2} S_k\) written out:
\[
\mathrm{WIS}_{\alpha_k}(y, l_k, u_k) = \frac{\alpha_k}{2}(u_k - l_k) + (l_k - y)\mathbf{1}\{y < l_k\} + (y - u_k)\mathbf{1}\{y > u_k\}
\]
Then the aggregated WIS is:
\[
\mathrm{WIS} = \frac{1}{K + 1} \left(|y - m| + \sum_{k=1}^K \mathrm{WIS}_{\alpha_k}\right)
\]
This is the version implemented.
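As a cross-check of the \(K + 1\)-normalized formula, the following is a minimal NumPy sketch for the 1D case. `wis_reference` is a hypothetical helper written for illustration: it skips NaN handling, sample weights, and multi-output support, and it is not guaranteed to reproduce the library's output digit for digit.

```python
import numpy as np

def wis_reference(y_true, y_lower, y_upper, y_median, alphas):
    """Mean WIS for 1D y_true using the (K + 1)-normalized formula.

    Hypothetical helper for illustration; not a fusionlab function.
    """
    y = np.asarray(y_true, dtype=float)[:, None]   # (n_samples, 1)
    lo = np.asarray(y_lower, dtype=float)          # (n_samples, K)
    hi = np.asarray(y_upper, dtype=float)          # (n_samples, K)
    m = np.asarray(y_median, dtype=float)          # (n_samples,)
    a = np.asarray(alphas, dtype=float)            # (K,)

    # Per-interval contribution: scaled width plus unscaled miss distance
    per_interval = (
        (a / 2.0) * (hi - lo)
        + (lo - y) * (y < lo)
        + (y - hi) * (y > hi)
    )
    # Add the absolute error of the median, then normalize by K + 1
    per_sample = (np.abs(y[:, 0] - m) + per_interval.sum(axis=1)) / (a.size + 1)
    return per_sample.mean()

score = wis_reference(
    y_true=[10.0, 14.0],
    y_lower=[[9.0, 8.0], [11.0, 10.0]],
    y_upper=[[11.0, 12.0], [13.0, 14.0]],
    y_median=[10.5, 12.0],
    alphas=[0.2, 0.5],
)
print(f"{score:.4f}")
```

By hand, the first sample gives (0.5 + 0.2 + 1.0) / 3 and the second gives (2.0 + 1.2 + 1.0) / 3; the printed value is the mean of the two.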
- Parameters:
  - y_true (array-like) – Observed true values. Shape: (n_samples,) or (n_samples, n_outputs).
  - y_lower (array-like) – Lower bounds for each central prediction interval.
    - If y_true is 1D: (n_samples, K_intervals)
    - If y_true is 2D: (n_samples, n_outputs, K_intervals)
  - y_upper (array-like) – Upper bounds, matching y_lower's shape.
  - y_median (array-like) – Median forecasts. Shape: (n_samples,) or (n_samples, n_outputs).
  - alphas (array-like of float, shape (K_intervals,)) – Nominal levels of the central prediction intervals. Each alpha defines the (1 - alpha) interval (e.g., 0.1 for the 90% PI, bounded by the 0.05 and 0.95 quantiles) and must be in (0, 1). These are used as weights.
  - sample_weight (array-like of shape (n_samples,), optional) – Sample weights. If None, samples are equally weighted.
  - nan_policy ({'omit', 'propagate', 'raise'}, default 'propagate') – How to handle NaNs in inputs.
  - multioutput ({'raw_values', 'uniform_average'}, default 'uniform_average') – Defines aggregation for multi-output y_true.
  - warn_invalid_bounds (bool, default True) – If True, issues a UserWarning if any y_lower > y_upper.
  - verbose (int, default 0) – Verbosity level: 0 (silent), 1 (summary), >=2 (debug details).
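The expected shapes are easiest to see when the bounds are derived from quantiles. The sketch below assumes the forecast is available as predictive draws (the `draws` array is synthetic, made up for illustration) and that each alpha maps to the alpha/2 and 1 - alpha/2 quantiles, as in the 0.05/0.95 example above.

```python
import numpy as np

alphas = np.array([0.2, 0.5])                 # K = 2 intervals

# Synthetic predictive draws, shape (n_samples, n_draws); illustration only
rng = np.random.default_rng(0)
draws = rng.normal(loc=[[10.0], [12.0], [11.0]], scale=1.0, size=(3, 500))

lower_q = alphas / 2.0                        # 0.10 and 0.25
upper_q = 1.0 - alphas / 2.0                  # 0.90 and 0.75

# np.quantile returns (K, n_samples) for a vector of quantiles; transpose it
y_lower = np.quantile(draws, lower_q, axis=1).T   # (n_samples, K_intervals)
y_upper = np.quantile(draws, upper_q, axis=1).T   # (n_samples, K_intervals)
y_median = np.median(draws, axis=1)               # (n_samples,)

print(y_lower.shape, y_upper.shape, y_median.shape)
```

Column k of `y_lower` and `y_upper` must correspond to `alphas[k]`, since each alpha weights its own interval.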
- Returns:
score – Average WIS. Lower values are better. Scalar if multioutput='uniform_average' or y_true is 1D. Array of shape (n_outputs,) if multioutput='raw_values' and y_true is 2D.
- Return type:
float or ndarray of floats
Examples
>>> import numpy as np
>>> from fusionlab.metrics import weighted_interval_score
>>> # Single-output example
>>> y_t = np.array([10, 12, 11, np.nan])
>>> y_l = np.array([[9, 8], [11, 10], [10, 9], [9, 8]])  # K=2 intervals
>>> y_u = np.array([[11, 12], [13, 14], [12, 13], [11, 12]])
>>> y_m = np.array([10, 12, 11, 10])
>>> a = np.array([0.2, 0.5])  # alphas for the 80% and 50% PIs
>>> wis = weighted_interval_score(y_t, y_l, y_u, y_m, a,
...                               nan_policy='omit', verbose=1)
WIS computed: 0.4333
>>> print(f"WIS (1D, omit): {wis:.4f}")
WIS (1D, omit): 0.4333
>>> # Multi-output example
>>> y_t_mo = np.array([[10, 20], [12, np.nan]])  # (2_samples, 2_outputs)
>>> y_l_mo = np.array([  # (2_samples, 2_outputs, 1_interval)
...     [[9], [19]],    # Sample 0, Output 0&1, Interval 0
...     [[11], [21]]    # Sample 1, Output 0&1, Interval 0
... ])
>>> y_u_mo = np.array([
...     [[11], [21]],
...     [[13], [23]]
... ])
>>> y_m_mo = np.array([[10, 20], [12, 22]])
>>> a_mo = np.array([0.5])  # K=1 interval
>>> wis_mo_raw = weighted_interval_score(
...     y_t_mo, y_l_mo, y_u_mo, y_m_mo, a_mo,
...     nan_policy='propagate', multioutput='raw_values', verbose=1
... )
WIS computed: [0.25 nan ]
>>> print(f"WIS (2D, raw, propagate): {wis_mo_raw}")
WIS (2D, raw, propagate): [0.25 nan]
Notes
WIS is a proper scoring rule for evaluating quantile/interval forecasts.
It balances sharpness (interval width) and calibration.
Lower WIS values indicate better forecast performance.
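The trade-off between sharpness and calibration can be illustrated with a single interval (K = 1). The sketch below applies the \(K + 1\)-normalized formula directly; `wis_single` is a hypothetical helper and the numbers are made up for illustration.

```python
def wis_single(y, lower, upper, median, alpha):
    """WIS for one observation and one interval (K = 1).

    Hypothetical helper for illustration; not a fusionlab function.
    """
    contribution = (alpha / 2.0) * (upper - lower)   # width, scaled by alpha / 2
    if y < lower:
        contribution += lower - y                    # miss below the interval
    elif y > upper:
        contribution += y - upper                    # miss above the interval
    return (abs(y - median) + contribution) / 2.0    # K + 1 = 2

y, median, alpha = 10.5, 10.0, 0.2
narrow = wis_single(y, 9.9, 10.1, median, alpha)     # sharp, but misses y
wide = wis_single(y, 5.0, 15.0, median, alpha)       # covers y, but very wide
moderate = wis_single(y, 9.0, 11.0, median, alpha)   # covers y, reasonably sharp

print(f"narrow={narrow:.2f} wide={wide:.2f} moderate={moderate:.2f}")
```

The moderate interval scores lowest here: the narrow one pays for missing the observation and the wide one pays for its width.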
References