fusionlab.metrics.weighted_interval_score

fusionlab.metrics.weighted_interval_score(y_true, y_lower, y_upper, y_median, alphas, sample_weight=None, nan_policy='propagate', multioutput='uniform_average', warn_invalid_bounds=True, verbose=0)

Compute the Weighted Interval Score (WIS).

The WIS is a proper scoring rule that evaluates probabilistic forecasts given as a set of central prediction intervals and a median forecast [1]. It generalizes the absolute error and considers multiple quantile levels.

The interval score for a single central prediction interval \([l_k, u_k]\) at level \(\alpha_k\) is:

\[
\mathrm{IS}_{\alpha_k}(y, l_k, u_k) = (u_k - l_k)
+ \frac{2}{\alpha_k}(l_k - y)\mathbf{1}\{y < l_k\}
+ \frac{2}{\alpha_k}(y - u_k)\mathbf{1}\{y > u_k\}
\]
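
The interval score above can be evaluated directly with NumPy. The following is a minimal sketch, not the library's code; the helper name `interval_score` and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def interval_score(y, lower, upper, alpha):
    """Interval score of the interval [lower, upper] at level alpha (hypothetical helper)."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    width = upper - lower                                  # sharpness term
    under = (2.0 / alpha) * (lower - y) * (y < lower)      # penalty when y falls below the interval
    over = (2.0 / alpha) * (y - upper) * (y > upper)       # penalty when y falls above the interval
    return width + under + over

# Same interval [9, 11] at alpha = 0.5, scored against three observations.
scores = interval_score(
    y=[10.0, 7.0, 14.0],
    lower=[9.0, 9.0, 9.0],
    upper=[11.0, 11.0, 11.0],
    alpha=0.5,
)
print(scores)  # values: 2.0 (covered), 10.0 (2 below), 14.0 (3 above)
```

An observation inside the interval pays only the width; a miss adds the miss distance scaled by \(2/\alpha_k\).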

The WIS is then defined as a weighted average of the absolute error of the median forecast and the interval scores of the \(K\) central prediction intervals:

\[
\mathrm{WIS}(y, m, \{(l_k, u_k, \alpha_k)\}_{k=1}^K) =
\frac{1}{K + 0.5} \left( \frac{1}{2}|y - m| +
\sum_{k=1}^K \frac{\alpha_k}{2} \mathrm{IS}_{\alpha_k} \right)
\]

The implementation follows an alternative, commonly used formulation, based on the reference's R script and implementations such as the scoringutils R package. For each interval \(k\) with level \(\alpha_k\), the contribution to the score for a single observation \(y\) is the interval score:

\[
S_k = (u_k - l_k) + \frac{2}{\alpha_k}(l_k - y)\mathbf{1}\{y < l_k\}
    + \frac{2}{\alpha_k}(y - u_k)\mathbf{1}\{y > u_k\}
\]

The total score for observation \(y\) is:

\[
\mathrm{Score}_y = \frac{1}{K+1} \left( |y-m| + \sum_{k=1}^K \frac{\alpha_k}{2} S_k \right)
\]

Multiplying the interval score by \(\alpha_k / 2\) gives the per-interval WIS contribution directly:

\[
\mathrm{WIS}_{\alpha_k}(y, l_k, u_k) = \frac{\alpha_k}{2}(u_k - l_k)
+ (l_k - y)\mathbf{1}\{y < l_k\}
+ (y - u_k)\mathbf{1}\{y > u_k\}
\]
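
The simplification can be checked term by term: multiplying the per-interval score \(S_k\) by \(\alpha_k / 2\) cancels the \(2 / \alpha_k\) factor on both penalty terms, leaving only the width scaled.

```latex
\frac{\alpha_k}{2} S_k
  = \frac{\alpha_k}{2}(u_k - l_k)
  + \frac{\alpha_k}{2} \cdot \frac{2}{\alpha_k}(l_k - y)\mathbf{1}\{y < l_k\}
  + \frac{\alpha_k}{2} \cdot \frac{2}{\alpha_k}(y - u_k)\mathbf{1}\{y > u_k\}
  = \mathrm{WIS}_{\alpha_k}(y, l_k, u_k)
```

In this form the interval width is weighted by \(\alpha_k / 2\), while the distance by which \(y\) misses the interval enters with unit weight.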

The aggregated WIS is then:

\[
\mathrm{WIS} = \frac{1}{K + 1} \left(|y - m| +
\sum_{k=1}^K \mathrm{WIS}_{\alpha_k}\right)
\]

This is the version implemented.
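
As an illustration of the aggregated formula, here is a minimal NumPy sketch for 1D targets. It is an assumption-laden reimplementation of the equations above, not fusionlab's source: the name `wis_sketch` is hypothetical, and NaN handling, sample weights, and multi-output inputs are deliberately left out.

```python
import numpy as np

def wis_sketch(y_true, y_lower, y_upper, y_median, alphas):
    """Mean WIS for 1D targets with the 1 / (K + 1) normalization (hypothetical helper)."""
    y = np.asarray(y_true, dtype=float)[:, None]    # (n_samples, 1), broadcasts over intervals
    lower = np.asarray(y_lower, dtype=float)        # (n_samples, K)
    upper = np.asarray(y_upper, dtype=float)        # (n_samples, K)
    median = np.asarray(y_median, dtype=float)      # (n_samples,)
    alphas = np.asarray(alphas, dtype=float)        # (K,)

    sharpness = (alphas / 2.0) * (upper - lower)    # width weighted by alpha_k / 2
    under = (lower - y) * (y < lower)               # miss distance below the interval
    over = (y - upper) * (y > upper)                # miss distance above the interval
    per_interval = sharpness + under + over         # (n_samples, K)

    abs_err = np.abs(y[:, 0] - median)              # absolute error of the median
    per_sample = (abs_err + per_interval.sum(axis=1)) / (alphas.size + 1)
    return per_sample.mean()

# Two samples, one interval at alpha = 0.5; the second observation lies above its interval.
result = wis_sketch(
    y_true=[10.0, 15.0],
    y_lower=[[9.0], [11.0]],
    y_upper=[[11.0], [13.0]],
    y_median=[10.0, 12.0],
    alphas=[0.5],
)
print(result)  # 1.5 = mean of 0.25 and 2.75
```

The first sample contributes (0 + 0.5) / 2 = 0.25 and the second (3 + 0.5 + 2) / 2 = 2.75, so the unweighted mean is 1.5.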

Parameters:
  • y_true (array-like) – Observed true values. Shape: (n_samples,) or (n_samples, n_outputs).

  • y_lower (array-like) – Lower bounds for each central prediction interval. Shape: (n_samples, K_intervals) if y_true is 1D, or (n_samples, n_outputs, K_intervals) if y_true is 2D.

  • y_upper (array-like) – Upper bounds, matching y_lower’s shape.

  • y_median (array-like) – Median forecasts. Shape: (n_samples,) or (n_samples, n_outputs).

  • alphas (array-like of float, shape (K_intervals,)) – Nominal levels of the central prediction intervals, where interval k has nominal coverage 1 - alpha_k (e.g., 0.1 for a 90% PI, bounded by the 0.05 and 0.95 quantiles). Each alpha must be in (0, 1). These are also used as the interval weights.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights. If None, samples are equally weighted.

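  • nan_policy ({'omit', 'propagate', 'raise'}, default 'propagate') – How to handle NaNs in inputs: 'propagate' returns NaN for the affected outputs, 'omit' excludes samples containing NaNs, and 'raise' raises an error.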

  • multioutput ({'raw_values', 'uniform_average'}, default 'uniform_average') – Defines aggregation for multi-output y_true: 'raw_values' returns one score per output, 'uniform_average' averages the scores across outputs.

  • warn_invalid_bounds (bool, default True) – If True, issues a UserWarning if any y_lower > y_upper.

  • verbose (int, default 0) – Verbosity level: 0 (silent), 1 (summary), >=2 (debug details).
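
The expected input shapes can be illustrated by deriving the bounds from forecast draws. This is a sketch under the assumption that interval k spans the alpha_k / 2 and 1 - alpha_k / 2 quantiles (as the 0.05 and 0.95 example for alphas indicates); the ensemble `draws` is synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical ensemble forecast: 500 draws for each of 4 samples (e.g. time steps).
draws = rng.normal(loc=10.0, scale=2.0, size=(500, 4))

alphas = np.array([0.2, 0.5])     # 80% and 50% central intervals
lower_q = alphas / 2.0            # lower quantile levels: 0.10, 0.25
upper_q = 1.0 - alphas / 2.0      # upper quantile levels: 0.90, 0.75

# np.quantile with an array of levels returns shape (K, n_samples);
# transpose to the (n_samples, K_intervals) layout used for 1D y_true.
y_lower = np.quantile(draws, lower_q, axis=0).T
y_upper = np.quantile(draws, upper_q, axis=0).T
y_median = np.median(draws, axis=0)

print(y_lower.shape, y_upper.shape, y_median.shape)  # (4, 2) (4, 2) (4,)
```

Arrays built this way keep the interval axis last, matching the order of alphas.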

Returns:

score – Average WIS. Lower values are better. Scalar if multioutput='uniform_average' or y_true is 1D. Array of shape (n_outputs,) if multioutput='raw_values' and y_true is 2D.

Return type:

float or ndarray of floats

Examples

>>> import numpy as np
>>> from fusionlab.metrics import weighted_interval_score
>>> # Single-output example
>>> y_t = np.array([10, 12, 11, np.nan])
>>> y_l = np.array([[9, 8], [11, 10], [10, 9], [9, 8]]) # K=2 intervals
>>> y_u = np.array([[11, 12], [13, 14], [12, 13], [11, 12]])
>>> y_m = np.array([10, 12, 11, 10])
>>> a = np.array([0.2, 0.5]) # alphas for the 80% and 50% PIs
>>> wis = weighted_interval_score(y_t, y_l, y_u, y_m, a,
...                               nan_policy='omit', verbose=1)
WIS computed: 0.4333
>>> print(f"WIS (1D, omit): {wis:.4f}")
WIS (1D, omit): 0.4333
>>> # Multi-output example
>>> y_t_mo = np.array([[10, 20], [12, np.nan]]) # (2_samples, 2_outputs)
>>> y_l_mo = np.array([ # (2_samples, 2_outputs, 1_interval)
...     [[9], [19]],   # Sample 0, Output 0&1, Interval 0
...     [[11], [21]]   # Sample 1, Output 0&1, Interval 0
... ])
>>> y_u_mo = np.array([
...     [[11], [21]],
...     [[13], [23]]
... ])
>>> y_m_mo = np.array([[10, 20], [12, 22]])
>>> a_mo = np.array([0.5]) # K=1 interval
>>> wis_mo_raw = weighted_interval_score(
...     y_t_mo, y_l_mo, y_u_mo, y_m_mo, a_mo,
...     nan_policy='propagate', multioutput='raw_values', verbose=1
... )
WIS computed: [0.25 nan ]
>>> print(f"WIS (2D, raw, propagate): {wis_mo_raw}")
WIS (2D, raw, propagate): [0.25 nan]

Notes

  • WIS is a proper scoring rule for evaluating quantile/interval forecasts.

  • It balances sharpness (interval width) and calibration.

  • Lower WIS values indicate better forecast performance.
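
The trade-off between sharpness and calibration can be seen on a single observation. The sketch below applies the K = 1 case of the formula described above to three hypothetical forecasts; the helper `wis_single_interval` and all numbers are illustrative assumptions, not library code.

```python
def wis_single_interval(y, lower, upper, median, alpha):
    """WIS for one observation and one interval (K = 1), hypothetical helper."""
    sharpness = (alpha / 2.0) * (upper - lower)            # width weighted by alpha / 2
    miss = max(lower - y, 0.0) + max(y - upper, 0.0)       # distance outside the interval
    return (abs(y - median) + sharpness + miss) / 2.0      # K + 1 = 2

y, alpha = 10.0, 0.2
wide = wis_single_interval(y, 4.0, 16.0, median=10.0, alpha=alpha)     # covers y, very wide
sharp = wis_single_interval(y, 9.0, 11.0, median=10.0, alpha=alpha)    # covers y, narrow
missed = wis_single_interval(y, 10.5, 11.5, median=11.0, alpha=alpha)  # narrow, but misses y

print(sharp < wide < missed)  # True
```

The narrow interval that covers the observation scores best (about 0.1), the wide one pays for its width (about 0.6), and the narrow interval that misses pays for both the miss and the median error (about 0.8).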

References