fusionlab.metrics.mean_interval_width_score

fusionlab.metrics.mean_interval_width_score(y_lower, y_upper, sample_weight=None, nan_policy='propagate', multioutput='uniform_average', warn_invalid_bounds=True, eps=1e-08, verbose=0)

Compute the Mean Interval Width (sharpness) of prediction intervals.

This metric measures the average width of the provided prediction intervals, independent of whether they cover the true values. Lower values indicate narrower, sharper intervals.

\[\mathrm{MeanIntervalWidth} = \frac{1}{N_{valid}} \sum_{i=1}^{N_{valid}} (u_i - l_i),\]

where \(l_i\) and \(u_i\) are the lower and upper bounds for sample \(i\), and \(N_{valid}\) is the number of valid samples after NaN handling. If sample_weight is provided, the result is the weighted average of the widths instead of the plain mean.
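The formula above can be sketched in plain NumPy. This is an illustrative reference only, not fusionlab's implementation: the helper name `mean_width` is hypothetical, and the weighted branch assumes the usual weighted mean, sum(w_i * (u_i - l_i)) / sum(w_i).

```python
import numpy as np

def mean_width(y_lower, y_upper, sample_weight=None):
    """Hypothetical helper: (weighted) mean of u_i - l_i for 1D inputs."""
    widths = np.asarray(y_upper, dtype=float) - np.asarray(y_lower, dtype=float)
    if sample_weight is None:
        return float(widths.mean())
    w = np.asarray(sample_weight, dtype=float)
    # Assumed weighted form: sum(w_i * width_i) / sum(w_i)
    return float(np.sum(w * widths) / np.sum(w))

y_l = np.array([9.0, 11.0, 10.0])
y_u = np.array([11.0, 13.0, 14.0])            # widths are [2, 2, 4]
unweighted = mean_width(y_l, y_u)             # (2 + 2 + 4) / 3
weighted = mean_width(y_l, y_u, [1, 1, 2])    # (2 + 2 + 8) / 4
print(unweighted, weighted)
```

With these inputs the weighted result is 3.0, because the widest interval counts twice.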

Parameters:

y_lower (array-like) –
Lower bound predictions. Expected shapes:
- (n_samples,) for single output.
- (n_samples, n_outputs) for multi-output.
y_upper (array-like) – Upper bound predictions, matching y_lower in shape.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights. If None, all samples are weighted equally. The sum of the weights must be greater than eps.
nan_policy ({'omit', 'propagate', 'raise'}, default 'propagate') –
How to handle NaNs in y_lower or y_upper:
- 'raise': Raise an error on any NaN.
- 'omit': Drop samples (rows) containing NaNs in either y_lower or y_upper.
- 'propagate': The width for any sample/output with a NaN bound is NaN, so the mean for each affected output is NaN as well.
multioutput ({'raw_values', 'uniform_average'}, default 'uniform_average') –
Defines aggregation if inputs are multi-output (2D).
- 'raw_values': Returns an array of mean widths, one for each output.
- 'uniform_average': Mean widths of all outputs are averaged with uniform weight.
warn_invalid_bounds (bool, default True) – If True, issues a UserWarning if any y_lower[i] > y_upper[i]. The width for such intervals will be negative.
eps (float, default 1e-8) – Small threshold that guards against division by zero when the sum of sample weights is zero or very close to zero.
verbose (int, default 0) – Verbosity level: 0 (silent), 1 (summary), >=2 (debug details).
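The interplay of nan_policy and multioutput described above can be illustrated with a small NumPy sketch. It mimics the documented behavior under stated assumptions and does not call fusionlab; all variable names are illustrative only.

```python
import numpy as np

y_lower = np.array([[9.0, 19.0], [11.0, np.nan], [10.0, 20.0]])
y_upper = np.array([[11.0, 21.0], [13.0, 23.0], [14.0, 22.0]])
widths = y_upper - y_lower  # NaN wherever either bound is NaN

# 'propagate'-like: a NaN width makes that output's mean NaN
per_output_propagate = widths.mean(axis=0)

# 'omit'-like: drop every row with a NaN in either bound
keep = ~(np.isnan(y_lower).any(axis=1) | np.isnan(y_upper).any(axis=1))
per_output_omit = widths[keep].mean(axis=0)   # analogous to 'raw_values'
uniform_average = per_output_omit.mean()      # analogous to 'uniform_average'

print(per_output_propagate, per_output_omit, uniform_average)
```

Note that the 'omit'-like branch removes the whole second row, so the valid width of output 0 in that row is discarded too. This matches the row-wise description of 'omit' above.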

Returns:

score – The mean interval width. Scalar if multioutput='uniform_average' or if inputs are 1D. Array of shape (n_outputs,) if multioutput='raw_values' and inputs are 2D.

Return type:

float or ndarray of floats

Notes

This metric is also known as “sharpness.”
It is often reported alongside coverage_score to provide a more complete picture of prediction interval performance.
This metric does not consider calibration (i.e., whether the true values fall within the intervals).
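To show why width alone is not enough, the sketch below computes the mean width next to an empirical coverage rate in plain NumPy. The coverage computation here is a hand-written stand-in for illustration; it is not fusionlab's coverage_score, whose signature is not shown in this section.

```python
import numpy as np

y_true = np.array([10.0, 12.5, 11.0, 15.0])
y_lower = np.array([9.0, 11.0, 10.0, 12.0])
y_upper = np.array([11.0, 13.0, 12.0, 14.0])

# Sharpness: average interval width; y_true plays no role here
mean_width = float(np.mean(y_upper - y_lower))

# Empirical coverage: fraction of targets inside [lower, upper]
coverage = float(np.mean((y_true >= y_lower) & (y_true <= y_upper)))

print(f"width={mean_width:.2f}, coverage={coverage:.2f}")
# width=2.00, coverage=0.75
```

Narrow intervals only count as good if coverage stays near its target level, which is why the two numbers are usually read together.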

Examples

>>> import numpy as np
>>> from fusionlab.metrics import mean_interval_width_score

>>> y_l = np.array([9, 11, 10, np.nan])
>>> y_u = np.array([11, 13, 12, 10])
>>> # Widths: [2, 2, 2, nan]
>>> score_prop = mean_interval_width_score(y_l, y_u, nan_policy='propagate')
>>> print(f"MIW (propagate): {score_prop:.4f}")
MIW (propagate): nan

>>> score_omit = mean_interval_width_score(y_l, y_u, nan_policy='omit')
>>> # Valid widths: [2, 2, 2]. Mean = 2.0
>>> print(f"MIW (omit): {score_omit:.4f}")
MIW (omit): 2.0000

>>> # Multi-output
>>> y_l_mo = np.array([[9, 19], [11, np.nan]]) # (2 samples, 2 outputs)
>>> y_u_mo = np.array([[11, 21], [13, 23]])
>>> # Widths: [[2, 2], [2, nan]]
>>> score_mo_raw = mean_interval_width_score(
...     y_l_mo, y_u_mo, multioutput='raw_values', nan_policy='propagate'
... )
>>> # Output 0 widths: [2, 2]. Mean = 2.0
>>> # Output 1 widths: [2, nan]. Mean = nan
>>> print(f"MIW (multi-output, raw, propagate): {score_mo_raw}")
MIW (multi-output, raw, propagate): [ 2. nan]