fusionlab.nn.utils.compute_anomaly_scores

fusionlab.nn.utils.compute_anomaly_scores(y_true, y_pred=None, method='statistical', threshold=3.0, domain_func=None, contamination=0.05, epsilon=1e-06, estimator=None, random_state=None, residual_metric='mse', objective='ts', verbose=1)

Compute anomaly scores for given true targets using various methods.

This utility function, compute_anomaly_scores, provides a flexible way to compute anomaly scores outside the XTFT model [1]. Anomaly scores indicate how unusual individual observations are and can be used to guide the model towards more robust and stable forecasts. By detecting and quantifying anomalies, practitioners can adjust forecasting strategies, improve predictive performance, and handle irregular patterns more effectively.

Parameters:

y_true (np.ndarray) –
The ground truth target values with shape (B, H, O), where:
- B: batch size
- H: number of forecast horizons (time steps ahead)
- O: output dimension (e.g., number of target variables)

Typically, y_true corresponds to the same array passed as the forecast target to the model. All computations of anomalies are relative to these true values or, if provided, their predicted counterparts y_pred.
y_pred (np.ndarray, optional) – The predicted values with shape (B, H, O). If provided and the method is set to ‘residual’, the anomaly scores are derived from the residuals between y_true and y_pred. In this scenario, anomalies reflect discrepancies indicating unusual conditions or model underperformance.
method (str, optional) –
The method used to compute anomaly scores. Supported options:
- "statistical" or "stats": Uses the mean and standard deviation of y_true to measure deviation from the mean. Points far from the mean by a certain factor (controlled by threshold) yield higher anomaly scores.

  Formally, let \(\mu\) be the mean of y_true and \(\sigma\) its standard deviation. The anomaly score for a point \(y\) is:
  \[\left(\frac{y - \mu}{\sigma + \varepsilon}\right)^2\]
  where \(\varepsilon\) is a small constant for numerical stability.
- "domain": Uses a domain-specific heuristic (provided by domain_func) to compute scores. If no domain_func is provided, a default heuristic marks negative values as anomalies.
- "isolation_forest" or "if": Employs the IsolationForest algorithm to detect outliers. The model learns a structure that isolates anomalies more quickly than normal points. Higher contamination rates allow more points to be considered anomalous.
- "residual": If y_pred is provided, anomalies are derived from the residuals, i.e. the difference (y_true - y_pred). By default, the squared error (mse) is used:
  \[\text{MSE: }(y_{true} - y_{pred})^2\]
  Other metrics, mae and rmse, are selected through residual_metric and offer flexibility in quantifying deviations.

Default is "statistical".
threshold (float, optional) – Threshold factor for the statistical method. Points lying beyond mean ± (threshold * std) are considered anomalous. Though not directly applied as a mask here, it can guide interpretation of the scores. Default is 3.0.
domain_func (callable, optional) –
A user-defined function for the domain method. It takes y_true as input and returns an array of anomaly scores with the same shape. If none is provided, the default heuristic is:
\[
\text{anomaly}(y) = \begin{cases}
|y| \times 10 & \text{if } y < 0 \\
0 & \text{otherwise}
\end{cases}
\]
contamination (float, optional) – Used in the isolation_forest method. Specifies the proportion of outliers in the dataset. Default is 0.05.
epsilon (float, optional) – A small constant \(\varepsilon\) for numerical stability in calculations, especially during statistical normalization. Default is 1e-6.
estimator (object, optional) – A pre-fitted IsolationForest estimator for the isolation_forest method. If not provided, a new estimator will be created and fitted to y_true.
random_state (int, optional) – Sets a random state for reproducibility in the isolation_forest method.
residual_metric (str, optional) –
The metric used to compute anomalies from residuals if method is set to ‘residual’. Supported metrics:
- "mse": mean squared error per point (residuals**2)
- "mae": mean absolute error per point (|residuals|)
- "rmse": root mean squared error (sqrt(residuals**2))

Default is "mse".
objective (str, optional) – Specifies the type of objective; reserved for future extensibility to other tasks. Default is "ts", indicating time series.
verbose (int, optional) – Controls verbosity. If verbose=1, some messages or warnings may be printed. Higher values might produce more detailed logs.
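
The default heuristic described for domain_func can be sketched in plain NumPy. This is an illustrative reimplementation of the piecewise formula, not fusionlab's own code; the name default_domain_scores is hypothetical.

```python
import numpy as np

def default_domain_scores(y):
    """Hypothetical stand-in for the documented default heuristic:
    score = |y| * 10 where y < 0, and 0 otherwise."""
    y = np.asarray(y, dtype=float)
    return np.where(y < 0, np.abs(y) * 10, 0.0)

# Small hand-checkable input: only the negative entries get a non-zero score.
y = np.array([[-2.0, 0.0, 3.5],
              [-0.5, 1.0, -1.0]])
domain_scores = default_domain_scores(y)
print(domain_scores)
```

A custom domain_func would follow the same contract: accept an array shaped like y_true and return scores of the same shape.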

Returns:

anomaly_scores – An array of anomaly scores with the same shape as y_true. Higher values indicate more unusual or anomalous points.

Return type:

np.ndarray

Notes

Choosing an appropriate method depends on the data characteristics, domain requirements, and model complexity. Statistical methods are quick and interpretable but may oversimplify anomalies. Domain heuristics leverage expert knowledge, while isolation forest applies a more robust, data-driven approach. Residual-based anomalies help assess model performance and highlight periods where the model struggles.
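
As a rough illustration of the statistical formula, the squared z-score can be reproduced with NumPy alone. The sketch assumes the mean and standard deviation are taken over all elements of y_true, which the description above does not specify; statistical_scores is a hypothetical helper, not part of fusionlab. Because the score is a squared z-score, a point lies beyond mean ± (threshold * std) roughly when its score exceeds threshold**2, which gives one way to turn scores into a mask.

```python
import numpy as np

def statistical_scores(y_true, epsilon=1e-6):
    """Squared z-score per point.

    Assumption: mu and sigma are computed over the whole array.
    """
    y_true = np.asarray(y_true, dtype=float)
    mu = y_true.mean()
    sigma = y_true.std()
    return ((y_true - mu) / (sigma + epsilon)) ** 2

rng = np.random.default_rng(0)
y_true = rng.normal(size=(32, 20, 1))  # (B, H, O)
y_true[0, 0, 0] = 25.0                 # inject one obvious outlier

scores = statistical_scores(y_true)

# |z| > threshold corresponds to score > threshold**2 (up to epsilon).
threshold = 3.0
mask = scores > threshold ** 2
print(scores.shape, bool(mask[0, 0, 0]))
```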

Examples

>>> from fusionlab.nn.utils import compute_anomaly_scores
>>> import numpy as np

>>> # Statistical method example
>>> y_true = np.random.randn(32, 20, 1)  # (B,H,O)
>>> scores = compute_anomaly_scores(y_true, method='statistical', threshold=3)
>>> scores.shape
(32, 20, 1)

>>> # Domain-specific example
>>> def my_heuristic(y):
...     return np.where(y < -1, np.abs(y)*5, 0.0)
>>> scores = compute_anomaly_scores(y_true, method='domain',
...                                 domain_func=my_heuristic)

>>> # Isolation Forest example
>>> scores = compute_anomaly_scores(y_true, method='isolation_forest',
...                                 contamination=0.1)

>>> # Residual-based example
>>> y_pred = y_true + np.random.normal(0, 1, y_true.shape)  # Introduce noise
>>> scores = compute_anomaly_scores(y_true, y_pred=y_pred, method='residual',
...                                 residual_metric='mae')
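
To sanity-check residual-based scores by hand, the three per-point expressions listed under residual_metric can be written directly in NumPy. This is a minimal sketch of those formulas as documented, not the library's implementation; residual_scores is a hypothetical name.

```python
import numpy as np

def residual_scores(y_true, y_pred, metric="mse"):
    """Per-point residual scores following the documented expressions."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    if metric == "mse":
        return residuals ** 2
    if metric == "mae":
        return np.abs(residuals)
    if metric == "rmse":
        return np.sqrt(residuals ** 2)
    raise ValueError(f"Unsupported metric: {metric!r}")

# Tiny (B=2, H=2, O=1) example with residuals -0.5, 0.0, 2.0, -0.5.
y_true = np.array([[[1.0], [2.0]], [[3.0], [4.0]]])
y_pred = np.array([[[1.5], [2.0]], [[1.0], [4.5]]])

mse_scores = residual_scores(y_true, y_pred, "mse")
mae_scores = residual_scores(y_true, y_pred, "mae")
rmse_scores = residual_scores(y_true, y_pred, "rmse")
print(mse_scores.ravel(), mae_scores.ravel())
```

Read literally, the per-point rmse expression sqrt(residuals**2) reduces to |residuals|, so under this reading it coincides with mae; the two would differ only if the square is averaged before the root is taken.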