fusionlab.utils.ts_utils.ts_outlier_detector
- fusionlab.utils.ts_utils.ts_outlier_detector(df, dt_col=None, value_col=None, method='zscore', threshold=3, view=False, fig_size=(10, 5), show_grid=True, drop=False, verbose=0)
Detect outliers in a time series using either Z-Score or Interquartile Range (IQR). Outliers can optionally be removed from the DataFrame.
In many time-series analyses, anomalous points can distort model training or skew statistical inferences. A common outlier detection approach is the Z-Score:
\[Z_t = \frac{X_t - \mu}{\sigma},\]
where \(\mu\) and \(\sigma\) are the mean and standard deviation of the series. Points for which \(|Z_t| > \text{threshold}\) are flagged as outliers.
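The Z-Score rule can be sketched directly in pandas. This is a minimal illustration, not the library's implementation: the helper name `zscore_flags` is hypothetical, and using the population standard deviation (`ddof=0`) is an assumption, since the text does not say which estimator the function uses.

```python
import pandas as pd


def zscore_flags(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return a boolean Series that is True where |Z_t| > threshold.

    Hypothetical helper for illustration only. ddof=0 (population
    standard deviation) is an assumption, not taken from the library.
    """
    mu = values.mean()
    sigma = values.std(ddof=0)
    z = (values - mu) / sigma
    return z.abs() > threshold


# A mostly flat series with one spike at the end.
sales = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 100])
flags = zscore_flags(sales, threshold=3)
print(sales[flags])  # shows the rows whose |Z| exceeds the threshold
```

Note that a single extreme value inflates both \(\mu\) and \(\sigma\), so on very short series even an obvious spike may not exceed a threshold of 3.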
- Parameters:
df (pandas.DataFrame) – The input DataFrame containing the time series data. Must include a datetime column or index.
dt_col (str, optional) – Column name representing the datetime dimension. If None, the function assumes the index is datetime-like or uses ts_validator.
value_col (str, optional) – Name of the target variable in the DataFrame (e.g., “Sales”).
method ({'zscore', 'iqr'}, optional) –
'zscore': Use Z-Scores to detect outliers.
'iqr': Use the Interquartile Range method, flagging values below \(Q_1 - \text{threshold} \times \text{IQR}\) or above \(Q_3 + \text{threshold} \times \text{IQR}\).
threshold (int or float, optional) – Threshold multiplier for the chosen method. For Z-Scores, it is the number of standard deviations above or below the mean beyond which a point qualifies as an outlier (default=3). For IQR, it is the multiplier applied to the IQR to define the lower and upper bounds.
view (bool, optional) – If True, displays a plot marking outliers in red over the original time series.
fig_size (tuple of (float, float), optional) – The size of the figure (width, height) if visualizing.
show_grid (bool, optional) – Whether to display gridlines in the plot.
drop (bool, optional) – If True, removes the rows flagged as outliers from df.
verbose (int, optional) – Verbosity level:
0 : No console messages.
1 : Basic information about outlier counts.
2+ : (Not implemented here, but can be extended).
- Returns:
result – The original DataFrame with a new column 'is_outlier' marking outlier rows (True/False), unless drop=True. In that case, it returns the DataFrame after removing these rows (and without the extra column).
- Return type:
pandas.DataFrame
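The two return shapes can be mimicked in plain pandas. The sketch below does not call `ts_outlier_detector`: the boolean mask is hard-coded as a stand-in for the detector's decision, and all variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
        "Sales": [10, 500, 12],
    }
)

# Hypothetical mask standing in for the detector's decision.
mask = pd.Series([False, True, False], index=df.index)

# drop=False analogue: keep every row and add the boolean flag column.
flagged = df.assign(is_outlier=mask)

# drop=True analogue: remove flagged rows; no 'is_outlier' column is added.
cleaned = df.loc[~mask]

print(flagged.columns.tolist())
print(len(cleaned))
```

Whether the real function resets the index after dropping rows is not stated here, so the sketch leaves the original index untouched.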
Examples
>>> import pandas as pd
>>> from fusionlab.utils.ts_utils import ts_outlier_detector
>>> data = {
...     'Date': [
...         '2021-01-01','2021-01-02','2021-01-03',
...         '2021-01-04','2021-01-05','2021-01-06'
...     ],
...     'Sales': [10, 100, 12, 13, 200, 15]
... }
>>> df = pd.DataFrame(data)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df_out = ts_outlier_detector(
...     df,
...     dt_col='Date',
...     value_col='Sales',
...     method='zscore',
...     threshold=2.5,
...     view=True,
...     drop=False,
...     verbose=1
... )
Target variable: Sales
Datetime column: Date
Outlier detection method: zscore, Threshold: 2.5
Detecting outliers using Z-Score...
Number of outliers detected: 2
Outliers retained in the DataFrame.
Notes
The choice of outlier detection method (Z-Score vs. IQR) is context-dependent. Z-Scores assume approximately normally distributed data [1], while IQR is more robust to skewed distributions [2].
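To make the IQR alternative concrete, here is a minimal sketch of the usual IQR fence rule in pandas. The helper name `iqr_flags` is hypothetical, the default multiplier of 1.5 is the conventional choice rather than this function's default, and the quartiles use pandas' default linear interpolation; the library may compute its bounds differently.

```python
import pandas as pd


def iqr_flags(values: pd.Series, threshold: float = 1.5) -> pd.Series:
    """Return True where a value falls outside the IQR fences.

    Hypothetical helper for illustration only; not the library's code.
    """
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - threshold * iqr
    upper = q3 + threshold * iqr
    return (values < lower) | (values > upper)


# A mostly flat series with one spike at the end.
sales = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 100])
flags = iqr_flags(sales)
print(sales[flags])  # shows the rows outside the fences
```

Because the quartiles are barely affected by a single extreme value, the fences stay tight around the bulk of the data, whereas the same spike inflates the mean and standard deviation that the Z-Score relies on.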
See also
ts_engineering – Broader time-series feature engineering (lags, rolling statistics, etc.).
transform_stationarity – Techniques for removing trends or stabilizing variance.
References