fusionlab.utils.ts_utils.ts_engineering
- fusionlab.utils.ts_utils.ts_engineering(df, value_col, dt_col=None, lags=5, window=7, diff_order=1, seasonal_period=None, apply_fourier=False, holiday_df=None, robust_diff=True, scaler='z-norm', **kwargs)
Perform feature engineering on a time series to create predictors for machine learning models. The function can generate lag features, rolling statistics, differenced series, Fourier-transform features, and a holiday indicator, and can optionally scale the numeric features.
Specifically, let \(X_t\) denote the value of the time series at time \(t\). The function creates features such as the lags \(X_{t-1}, \dots, X_{t-l}\), where \(l\) is the value of lags, and rolling means \(\frac{1}{w}\sum_{i=0}^{w-1}X_{t-i}\), where \(w\) is the value of window. These features let predictive models capture temporal dependencies [1].
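The lag and rolling-mean formulas above map directly onto pandas.DataFrame.shift and pandas.DataFrame.rolling. The following is a minimal sketch, assuming a single numeric pandas.Series; the helper make_lag_and_rolling_features and its column-naming scheme are hypothetical and are not part of the fusionlab API.

```python
import numpy as np
import pandas as pd


def make_lag_and_rolling_features(series: pd.Series, lags: int = 5, window: int = 7) -> pd.DataFrame:
    """Hypothetical helper (not part of fusionlab): lags X_{t-1}..X_{t-lags}
    plus the trailing rolling mean (1/w) * sum_{i=0}^{w-1} X_{t-i}."""
    name = series.name or "value"
    out = pd.DataFrame({name: series})
    for k in range(1, lags + 1):
        # shift(k) moves values down k rows, so row t holds X_{t-k}
        out[f"{name}_lag_{k}"] = series.shift(k)
    # Trailing window ending at t (it includes X_t); the first w-1 rows are NaN
    out[f"{name}_rollmean_{window}"] = series.rolling(window=window).mean()
    return out


# Usage: 10 days of a simple ramp 0, 1, ..., 9
idx = pd.date_range("2020-01-01", periods=10, freq="D")
sales = pd.Series(np.arange(10, dtype=float), index=idx, name="Sales")
feats = make_lag_and_rolling_features(sales, lags=3, window=7)
print(feats.tail(3))
```

Because the rolling window includes \(X_t\) itself, a model predicting \(X_t\) would typically use the rolling mean shifted by one step to avoid leaking the target.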
- Parameters:
df (pandas.DataFrame) – The input DataFrame containing the time series data plus any additional columns.
value_col (str) – The name of the column in df that holds the primary time series from which features are derived.
dt_col (str, optional) – The name of the datetime column, if the DataFrame index is not already a time index. If provided, it may be used for indexing or validation.
lags (int, optional) – Number of lag features to create. For example, lags=5 yields columns for \(X_{t-1}, X_{t-2}, \ldots, X_{t-5}\).
window (int, optional) – Window size for rolling statistics. For example, window=7 gives a 7-day rolling average on daily data.
diff_order (int, optional) – Order of differencing applied to the target time series: 0 means no differencing, 1 means first differencing, and so on.
seasonal_period (int, optional) – Seasonal period for seasonal differencing. For example, 12 for monthly data with yearly seasonality.
apply_fourier (bool, optional) – If True, computes a discrete Fourier transform of the time series and includes its magnitudes as additional features.
holiday_df (pandas.DataFrame, optional) – DataFrame of holiday dates used to add a holiday indicator feature. It should have a column listing the holiday dates, for instance one named 'date'.
robust_diff (bool, optional) – Placeholder flag indicating whether robust differencing should be used. The current implementation does not use it explicitly.
scaler ({'z-norm', 'minmax', None}, optional) – The scaling approach for numeric features:
  - 'z-norm': apply standard normalization \(Z = (X - \mu)/\sigma\).
  - 'minmax': rescale to [0, 1] via \((X - X_{\min})/(X_{\max} - X_{\min})\).
  - None: no scaling is applied.
**kwargs (dict, optional) – Additional parameters for customization, e.g., methods to handle missing values.
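Differencing of order \(d\) applies \(\nabla X_t = X_t - X_{t-1}\) repeatedly, \(d\) times, and seasonal differencing with period \(s\) computes \(X_t - X_{t-s}\). The sketch below shows these transforms together with the two scaling options, assuming pandas inputs. The helpers difference and scale are hypothetical; in particular, applying the seasonal difference after the ordinary ones and using the population standard deviation (ddof=0) for \(\sigma\) are assumptions, not documented fusionlab behavior.

```python
import numpy as np
import pandas as pd


def difference(series: pd.Series, diff_order: int = 1, seasonal_period=None) -> pd.Series:
    """Hypothetical helper: diff_order first differences, then an optional
    seasonal difference (this ordering is an assumption)."""
    out = series.astype(float)
    for _ in range(diff_order):
        out = out.diff()  # X_t - X_{t-1}; the leading value becomes NaN
    if seasonal_period:
        out = out.diff(seasonal_period)  # Y_t - Y_{t-s}
    return out


def scale(df: pd.DataFrame, scaler="z-norm") -> pd.DataFrame:
    """Hypothetical helper: scale numeric columns; non-numeric ones are untouched.
    Constant columns would divide zero by zero and yield NaN."""
    if scaler is None:
        return df.copy()
    num = df.select_dtypes(include="number")
    if scaler == "z-norm":
        # Z = (X - mu) / sigma, with sigma as the population std (ddof=0, assumed)
        scaled = (num - num.mean()) / num.std(ddof=0)
    elif scaler == "minmax":
        # (X - min) / (max - min) maps each column onto [0, 1]
        scaled = (num - num.min()) / (num.max() - num.min())
    else:
        raise ValueError(f"unknown scaler: {scaler!r}")
    out = df.copy()
    out[num.columns] = scaled
    return out


# Usage
s = pd.Series([1.0, 3.0, 6.0, 10.0, 15.0])
d1 = difference(s, diff_order=1)  # NaN, 2, 3, 4, 5
d2 = difference(s, diff_order=2)  # NaN, NaN, 1, 1, 1
seasonal = difference(pd.Series([1, 2, 3, 4, 11, 12, 13, 14]), diff_order=0, seasonal_period=4)
mm = scale(pd.DataFrame({"x": [0.0, 5.0, 10.0]}), scaler="minmax")  # 0.0, 0.5, 1.0
z = scale(pd.DataFrame({"x": [1.0, 2.0, 3.0], "label": ["a", "b", "c"]}), scaler="z-norm")
```

Scaling statistics computed on the full frame use future observations; in a forecasting setup they would normally be fitted on the training period only.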
- Returns:
df – The DataFrame with the newly created time-series features. It includes lag columns, rolling statistics, the differenced series, Fourier terms (when apply_fourier=True), a holiday indicator (when holiday_df is provided), time-based components (year, month, day, etc.), and, unless scaler is None, scaled numeric features.
- Return type:
pandas.DataFrame
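Time-based components and the holiday indicator can both be derived from a DatetimeIndex. A minimal sketch, assuming the DataFrame is indexed by dates and the holiday table stores its dates in a column named 'date' (the example name given for holiday_df); add_calendar_and_holiday is a hypothetical helper, and the exact calendar columns ts_engineering produces may differ.

```python
import pandas as pd


def add_calendar_and_holiday(df: pd.DataFrame, holiday_df=None, holiday_col: str = "date") -> pd.DataFrame:
    """Hypothetical helper: calendar columns plus a 0/1 holiday flag.
    Assumes df has a DatetimeIndex and holiday_df stores dates in holiday_col."""
    out = df.copy()
    idx = pd.DatetimeIndex(out.index)
    out["year"] = idx.year
    out["month"] = idx.month
    out["day"] = idx.day
    out["dayofweek"] = idx.dayofweek  # Monday=0, ..., Sunday=6
    if holiday_df is not None:
        holidays = pd.to_datetime(holiday_df[holiday_col]).dt.normalize()
        # Compare calendar dates only, so timestamps with a time of day still match
        out["is_holiday"] = idx.normalize().isin(holidays).astype(int)
    return out


# Usage: five days around Christmas 2020 (25 Dec 2020 was a Friday)
days = pd.date_range("2020-12-23", periods=5, freq="D")
df = pd.DataFrame({"Sales": [10, 12, 30, 8, 9]}, index=days)
holiday_df = pd.DataFrame({"date": ["2020-12-25"]})
res = add_calendar_and_holiday(df, holiday_df)
print(res)
```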
Notes
Feature engineering for time series is crucial for capturing temporal dependencies and seasonality. Lag features let machine learning models exploit autocorrelation, rolling windows expose local trends, and Fourier terms can capture complex seasonal patterns beyond what simple differencing handles [2].
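The Fourier features controlled by apply_fourier can be illustrated with NumPy's FFT. This is a sketch under stated assumptions: numpy.fft.fft stands in for scipy.fft.fft (both compute the same discrete Fourier transform), missing values are zero-filled before the transform, and the magnitudes are stored as one column whose row \(k\) holds frequency bin \(k\). The helper add_fft_magnitude and that column layout are hypothetical, not fusionlab's documented output.

```python
import numpy as np
import pandas as pd


def add_fft_magnitude(df: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Hypothetical helper: store |DFT| of value_col as one extra column,
    where row k holds the magnitude of frequency bin k (k cycles per len(df) samples)."""
    values = df[value_col].to_numpy(dtype=float)
    # NaNs would contaminate every coefficient, so they are zero-filled (assumed policy)
    values = np.nan_to_num(values, nan=0.0)
    out = df.copy()
    out[f"{value_col}_fft_mag"] = np.abs(np.fft.fft(values))
    return out


# Usage: 28 daily samples of a pure weekly cycle (period 7 -> 4 cycles in 28 samples)
t = np.arange(28)
df = pd.DataFrame({"Signal": np.sin(2 * np.pi * t / 7)})
res = add_fft_magnitude(df, "Signal")
mag = res["Signal_fft_mag"].to_numpy()
print(int(np.argmax(mag[: len(mag) // 2])))  # 4: the bin at 28 / 7 = 4 cycles
```

Because the transform is computed over the whole series, each row's magnitude depends on every observation, including future ones; if these columns are used as predictors, computing them on the training portion only avoids look-ahead leakage.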
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from fusionlab.utils.ts_utils import ts_engineering

>>> # Generate synthetic daily data
>>> idx = pd.date_range(
...     start='2020-01-01',
...     periods=30,
...     freq='D'
... )
>>> data = {
...     'Date': idx,
...     'Sales': np.random.randint(50, 150, len(idx))
... }
>>> df = pd.DataFrame(data)
>>> df.set_index('Date', inplace=True)

>>> # Perform feature engineering with 3 lags, a 7-day window,
>>> # first differencing, and z-normalized features
>>> df_features = ts_engineering(
...     df,
...     value_col='Sales',
...     lags=3,
...     window=7,
...     diff_order=1,
...     scaler='z-norm'
... )
>>> df_features.head()
See also
pandas.DataFrame.shift – Used for lagging.
pandas.DataFrame.rolling – Used for rolling statistics.
scipy.fft.fft – Discrete Fourier transform, used to capture periodic (seasonal) components.
References