fusionlab.utils.ts_utils.create_lag_features
- fusionlab.utils.ts_utils.create_lag_features(df, value_col, dt_col=None, lag_features=None, lags=[1, 2], dropna=True, include_original=True, reset_index=True, verbose=0)
Generate lag features for a time series to capture temporal dependencies. Lag features are delayed copies of an original variable, enabling predictive models to learn from previous values.
Formally, if \(X_t\) denotes the value at time \(t\), then for a given lag \(\ell\), the lag feature \(X_{t-\ell}\) provides the value of \(X\) at \(t-\ell\) [1]. For multiple lags, the output DataFrame includes columns like:
\[X_{t-1}, \; X_{t-3}, \; X_{t-7}, \ldots\]
- Parameters:
  - df (pandas.DataFrame) – The input DataFrame containing time series data. Must have at least one time-like column or index.
  - value_col (str) – The name of the target column (time series variable) for which lag features are created.
  - dt_col (str, optional) – Name of the datetime column if not using the index. If None, the function assumes the DataFrame index is datetime or validated by ts_validator().
  - lag_features (list of str, optional) – Additional feature columns (besides value_col) for which to create lag features. If None, only the target column (value_col) is used.
  - lags (list of int, optional) – List of lag intervals to create. For example, [1, 3, 7] generates columns <feature>_lag_1, <feature>_lag_3, <feature>_lag_7.
  - dropna (bool, optional) – If True, drops rows with any NaN introduced by shifting (i.e. the first few rows that cannot have lag values).
  - include_original (bool, optional) – If True, concatenates the original columns with the newly created lag columns in the output.
  - reset_index (bool, optional) – If True, resets the index of the resulting DataFrame. This can be helpful if lagging or sorting modifies the index alignment.
  - verbose (int, optional) – Verbosity level. Higher values print more info about the process:
    - 0 : No printing.
    - 1 : Basic logs.
- Returns:
lagged_df – DataFrame containing the newly generated lag features, and optionally the original features (depending on include_original). If dropna=True, rows lacking sufficient history are removed.
- Return type:
pandas.DataFrame
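The definition of \(X_{t-\ell}\) and the <feature>_lag_<k> naming described under lags can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the helper lag_columns_sketch and the Price column are hypothetical names introduced only for illustration, and the sketch assumes the rows are already in chronological order.

```python
import pandas as pd


def lag_columns_sketch(df, columns, lags):
    """Hypothetical helper: add <col>_lag_<k> columns using DataFrame.shift."""
    out = df.copy()
    for col in columns:
        for k in lags:
            # shift(k) moves values down by k rows, so the row at time t
            # holds the value observed at time t - k; the first k rows become NaN.
            out[f"{col}_lag_{k}"] = out[col].shift(k)
    return out


df = pd.DataFrame(
    {
        "Date": pd.date_range("2021-01-01", periods=5, freq="D"),
        "Sales": [10, 12, 14, 13, 15],
        "Price": [1.0, 1.1, 1.2, 1.1, 1.3],  # illustrative extra feature
    }
).sort_values("Date")  # lagging is only meaningful on time-ordered rows

lagged = lag_columns_sketch(df, columns=["Sales", "Price"], lags=[1, 3])
print(lagged)
```

Here Sales plays the role of value_col and Price the role of an entry in lag_features; each requested lag yields one new column per feature.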
Examples
>>> import pandas as pd
>>> from fusionlab.utils.ts_utils import create_lag_features
>>> data = {
...     'Date': [
...         '2021-01-01','2021-01-02','2021-01-03',
...         '2021-01-04','2021-01-05'
...     ],
...     'Sales': [10, 12, 14, 13, 15]
... }
>>> df = pd.DataFrame(data)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> lagged_df = create_lag_features(
...     df, value_col='Sales',
...     dt_col='Date',
...     lags=[1, 2],
...     dropna=True,
...     verbose=1
... )
Target variable: Sales
Datetime column: Date
Lag intervals: [1, 2]
Creating lag features for: Sales
Rows dropped due to NaN values: 2
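The "Rows dropped due to NaN values: 2" message in the example can be cross-checked without the library. The sketch below rebuilds the two lag columns with DataFrame.shift on the same data and counts the rows removed by dropna; it assumes the <feature>_lag_<k> column naming from the parameter description and is not meant to mirror the function's internals.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05"]
        ),
        "Sales": [10, 12, 14, 13, 15],
    }
)

lags = [1, 2]
lagged = df.copy()
for k in lags:
    # Each shift leaves NaN in the first k rows of the new column.
    lagged[f"Sales_lag_{k}"] = lagged["Sales"].shift(k)

# Keep only rows where every lag is available, then renumber the index.
complete = lagged.dropna().reset_index(drop=True)
rows_dropped = len(lagged) - len(complete)

print(rows_dropped)  # 2, which equals max(lags) for this NaN-free series
print(complete)
```

With no missing values in the input, the number of dropped rows is determined by the largest lag alone, so adding a smaller lag to the list does not remove further rows.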
Notes
By introducing lagged versions of the target (and possibly other columns), models can learn from past states of the system. However, a larger maximum lag reduces the row count if dropna=True, because the first \(\max(lags)\) observations cannot have complete lag values.
See also
ts_engineering – A broader utility for generating lag features, rolling stats, and other transformations.
transform_stationarity – Convert non-stationary series to stationary (e.g. differencing).
References