fusionlab.utils.ts_utils.create_lag_features

fusionlab.utils.ts_utils.create_lag_features(df, value_col, dt_col=None, lag_features=None, lags=[1, 2], dropna=True, include_original=True, reset_index=True, verbose=0)

Generate lag features for a time series to capture temporal dependencies. Lag features are delayed copies of an original variable, enabling predictive models to learn from previous values.

Formally, if \(X_t\) denotes the value at time \(t\), then for a given lag \(\ell\) the lag feature at time \(t\) is \(X_{t-\ell}\), the value observed \(\ell\) steps earlier [1]. With several lags, for example lags=[1, 3, 7], the output DataFrame includes one column per lag:

\[X_{t-1}, \; X_{t-3}, \; X_{t-7}, \ldots\]
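
As a minimal sketch of the definition above, and not necessarily how create_lag_features is implemented internally, a lag column can be built in plain pandas with Series.shift. The Sales values and the <feature>_lag_<k> naming come from this page; the variable names are illustrative only.

```python
import pandas as pd

# Toy series; the values mirror the Sales column used in the Examples section.
sales = pd.Series([10, 12, 14, 13, 15], name="Sales")

# Series.shift(k) moves values down by k rows, so row t holds X_{t-k}.
lags = [1, 2]
lagged = pd.DataFrame({f"Sales_lag_{k}": sales.shift(k) for k in lags})
lagged.insert(0, "Sales", sales)  # keep the original column next to its lags

# The first k rows of each lag column are NaN because no earlier value exists.
print(lagged)
```

Rows that still contain NaN after shifting are the ones a dropna step would remove.
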
Parameters:
  • df (pandas.DataFrame) – The input DataFrame containing time series data. Must have at least one time-like column or index.

  • value_col (str) – The name of the target column (time series variable) for which lag features are created.

  • dt_col (str, optional) – Name of the datetime column, if the datetime information is not in the index. If None, the DataFrame index is assumed to be a datetime index, or one that ts_validator() accepts.

  • lag_features (list of str, optional) – Additional feature columns (besides value_col) for which to create lag features. If None, only the target column (value_col) is used.

  • lags (list of int, optional) – List of lag intervals to create. For example, [1, 3, 7] generates columns <feature>_lag_1, <feature>_lag_3, <feature>_lag_7.

  • dropna (bool, optional) – If True, drops rows with any NaN introduced by shifting (i.e. the first few rows that cannot have lag values).

  • include_original (bool, optional) – If True, concatenates the original columns with the newly created lag columns in the output.

  • reset_index (bool, optional) – If True, resets the index of the resulting DataFrame. This can be helpful if lagging or sorting modifies the index alignment.

  • verbose (int, optional) –

    Verbosity level. Higher values print more info about the process:

    • 0 : No printing.

    • 1 : Basic logs.

Returns:

lagged_df – DataFrame containing the newly generated lag features, and optionally the original features (depending on include_original). If dropna=True, rows lacking sufficient history are removed.

Return type:

pandas.DataFrame

Examples

>>> import pandas as pd
>>> from fusionlab.utils.ts_utils import create_lag_features
>>> data = {
...     'Date': [
...         '2021-01-01','2021-01-02','2021-01-03',
...         '2021-01-04','2021-01-05'
...     ],
...     'Sales': [10, 12, 14, 13, 15]
... }
>>> df = pd.DataFrame(data)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> lagged_df = create_lag_features(
...     df, value_col='Sales',
...     dt_col='Date',
...     lags=[1, 2],
...     dropna=True,
...     verbose=1
... )
Target variable: Sales
Datetime column: Date
Lag intervals: [1, 2]
Creating lag features for: Sales
Rows dropped due to NaN values: 2

Notes

By introducing lagged versions of the target (and possibly other columns), models can learn from past states of the system. With dropna=True, the first \(\max(lags)\) observations are removed because they cannot have complete lag values, so the number of rows lost is set by the largest lag rather than by how many lags are requested.
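
The row-count effect can be checked with plain pandas. The sketch below assumes the input has no missing values of its own and uses a hypothetical helper, rows_after_lagging, that is not part of fusionlab; it only mimics shifting followed by dropping NaN rows.

```python
import pandas as pd

def rows_after_lagging(n_rows, lags):
    """Count rows left after shifting by each lag and dropping NaN rows.

    Hypothetical helper for illustration; not a fusionlab function.
    """
    s = pd.Series(range(n_rows), dtype="float64")
    frame = pd.DataFrame({f"x_lag_{k}": s.shift(k) for k in lags})
    return len(frame.dropna())

n = 10
for lags in ([1], [1, 2], [1, 2, 3], [7], [1, 3, 7]):
    kept = rows_after_lagging(n, lags)
    # dropped equals max(lags) in every case, regardless of len(lags)
    print(lags, "kept:", kept, "dropped:", n - kept)
```

On 10 rows, lags=[7] and lags=[1, 3, 7] both leave 3 rows, which is why a single long lag costs as much data as a whole set of lags up to that length.
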

See also

ts_engineering

A broader utility for generating lag features, rolling stats, and other transformations.

transform_stationarity

Convert non-stationary series to stationary (e.g. differencing).

References