fusionlab.utils.ts_utils.ts_corr_analysis
- fusionlab.utils.ts_utils.ts_corr_analysis(df, dt_col, value_col, lags=2, features=None, view_acf_pacf=True, view_cross_corr=True, fig_size=(14, 6), show_grid=True, cross_corr_on_sep=False, verbose=0)
Perform correlation analysis on a time series dataset, including autocorrelation (ACF), partial autocorrelation (PACF), and cross-correlation with external features.
For a (weakly) stationary series, the autocorrelation at lag \(h\) is
\[\rho(h) = \frac{E\big[(X_t - \mu)(X_{t+h} - \mu)\big]}{\sigma^2},\]
where \(h\) denotes the lag, \(\mu\) the mean, and \(\sigma^2\) the variance of the time series [1].
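The formula above is a population quantity. In practice it is estimated from the data, and the plain-Python sketch below uses the common biased sample estimator. For each lag \(h\), it sums the cross-products of deviations over the \(n - h\) available pairs and divides by the total sum of squared deviations. The name `sample_acf` is illustrative and not part of fusionlab. This page does not say whether `ts_corr_analysis` uses exactly this estimator.

```python
def sample_acf(x, nlags):
    """Sample autocorrelation r(h) for h = 0..nlags (illustrative helper).

    Biased estimator: the lag-h sum of cross-products runs over the
    n - h available pairs but is divided by the full sum of squares.
    """
    n = len(x)
    if not 0 <= nlags < n:
        raise ValueError("nlags must satisfy 0 <= nlags < len(x)")
    mu = sum(x) / n
    dev = [v - mu for v in x]
    denom = sum(d * d for d in dev)  # equals n * sigma^2
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum(dev[t] * dev[t + h] for t in range(n - h)) / denom
        for h in range(nlags + 1)
    ]


sales = [10, 12, 14, 13, 15]
print([round(r, 4) for r in sample_acf(sales, nlags=2)])
# [1.0, 0.1324, -0.0595]
```

Lag 0 is always 1 by construction. With only five observations, as here, higher-lag estimates rest on very few pairs and should be read with caution.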
- Parameters:
df (pandas.DataFrame) – The input DataFrame containing the time series data. Must contain at least one time-like column or index.
dt_col (str) – Name of the column holding the datetime dimension (e.g. “DateTime” or “timestamp”).
value_col (str) – Name of the primary target variable column (e.g. “sales”).
lags (int, optional) – Number of time lags for the ACF/PACF analysis. Default is 2.
features (list of str, optional) – External feature columns to analyze for cross-correlation with value_col. If None (the default), all columns of df other than value_col and dt_col are used.
view_acf_pacf (bool, optional) – Whether to generate and display the ACF and PACF plots. Default is True.
view_cross_corr (bool, optional) – Whether to plot the cross-correlations of the selected external features. Default is True.
fig_size (tuple of (float, float), optional) – Figure size (width, height) for the ACF/PACF plots and, optionally, the cross-correlation bar chart. Default is (14, 6).
show_grid (bool, optional) – Whether to display gridlines in the plots. Default is True.
cross_corr_on_sep (bool, optional) – If True, plot the cross-correlation results in a separate figure. If False and view_cross_corr=True, append the cross-correlation plot to the figure containing the ACF/PACF plots (if feasible). Default is False.
verbose (int, optional) – Verbosity level. Default is 0.
  0: no console messages.
  1: basic info messages.
  2: more detailed logs.
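The documented default for features can be expressed as a small helper. This is a hypothetical sketch, not fusionlab's code. The name `resolve_features` is illustrative. The check for missing columns is an assumed safeguard, not documented behavior.

```python
def resolve_features(columns, dt_col, value_col, features=None):
    """Hypothetical helper: choose the columns used for cross-correlation.

    Mirrors the documented rule: with features=None, use every column
    that is neither the target (value_col) nor the datetime column (dt_col).
    """
    columns = list(columns)  # also accepts a pandas Index such as df.columns
    if features is None:
        return [c for c in columns if c not in (dt_col, value_col)]
    # Assumption: reject explicitly requested columns that do not exist.
    missing = [c for c in features if c not in columns]
    if missing:
        raise KeyError(f"feature columns not found: {missing}")
    return list(features)


cols = ["Date", "Sales", "Promo", "Temperature"]
print(resolve_features(cols, dt_col="Date", value_col="Sales"))
# ['Promo', 'Temperature']
print(resolve_features(cols, dt_col="Date", value_col="Sales", features=["Promo"]))
# ['Promo']
```

The documentation does not state whether non-numeric columns are skipped when features is None. Passing an explicit list avoids relying on that detail.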
- Returns:
results – Dictionary of correlation metrics:
'acf_values': ACF values up to lags.
'pacf_values': PACF values up to lags.
'cross_corr': cross-correlation coefficients (and p-values) for the external features.
- Return type:
dict
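This page does not specify how the 'cross_corr' coefficients are computed. As one plausible reading, the sketch below computes the Pearson correlation between the target at time \(t\) and a feature at time \(t - k\) for \(k = 0, \dots, K\). A positive \(k\) asks whether the feature leads the target. The code is plain Python, and `pearson` and `lagged_cross_corr` are hypothetical names. P-values are omitted. Because fusionlab's estimator is not documented, these values need not match the coefficients `ts_corr_analysis` reports for the same data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def lagged_cross_corr(target, feature, max_lag):
    """corr(target[t], feature[t - k]) for k = 0..max_lag (hypothetical helper)."""
    n = len(target)
    if len(feature) != n or not 0 <= max_lag < n - 1:
        raise ValueError("need equal lengths and 0 <= max_lag < len(target) - 1")
    # Pair target[k:] with feature[:n - k], dropping the k unmatched points.
    return {k: pearson(target[k:], feature[: n - k]) for k in range(max_lag + 1)}


sales = [10, 12, 14, 13, 15]
promo = [0, 1, 0, 1, 1]
for k, r in lagged_cross_corr(sales, promo, max_lag=1).items():
    print(f"lag {k}: r = {r:.4f}")
# lag 0: r = 0.3797
# lag 1: r = 0.8944
```

Each extra lag shortens the overlapping window by one point. On very short series like this one, a large coefficient at a higher lag can easily be noise.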
Notes
This function supports both univariate and multivariate time series analysis. The ACF and PACF reveal the autocorrelation structure of the target, which guides the choice of ARIMA terms. A PACF that cuts off after lag \(p\) suggests an AR(\(p\)) component, while an ACF that cuts off after lag \(q\) suggests an MA(\(q\)) component. Cross-correlation helps identify external predictors that are correlated with the target [2].
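To make the ACF/PACF link concrete, note that the PACF at lag \(k\) is the last coefficient of the best linear AR(\(k\)) predictor. It can be computed from the autocorrelations with the Durbin-Levinson recursion. The plain-Python sketch below applies the recursion to the theoretical ACF of an AR(1) process, \(\rho(h) = 0.6^h\), whose PACF should cut off after lag 1. The name `pacf_from_acf` is hypothetical. statsmodels offers several PACF estimators, so values computed from sample data may differ slightly from this recursion.

```python
def pacf_from_acf(r, nlags):
    """PACF at lags 0..nlags from autocorrelations r[0..nlags] via Durbin-Levinson."""
    if len(r) <= nlags:
        raise ValueError("need autocorrelations up to lag nlags")
    pacf = [1.0]
    phi_prev = []  # AR(k-1) coefficients phi_{k-1,1..k-1}, stored 0-indexed
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den  # partial autocorrelation at lag k
        # Update the lower-order coefficients: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)]
        phi_prev.append(phi_kk)
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.6: rho(h) = 0.6**h
acf = [1.0]
for _ in range(3):
    acf.append(acf[-1] * 0.6)
print(pacf_from_acf(acf, nlags=3))
# [1.0, 0.6, 0.0, 0.0]
```

The PACF cutting off after lag 1, while the ACF decays geometrically, is the signature that points to an AR(1) term.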
Examples
>>> import pandas as pd
>>> from fusionlab.utils.ts_utils import ts_corr_analysis
>>> data = {
...     'Date': [
...         '2021-01-01', '2021-01-02', '2021-01-03',
...         '2021-01-04', '2021-01-05'
...     ],
...     'Sales': [10, 12, 14, 13, 15],
...     'Promo': [0, 1, 0, 1, 1]
... }
>>> df = pd.DataFrame(data)
>>> results = ts_corr_analysis(
...     df,
...     dt_col='Date',
...     value_col='Sales',
...     lags=1,
...     features=['Promo'],
...     view_acf_pacf=True,
...     view_cross_corr=True,
...     verbose=1
... )
Performing ACF and PACF analysis...
Target variable: Sales
Datetime column: Date
Cross-correlation features: ['Promo']
Performing cross-correlation analysis...
CrossCorrResults > item 1: correlation=0.2890, p_value=0.6367
See also
statsmodels.graphics.tsaplots.plot_acf: Plot the autocorrelation function.
statsmodels.graphics.tsaplots.plot_pacf: Plot the partial autocorrelation function.
References