fusionlab.utils.ts_utils.infer_decomposition_method

fusionlab.utils.ts_utils.infer_decomposition_method(df, dt_col, period=12, return_components=False, view=False, figsize=(10, 8), method='heuristic', verbose=0)

Choose between an additive and a multiplicative decomposition for a time series. Two modes are available:

  1. method='heuristic': Selects 'multiplicative' if every data point is strictly positive, and 'additive' otherwise. This approach does not evaluate the fit.

  2. method='variance_comparison': Performs both additive and multiplicative decompositions and chooses the one whose residuals have the smaller variance.

\[\text{Additive: } Y_t = T_t + S_t + \epsilon_t\]
\[\text{Multiplicative: } Y_t = T_t \times S_t \times \epsilon_t \quad\text{or}\quad \log(Y_t) = \log(T_t) + \log(S_t) + \log(\epsilon_t).\]
Here \(T_t\) is the trend, \(S_t\) the seasonal component, and \(\epsilon_t\) the residual.
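The heuristic mode amounts to a positivity check. The following is a minimal sketch of that rule, not the library's implementation; `heuristic_model` and its `value_col` argument are hypothetical names introduced here for illustration.

```python
import pandas as pd

def heuristic_model(df, dt_col, value_col):
    """Hypothetical helper: pick a model from the sign of the data alone."""
    # Index by the datetime column, mirroring the description of dt_col
    series = df.set_index(dt_col)[value_col]
    # Strictly positive everywhere -> multiplicative; otherwise additive
    return "multiplicative" if (series > 0).all() else "additive"

sales = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "Sales": [100, 120, 140],
})
print(heuristic_model(sales, "Date", "Sales"))  # multiplicative

# A single zero (or negative) value switches the choice to additive
sales.loc[1, "Sales"] = 0
print(heuristic_model(sales, "Date", "Sales"))  # additive
```

A multiplicative model divides by the trend and seasonal terms and has a logarithmic form, which is why zero or negative values rule it out.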
Parameters:
  • df (pandas.DataFrame) – The DataFrame containing time series data. Must include the datetime column dt_col and at least one column of values to decompose.

  • dt_col (str) – The column name representing datetime. This column is set as the index for decomposition.

  • period (int, optional) – The seasonal period (number of observations per cycle) used for decomposition. Defaults to 12, a common choice for monthly data with yearly seasonality.

  • return_components (bool, optional) – If True, returns a dictionary of decomposition components (trend, seasonal, residual). Otherwise, returns only the chosen model.

  • view (bool, optional) – If True and method='variance_comparison', displays histograms of the residuals to facilitate comparison.

  • figsize (tuple of (float, float), optional) – Figure dimensions for residual plots.

  • method ({'heuristic','variance_comparison'}, optional) –

    Strategy for deciding on the decomposition approach:

    • 'heuristic': If all data points are strictly positive, uses 'multiplicative'; else 'additive'.

    • 'variance_comparison': Tries both models, compares the variance of residuals, and picks the one with smaller residual variance.

  • verbose ({0, 1, 2, 3}, optional) –

    Control the amount of logging:

    • 0 : No messages printed.

    • 1 : Basic info about chosen model and decomposition.

    • 2 : Additional details about data checks.

    • 3 : Very detailed logs, including internal states and partial results.

Returns:

  • best_method (str) – The chosen decomposition type: 'additive' or 'multiplicative'.

  • components (dict, optional) – Returned only if return_components=True. Contains the keys 'trend', 'seasonal', and 'residual' mapped to pandas.Series objects from the best decomposition.

Notes

Selecting an appropriate decomposition model matters for capturing both trend and seasonality accurately [1]. The variance comparison approach bases the choice on the residuals of the fitted decompositions rather than on the sign of the data alone, which makes the selection more data-driven [2].
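The variance comparison can be sketched with a small classical decomposition written in NumPy. This is an illustration under stated assumptions, not the library's code: `classical_decompose` and `pick_by_residual_variance` are hypothetical helpers, the trend is a centered moving average, and the series is assumed to span at least two full periods.

```python
import numpy as np

def classical_decompose(y, period, model="additive"):
    """Illustrative classical decomposition; returns (trend, seasonal, resid)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    additive = model == "additive"
    # Centered moving average; an even period needs half-weights at both ends
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # edges stay NaN where the window does not fit
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    detrended = y - trend if additive else y / trend
    # Seasonal pattern: mean of the detrended values at each cycle position
    cycle = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle = cycle - cycle.mean() if additive else cycle / cycle.mean()
    seasonal = np.tile(cycle, n // period + 1)[:n]
    resid = detrended - seasonal if additive else detrended / seasonal
    return trend, seasonal, resid

def pick_by_residual_variance(y, period):
    """Hypothetical helper: keep the model with the smaller residual variance."""
    resid_add = classical_decompose(y, period, "additive")[2]
    resid_mul = classical_decompose(y, period, "multiplicative")[2]
    var_add, var_mul = np.nanvar(resid_add), np.nanvar(resid_mul)
    best = "additive" if var_add <= var_mul else "multiplicative"
    return best, var_add, var_mul

# Synthetic monthly series whose seasonal swing grows with the trend
t = np.arange(48)
y = (100 + 2 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
best, var_add, var_mul = pick_by_residual_variance(y, period=12)
print(best, var_add, var_mul)
```

The two residual series sit on different scales: additive residuals are in the units of the series, while multiplicative residuals are ratios near 1. A raw comparison like the one above therefore tends to favor the multiplicative model when the series values are large. Whether and how the library adjusts for this is not stated here.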

Examples

>>> import pandas as pd
>>> from fusionlab.utils.ts_utils import infer_decomposition_method
>>> data = {
...     'Date': [
...         '2020-01-01','2020-02-01','2020-03-01',
...         '2020-04-01','2020-05-01'
...     ],
...     'Sales': [100, 120, 140, 135, 150]
... }
>>> df = pd.DataFrame(data)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> best_model = infer_decomposition_method(
...     df, dt_col='Date', period=12,
...     method='heuristic', verbose=2
... )
Checking positivity for heuristic method...
All values are > 0. Using 'multiplicative' model.
>>> best_model
'multiplicative'

See also

seasonal_decompose

Decompose a time series into trend, seasonal, and residual components.

References