fusionlab.utils.ts_utils.ts_validator

fusionlab.utils.ts_utils.ts_validator(df, dt_col=None, to_datetime=None, as_index='auto', error='raise', return_dt_col=False, ensure_order=False, verbose=0)

Validate and preprocess time series data, ensuring the presence of a properly formatted datetime column or index. This function can automatically convert a given column or the DataFrame index to a datetime type, sort the data chronologically, and optionally set the datetime column as the index.

In a more formal sense, let \(\{x_t\}_{t=1}^N\) represent a time series, where \(t\) is the time index and \(N\) is the number of observations [1]. With ensure_order=True, the function enforces the ordering:

\[t_1 < t_2 < \ldots < t_N\]

so that any subsequent operations or modeling steps assume valid temporal ordering.
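
As a quick illustration of this ordering condition, the following sketch uses plain pandas (not ts_validator itself) to check whether a datetime index is strictly increasing and to restore the order by sorting. The sample data and the helper name is_strictly_ordered are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of chronological order.
df = pd.DataFrame(
    {"Value": [20, 10, 15]},
    index=pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-02"]),
)


def is_strictly_ordered(index: pd.Index) -> bool:
    """Return True only when t_1 < t_2 < ... < t_N (increasing, no duplicates)."""
    return bool(index.is_monotonic_increasing and index.is_unique)


ordered_before = is_strictly_ordered(df.index)

# Sorting by the index restores ascending time order.
df_sorted = df.sort_index()
ordered_after = is_strictly_ordered(df_sorted.index)
```

The check is strict, so repeated timestamps count as a violation; drop the is_unique test if duplicate timestamps are acceptable in your data.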

Parameters:
  • df (pandas.DataFrame) – The input DataFrame, which must contain time-related information. If dt_col is not provided, the index of df may be used, and converted if possible.

  • dt_col (str, optional) – The column name representing date/time. If dt_col is not found in the DataFrame columns or index, behavior depends on the error parameter.

  • to_datetime ({None, 'auto', 'Y', 'M', 'W', 'D', 'H', 'min', 's'}, optional) –

    Controls how to convert the detected time column if it is not already in a datetime format. Examples:

    • None: No forced conversion; only format detection.

    • 'auto': Automatic inference of the date/time format.

    • 'Y', 'M', 'W', 'D', 'H', 'min', 's': Convert at the given frequency, for example 'D' for daily periods.

  • as_index (bool or {"auto"}, optional) –

    Whether to set the time column as index after conversion:

    • True: Set dt_col as the index in the returned DataFrame.

    • False: Retain dt_col as a column.

    • "auto": Keep the current structure if the column is already in the index; else, do not change it.

  • error ({'raise', 'ignore', 'warn'}, optional) –

    Defines how to handle potential errors such as an invalid format or missing time column:

    • 'raise': Raise a ValueError.

    • 'warn': Issue a warning and return the unmodified data if it cannot be converted.

    • 'ignore': Silently ignore conversion failures.

  • return_dt_col (bool, optional) – If True, return a tuple (df, dt_col) with the final validated DataFrame and the name of the detected time column.

  • ensure_order (bool, optional) – If True, sorts the DataFrame in ascending time order based on the detected or provided dt_col. For time series modeling, ensuring chronological ordering can be critical.

  • verbose (int, optional) –

    Verbosity level. The higher the value, the more information is printed during execution:

    • 0: No output.

    • 1: Basic info messages.

    • 2: Detailed messages on steps taken.

    • 3: Very detailed internal states for debugging.

Returns:

  • df (pandas.DataFrame) – The validated and possibly re-indexed DataFrame with correctly formatted datetime information.

  • (df, dt_col) ((pandas.DataFrame, str)) – Returned if return_dt_col=True. The first element is the processed DataFrame, and the second element is the detected time column name.
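
The three modes of the error parameter can be pictured with a small stand-alone sketch. It is written against plain pandas and only approximates the behavior described above; it is not fusionlab's actual implementation. The helper name convert_time_column and the sample frames are hypothetical, and returning the data unchanged under 'ignore' is an assumption.

```python
import warnings

import pandas as pd


def convert_time_column(df, dt_col, error="raise"):
    """Hypothetical helper (not part of fusionlab) mimicking the error modes."""
    if error not in {"raise", "warn", "ignore"}:
        raise ValueError(f"Unknown error mode: {error!r}")
    try:
        with warnings.catch_warnings():
            # Hide pandas' own format-inference warnings during parsing.
            warnings.simplefilter("ignore")
            converted = pd.to_datetime(df[dt_col])
    except (KeyError, ValueError, TypeError) as exc:
        if error == "raise":
            raise ValueError(
                f"Cannot use {dt_col!r} as a datetime column: {exc}"
            ) from exc
        if error == "warn":
            warnings.warn(f"Could not convert {dt_col!r}; data left unchanged.")
        # 'warn' and 'ignore' both fall through and return the input as-is.
        return df
    out = df.copy()
    out[dt_col] = converted
    return out


# Hypothetical inputs: one convertible frame, one that cannot be parsed.
good = pd.DataFrame({"Date": ["2020-01-01", "2020-01-02"], "Value": [1, 2]})
bad = pd.DataFrame({"Date": ["not a date", "also bad"], "Value": [1, 2]})

converted_good = convert_time_column(good, "Date")
untouched_bad = convert_time_column(bad, "Date", error="ignore")
```

A missing column is handled the same way as an unparseable one here, since both leave the frame without a usable time column.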

Examples

>>> import pandas as pd
>>> from fusionlab.utils.ts_utils import ts_validator
>>> # Example DataFrame with a 'Date' column
>>> data = {
...     'Date': ['2020-01-01', '2020-01-02', '2020-01-03'],
...     'Value': [10, 15, 20]
... }
>>> df = pd.DataFrame(data)
>>> validated_df = ts_validator(df, dt_col='Date',
...                             to_datetime='auto',
...                             ensure_order=True,
...                             verbose=1)
Datetime column detected: 'Date' ...
Datetime conversion applied successfully ...
>>> validated_df
            Value
Date
2020-01-01     10
2020-01-02     15
2020-01-03     20
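
For comparison, roughly the same result can be obtained by hand with plain pandas. This is a sketch of the equivalent steps, assuming the column holds ISO-formatted date strings; it does not show how ts_validator works internally, and the rows are deliberately shuffled so that the sort is visible.

```python
import pandas as pd

# Hypothetical input with rows out of chronological order.
df = pd.DataFrame(
    {
        "Date": ["2020-01-03", "2020-01-01", "2020-01-02"],
        "Value": [20, 10, 15],
    }
)

manual = (
    df.assign(Date=pd.to_datetime(df["Date"]))  # parse strings into datetimes
    .sort_values("Date")                        # ascending chronological order
    .set_index("Date")                          # use the time column as the index
)
```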

Notes

Proper time series validation and ordering are crucial for reliable forecasting and analysis. If the column or index cannot be converted to a valid datetime format, consider cleaning the time field or specifying the correct format codes.
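
When automatic parsing fails or is ambiguous, one option is to convert the time field yourself before validation. The sketch below uses pandas.to_datetime with an explicit strftime-style format; the sample strings and variable names are assumptions made for illustration.

```python
import pandas as pd

# Day/month/year strings, which are ambiguous without an explicit format.
raw = pd.Series(["01/02/2020", "15/02/2020", "03/03/2020"])

# An explicit format code removes the day/month ambiguity.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

# errors="coerce" turns unparseable entries into NaT instead of raising.
mixed = pd.to_datetime(
    pd.Series(["01/02/2020", "n/a"]), format="%d/%m/%Y", errors="coerce"
)
```

Coercing to NaT makes bad rows easy to find with isna() before the data is passed on.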

See also

ts_engineering

Higher-level feature engineering on time series data.

ts_corr_analysis

Analyze correlation structures in time series.

References