fusionlab.utils.ts_utils.ts_validator
- fusionlab.utils.ts_utils.ts_validator(df, dt_col=None, to_datetime=None, as_index='auto', error='raise', return_dt_col=False, ensure_order=False, verbose=0)
Validate and preprocess time series data, ensuring the presence of a properly formatted datetime column or index. This function can automatically convert a given column or the DataFrame index to a datetime type, sort the data by time order, and optionally set the datetime column as the index.
In a more formal sense, let \(\{x_t\}_{t=1}^N\) represent a time series with \(t\) denoting the time index and \(N\) the number of observations [1]. The role of this function is to ensure the alignment:
\[t_1 < t_2 < \ldots < t_N\]
so that any subsequent operations or modeling steps can assume valid temporal ordering.
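The ordering condition above can be checked directly in pandas. The following is a minimal sketch, not part of fusionlab: the helper name `is_strictly_increasing` and the sample data are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of chronological order.
stamps = pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-02"])
series = pd.Series([20, 10, 15], index=stamps)


def is_strictly_increasing(index: pd.Index) -> bool:
    """Return True when t_1 < t_2 < ... < t_N holds for the index.

    is_monotonic_increasing allows ties, so uniqueness is checked as well.
    """
    return bool(index.is_monotonic_increasing and index.is_unique)


before = is_strictly_increasing(series.index)   # unsorted input
ordered = series.sort_index()                   # sort by timestamp
after = is_strictly_increasing(ordered.index)   # sorted result
```

Sorting alone does not remove duplicate timestamps, so a series with repeated dates would still fail the strict check after `sort_index()`.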
- Parameters:
df (pandas.DataFrame) – The input DataFrame, which must contain time-related information. If dt_col is not provided, the index of df may be used or converted if possible.
dt_col (str, optional) – The column name representing date/time. If dt_col is not found in the DataFrame columns or index, behavior depends on the error parameter.
to_datetime ({None, 'auto', 'Y', 'M', 'W', 'D', 'H', 'min', 's'}, optional) – Controls how to convert the detected time column if it is not already in a datetime format. Examples:
  - None: No forced conversion; only format detection.
  - 'auto': Automatic inference of the date/time format.
  - 'D': Convert using daily periods, etc.
as_index (bool or {"auto"}, optional) – Whether to set the time column as index after conversion:
  - True: Set dt_col as the index in the returned DataFrame.
  - False: Retain dt_col as a column.
  - "auto": Keep the current structure if the column is already in the index; else, do not change it.
error ({'raise', 'ignore', 'warn'}, optional) – Defines how to handle potential errors such as an invalid format or missing time column:
  - 'raise': Raise a ValueError.
  - 'warn': Issue a warning and return the unmodified data if it cannot be converted.
  - 'ignore': Silently ignore conversion failures.
return_dt_col (bool, optional) – If True, return a tuple (df, dt_col) with the final validated DataFrame and the name of the detected time column.
ensure_order (bool, optional) – If True, sorts the DataFrame in ascending time order based on the detected or provided dt_col. For time series modeling, ensuring chronological ordering can be critical.
verbose (int, optional) – Verbosity level. The higher the value, the more information is printed during execution:
  - 0: No output.
  - 1: Basic info messages.
  - 2: Detailed messages on steps taken.
  - 3: Very detailed internal states for debugging.
- Returns:
df (pandas.DataFrame) – The validated and possibly re-indexed DataFrame with correctly formatted datetime information.
(df, dt_col) ((pandas.DataFrame, str)) – Returned if return_dt_col=True. The first element is the processed DataFrame, and the second element is the detected time column name.
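For orientation, the conversion, ordering, and indexing options described above correspond roughly to three plain pandas steps. The sketch below is an illustration under that assumption, not fusionlab's implementation: `validate_sketch` is a hypothetical helper, and it ignores the `'auto'` modes, the frequency codes, and the `error` and `verbose` handling.

```python
import pandas as pd


def validate_sketch(df, dt_col, as_index=False, ensure_order=False):
    """Hypothetical helper: a rough pandas analogue of the documented steps."""
    out = df.copy()  # leave the caller's DataFrame untouched
    # Convert the column to datetime; errors="raise" is loosely analogous
    # to error='raise' (an unparseable value raises a ValueError).
    out[dt_col] = pd.to_datetime(out[dt_col], errors="raise")
    if ensure_order:
        # Ascending chronological order.
        out = out.sort_values(dt_col)
    if as_index:
        # Promote the datetime column to the index.
        out = out.set_index(dt_col)
    return out


# Hypothetical input, deliberately unsorted.
raw = pd.DataFrame(
    {"Date": ["2020-01-03", "2020-01-01", "2020-01-02"], "Value": [20, 10, 15]}
)
result = validate_sketch(raw, "Date", as_index=True, ensure_order=True)
```

Working on a copy keeps the input unchanged, which makes it easier to compare the data before and after validation.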
Examples
>>> import pandas as pd
>>> from fusionlab.utils.ts_utils import ts_validator
>>> # Example DataFrame with a 'Date' column
>>> data = {
...     'Date': ['2020-01-01', '2020-01-02', '2020-01-03'],
...     'Value': [10, 15, 20]
... }
>>> df = pd.DataFrame(data)
>>> validated_df = ts_validator(df, dt_col='Date',
...                             to_datetime='auto',
...                             ensure_order=True,
...                             verbose=1)
Datetime column detected: 'Date' ...
Datetime conversion applied successfully ...
>>> validated_df
            Value
Date
2020-01-01     10
2020-01-02     15
2020-01-03     20
Notes
Proper time series validation and ordering are crucial for reliable forecasting and analysis. If the column or index cannot be converted to a valid datetime format, consider adjusting your time field or using the correct format codes.
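One way to apply explicit format codes is to convert the time field with pandas before validation. The sketch below uses only `pandas.to_datetime`; the sample strings are hypothetical, and whether `ts_validator` itself accepts strftime-style codes is not stated here.

```python
import pandas as pd

# Hypothetical day-first strings, ambiguous without an explicit format.
raw = pd.Series(["03/01/2020", "04/01/2020"])

# A strftime-style format removes the ambiguity: 3 January, not 1 March.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

# errors="coerce" turns unparseable entries into NaT instead of raising,
# which makes the bad rows easy to locate.
messy = pd.Series(["2020-01-01", "not a date"])
coerced = pd.to_datetime(messy, format="%Y-%m-%d", errors="coerce")
bad_rows = messy[coerced.isna()]
```

Inspecting `bad_rows` shows which entries need cleaning before the column can serve as a time index.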
See also
ts_engineering: Higher-level feature engineering on time series data.
ts_corr_analysis: Analyze correlation structures in time series.
References