fusionlab.utils.augment_series_features

fusionlab.utils.augment_series_features(series_df, feature_cols, noise_level=0.01, noise_type='gaussian', random_seed=None, savefile=None)

Augments specified feature columns in a time series DataFrame by adding noise.

This function is typically applied to sequences that are already long enough, to create more training examples and improve model robustness.

Suppose \(x_i\) are original feature values for one column. Then:

For Gaussian noise:

\[
\text{noise}_i \sim \mathcal{N}\bigl(0,\; \sigma_x \times \text{noise\_level}\bigr),
\quad \sigma_x = \mathrm{std}(x_i).
\]

The augmented values are \(x_i + \text{noise}_i\).
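
The Gaussian case can be sketched in plain NumPy. This is an illustration of the formula, not fusionlab's implementation: the helper name `add_gaussian_noise` is hypothetical, and the use of the population standard deviation (`ddof=0`) and of `numpy.random.default_rng` are assumptions, since the source does not say which the library uses.

```python
import numpy as np


def add_gaussian_noise(x, noise_level=0.01, seed=None):
    """Hypothetical helper: return x_i + N(0, std(x) * noise_level)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Noise standard deviation scales with the feature's own spread.
    # Assumption: population std (ddof=0); the library may differ.
    sigma = np.std(x) * noise_level
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)


x = np.array([10.0, 12.0, 15.0, 13.0])
x_aug = add_gaussian_noise(x, noise_level=0.05, seed=42)
```

Because the noise scale is tied to \(\sigma_x\), the same `noise_level` perturbs a widely spread feature more, in absolute terms, than a tightly clustered one.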

For Uniform noise:

\[
\text{range}_x = \max(x_i) - \min(x_i), \quad
\text{noise}_i \sim \mathcal{U}\Bigl(
-\tfrac{\text{range}_x \times \text{noise\_level}}{2},\;
\tfrac{\text{range}_x \times \text{noise\_level}}{2}\Bigr).
\]

The augmented values are \(x_i + \text{noise}_i\).
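
The uniform case can be sketched the same way. Again this only illustrates the formula; the helper name `add_uniform_noise` is hypothetical and the choice of `numpy.random.default_rng` is an assumption.

```python
import numpy as np


def add_uniform_noise(x, noise_level=0.01, seed=None):
    """Hypothetical helper: return x_i + U(-h, h), h = range(x) * noise_level / 2."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Half-width of the sampling interval, following the formula above.
    half_width = (np.max(x) - np.min(x)) * noise_level / 2.0
    return x + rng.uniform(-half_width, half_width, size=x.shape)


x = np.array([10.0, 12.0, 15.0, 13.0])
x_aug = add_uniform_noise(x, noise_level=0.05, seed=0)
# Here range_x = 5, so each perturbation lies within +/- 0.125.
```

Unlike Gaussian noise, uniform noise is bounded, so no augmented value can move further than the half-width from its original.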

Parameters:

series_df (pd.DataFrame) – Input DataFrame representing one or more time series.
feature_cols (List[str]) – List of column names (features) to which noise will be added.
noise_level (float, default 0.01) – Magnitude of the noise.
- For ‘gaussian’: standard deviation of the noise, as a fraction of the feature’s standard deviation.
- For ‘uniform’: width of the uniform interval, as a fraction of the feature’s range; noise is drawn from ±(range × noise_level) / 2.
noise_type (str, default 'gaussian') – Type of noise to add. Options: ‘gaussian’, ‘uniform’.
random_seed (int or None, default None) – Seed for the random number generator for reproducible results.
savefile (str | None)

Returns:

DataFrame with noise added to the specified feature columns.

Return type:

pd.DataFrame

Raises:

ValueError – If feature_cols are not in series_df or noise_type is invalid.
TypeError – If inputs are not of the expected type.
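
The documented exceptions suggest input checks along the following lines. This is a sketch under assumptions: `check_augment_inputs` is a hypothetical name, and the exact checks, their order, and the error messages in fusionlab may differ.

```python
import pandas as pd

VALID_NOISE_TYPES = ("gaussian", "uniform")


def check_augment_inputs(series_df, feature_cols, noise_type="gaussian"):
    """Hypothetical validator mirroring the documented exceptions."""
    if not isinstance(series_df, pd.DataFrame):
        raise TypeError("series_df must be a pandas DataFrame")
    # Collect every requested column that the DataFrame lacks.
    missing = [c for c in feature_cols if c not in series_df.columns]
    if missing:
        raise ValueError(f"feature_cols not found in series_df: {missing}")
    if noise_type not in VALID_NOISE_TYPES:
        raise ValueError(f"noise_type must be one of {VALID_NOISE_TYPES}")


df = pd.DataFrame({"x": [10.0, 12.0], "y": [100, 110]})
check_augment_inputs(df, ["x"])  # valid inputs pass silently

try:
    check_augment_inputs(df, ["z"])  # 'z' is not a column of df
except ValueError as err:
    message = str(err)
```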

Examples

>>> import pandas as pd
>>> from fusionlab.utils.geo_utils import augment_series_features
>>> df = pd.DataFrame({
...     'x': [10.0, 12.0, 15.0, 13.0],
...     'y': [100, 110, 105, 115]
... })
>>> # Add Gaussian noise at 5% level to column 'x'
>>> df_aug = augment_series_features(df, ['x'], noise_level=0.05,
...                                  noise_type='gaussian',
...                                  random_seed=42)
>>> df_aug['x']  # illustrative output; exact values depend on the seed and NumPy version
0    10.248357
1    11.930868
2    15.323844
3    12.761515
Name: x, dtype: float64

Notes

If a feature column has zero variance or NaN range, no noise is added and a debug log is emitted.
Non-numeric columns are skipped with a warning.
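
The skipping behaviour described in these notes could look roughly like the loop below. It is a sketch, not the library's code: `augment_sketch` is a hypothetical name, only the Gaussian case is shown, and the specific checks (`is_numeric_dtype`, a non-finite or zero standard deviation) are assumptions about how such conditions might be detected.

```python
import logging
import warnings

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)


def augment_sketch(df, cols, noise_level=0.01, seed=None):
    """Hypothetical per-column loop that skips unusable columns."""
    rng = np.random.default_rng(seed)
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in cols:
        if not pd.api.types.is_numeric_dtype(out[col]):
            warnings.warn(f"Skipping non-numeric column {col!r}")
            continue
        values = out[col].to_numpy(dtype=float)
        sigma = np.std(values)  # NaN if the column contains NaN
        if not np.isfinite(sigma) or sigma == 0.0:
            logger.debug("No noise added to %r: zero or undefined spread", col)
            continue
        out[col] = values + rng.normal(0.0, sigma * noise_level, size=values.shape)
    return out


df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0],
    "const": [5.0, 5.0, 5.0],   # zero variance: left unchanged
    "label": ["a", "b", "c"],   # non-numeric: skipped with a warning
})
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the expected warning in this demo
    result = augment_sketch(df, ["x", "const", "label"], noise_level=0.05, seed=0)
```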