fusionlab.utils.augment_series_features

fusionlab.utils.augment_series_features(series_df, feature_cols, noise_level=0.01, noise_type='gaussian', random_seed=None, savefile=None)

Augments specified feature columns in a time series DataFrame by adding noise.

This function is typically applied to sequences that are already long enough for training. The perturbed values serve as additional training examples and help improve model robustness.

Let \(x_i\) denote the original values of one feature column. Then:

  • For Gaussian noise:

    \[
    \text{noise}_i \sim \mathcal{N}\bigl(0,\; (\sigma_x \times
    \text{noise\_level})^2\bigr),
    \quad \sigma_x = \mathrm{std}(x),
    \]

    i.e. the noise has standard deviation \(\sigma_x \times \text{noise\_level}\). The augmented values are \(x_i + \text{noise}_i\).

  • For uniform noise:

    \[
    \text{range}_x = \max_i x_i - \min_i x_i, \quad
    \text{noise}_i \sim \mathcal{U}\Bigl(
    -\tfrac{\text{range}_x \times \text{noise\_level}}{2},\;
    \tfrac{\text{range}_x \times \text{noise\_level}}{2}\Bigr).
    \]

    The augmented values are \(x_i + \text{noise}_i\).
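The two formulas above can be sketched per column with NumPy. This is a minimal illustration, not fusionlab's implementation: the helper name `column_noise`, the use of `np.random.default_rng`, and `ddof=1` for \(\sigma_x\) are assumptions, so the drawn values will not match the library's output for the same seed.

```python
import numpy as np


def column_noise(x, noise_level=0.01, noise_type="gaussian", rng=None):
    """Hypothetical helper: one noise draw per element of x,
    following the Gaussian / uniform formulas above."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    if noise_type == "gaussian":
        # Noise std = std(x) * noise_level (ddof=1 is an assumption)
        scale = np.std(x, ddof=1) * noise_level
        return rng.normal(0.0, scale, size=x.shape)
    if noise_type == "uniform":
        # Half-width = range(x) * noise_level / 2
        half_width = (x.max() - x.min()) * noise_level / 2
        return rng.uniform(-half_width, half_width, size=x.shape)
    raise ValueError(f"Unknown noise_type: {noise_type!r}")


x = np.array([10.0, 12.0, 15.0, 13.0])
rng = np.random.default_rng(42)
# range(x) = 5, so with noise_level=0.05 every perturbation lies in [-0.125, 0.125)
x_aug = x + column_noise(x, noise_level=0.05, noise_type="uniform", rng=rng)
```

Passing an explicit generator keeps the draws reproducible, which is the role the `random_seed` parameter plays in the documented function.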

Parameters:
  • series_df (pd.DataFrame) – Input DataFrame representing one or more time series.

  • feature_cols (List[str]) – List of column names (features) to which noise will be added.

  • noise_level (float, default 0.01) –

    Magnitude of the noise.

    • For 'gaussian': standard deviation of the noise relative to the feature's standard deviation.

    • For 'uniform': width of the uniform distribution relative to the feature's range (the half-width is range × noise_level / 2).

  • noise_type (str, default 'gaussian') – Type of noise to add. Options: ‘gaussian’, ‘uniform’.

  • random_seed (int or None, default None) – Seed for the random number generator for reproducible results.

  • savefile (str | None)

Returns:

DataFrame with noise added to the specified feature columns.

Return type:

pd.DataFrame

Raises:
  • ValueError – If any column in feature_cols is missing from series_df, or if noise_type is not 'gaussian' or 'uniform'.

  • TypeError – If inputs are not of the expected type.
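The checks listed under Raises can be sketched as a small validation routine. The function name `check_augment_inputs` is hypothetical, and rejecting anything other than a list or tuple for `feature_cols` (including a bare string) is an assumption; the library's exact checks and messages may differ.

```python
import pandas as pd

VALID_NOISE_TYPES = ("gaussian", "uniform")


def check_augment_inputs(series_df, feature_cols, noise_type):
    """Hypothetical input check mirroring the documented Raises section."""
    # TypeError: wrong container types
    if not isinstance(series_df, pd.DataFrame):
        raise TypeError(
            f"series_df must be a pandas DataFrame, got {type(series_df).__name__}"
        )
    if not isinstance(feature_cols, (list, tuple)):
        raise TypeError("feature_cols must be a list of column names")
    # ValueError: requested columns absent from the DataFrame
    missing = [c for c in feature_cols if c not in series_df.columns]
    if missing:
        raise ValueError(f"Columns not found in series_df: {missing}")
    # ValueError: unsupported noise type
    if noise_type not in VALID_NOISE_TYPES:
        raise ValueError(
            f"noise_type must be one of {VALID_NOISE_TYPES}, got {noise_type!r}"
        )


df = pd.DataFrame({"x": [10.0, 12.0], "y": [100, 110]})
check_augment_inputs(df, ["x"], "gaussian")  # valid inputs: returns None
```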

Examples

>>> import pandas as pd
>>> from fusionlab.utils.geo_utils import augment_series_features
>>> df = pd.DataFrame({
...     'x': [10.0, 12.0, 15.0, 13.0],
...     'y': [100, 110, 105, 115]
... })
>>> # Add Gaussian noise at 5% level to column 'x'
>>> df_aug = augment_series_features(df, ['x'], noise_level=0.05,
...                                  noise_type='gaussian',
...                                  random_seed=42)
>>> # Only column 'x' is perturbed; std(x) is about 2.08, so the noise std is about 0.1
>>> bool((df_aug['x'] - df['x']).abs().max() < 1.0)
True
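For the example above (column 'x', noise_level=0.05), the noise scale implied by the formulas can be worked out by hand. This sketch assumes \(\sigma_x\) is the sample standard deviation (ddof=1, the pandas default for `Series.std`); if the library uses ddof=0 instead, the Gaussian scale is about 0.0901.

```python
import numpy as np

# Column 'x' and noise_level from the Examples section
x = np.array([10.0, 12.0, 15.0, 13.0])
noise_level = 0.05

# Gaussian: noise std = std(x) * noise_level (ddof=1 is an assumption)
sigma_x = np.std(x, ddof=1)             # sqrt(13 / 3), about 2.0817
gaussian_std = sigma_x * noise_level    # about 0.1041

# Uniform: half-width = (max - min) * noise_level / 2
range_x = x.max() - x.min()             # 5.0
half_width = range_x * noise_level / 2  # 0.125

print(f"gaussian noise std: {gaussian_std:.4f}")  # prints: gaussian noise std: 0.1041
print(f"uniform half-width: {half_width:.4f}")    # prints: uniform half-width: 0.1250
```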

Notes

  • If the scale statistic of a feature column (its standard deviation for 'gaussian', its range for 'uniform') is zero or NaN, no noise is added to that column and a debug message is logged.

  • Non-numeric columns are skipped with a warning.
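The skipping behaviour described in these notes can be illustrated with a simplified DataFrame-level sketch. The name `augment_sketch` is hypothetical, `np.random.default_rng` and ddof=1 are assumptions, and `savefile` is omitted; this shows the control flow only and is not fusionlab's implementation.

```python
import logging
import warnings

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)


def augment_sketch(series_df, feature_cols, noise_level=0.01,
                   noise_type="gaussian", random_seed=None):
    """Hypothetical re-implementation illustrating the Notes above."""
    if noise_type not in ("gaussian", "uniform"):
        raise ValueError(f"Invalid noise_type: {noise_type!r}")
    rng = np.random.default_rng(random_seed)
    out = series_df.copy()  # the input DataFrame is left unchanged
    for col in feature_cols:
        # Non-numeric columns are skipped with a warning
        if not pd.api.types.is_numeric_dtype(out[col]):
            warnings.warn(f"Skipping non-numeric column {col!r}")
            continue
        values = out[col].astype(float)
        # Scale statistic: std for Gaussian (pandas ddof=1), range for uniform
        if noise_type == "gaussian":
            scale = values.std()
        else:
            scale = values.max() - values.min()
        # Zero or NaN scale: leave the column untouched, log at debug level
        if pd.isna(scale) or scale == 0:
            logger.debug("No noise added to %r (zero or NaN scale)", col)
            continue
        if noise_type == "gaussian":
            noise = rng.normal(0.0, scale * noise_level, size=len(out))
        else:
            half = scale * noise_level / 2
            noise = rng.uniform(-half, half, size=len(out))
        out[col] = values + noise
    return out
```

For instance, calling it on a frame with a numeric column, a constant column, and a string column perturbs only the first, leaves the constant column as-is, and warns about the string column.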

See also

pandas.DataFrame.sample

Random sampling methods.

sklearn.utils.resample

Resampling utilities for data augmentation.