fusionlab.utils.augment_series_features
- fusionlab.utils.augment_series_features(series_df, feature_cols, noise_level=0.01, noise_type='gaussian', random_seed=None, savefile=None)
Augments specified feature columns in a time series DataFrame by adding noise.
It is typically applied to sequences that already meet the required length, in order to create additional training examples and improve model robustness.
Suppose \(x_i\) are the original feature values for one column. Then:
For Gaussian noise:
\[\text{noise}_i \sim \mathcal{N}\bigl(0,\; \sigma_x \times \text{noise\_level}\bigr), \quad \sigma_x = \mathrm{std}(x_i),\]
where the second argument of \(\mathcal{N}\) is the standard deviation of the noise. The augmented values are \(x_i + \text{noise}_i\).
For uniform noise:
\[\text{range}_x = \max(x_i) - \min(x_i), \quad \text{noise}_i \sim \mathcal{U}\Bigl( -\tfrac{\text{range}_x \times \text{noise\_level}}{2},\; \tfrac{\text{range}_x \times \text{noise\_level}}{2}\Bigr).\]
The augmented values are \(x_i + \text{noise}_i\).
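The two formulas can be sketched for a single column using only the Python standard library. This is a minimal illustration, not the library's implementation: the helper name add_noise is hypothetical, and it assumes the sample standard deviation (the pandas default) for \(\sigma_x\).

```python
import random
import statistics


def add_noise(values, noise_level=0.01, noise_type="gaussian", seed=None):
    """Return a noisy copy of `values` following the formulas above.

    Hypothetical helper for illustration; not part of fusionlab.
    """
    rng = random.Random(seed)  # local generator, so a fixed seed repeats the draw
    if noise_type == "gaussian":
        # Assumption: sample standard deviation (ddof=1), as pandas uses by default.
        sigma = statistics.stdev(values) * noise_level
        return [x + rng.gauss(0.0, sigma) for x in values]
    if noise_type == "uniform":
        # Half of (range * noise_level) on each side of zero.
        half_width = (max(values) - min(values)) * noise_level / 2
        return [x + rng.uniform(-half_width, half_width) for x in values]
    raise ValueError(f"Unknown noise_type: {noise_type!r}")


x = [10.0, 12.0, 15.0, 13.0]
noisy_gauss = add_noise(x, noise_level=0.05, noise_type="gaussian", seed=42)
noisy_unif = add_noise(x, noise_level=0.05, noise_type="uniform", seed=42)

# Uniform noise is bounded: range = 5.0, so each shift is at most 5.0 * 0.05 / 2 = 0.125.
max_shift = max(abs(a - b) for a, b in zip(noisy_unif, x))
```

Gaussian noise has no hard bound, whereas uniform noise never moves a value by more than half of range × noise_level, which may matter when features must stay within a known interval.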
- Parameters:
series_df (pd.DataFrame) – Input DataFrame representing one or more time series.
feature_cols (List[str]) – List of column names (features) to which noise will be added.
noise_level (float, default 0.01) – Magnitude of the noise.
- For ‘gaussian’: standard deviation of the noise relative to the feature’s std.
- For ‘uniform’: width of the uniform interval relative to the feature’s range; noise is drawn from ±(range × noise_level) / 2.
noise_type (str, default 'gaussian') – Type of noise to add. Options: ‘gaussian’, ‘uniform’.
random_seed (int or None, default None) – Seed for the random number generator for reproducible results.
savefile (str | None)
- Returns:
DataFrame with noise added to the specified feature columns.
- Return type:
pd.DataFrame
- Raises:
ValueError – If feature_cols are not in series_df or noise_type is invalid.
TypeError – If inputs are not of the expected type.
Examples
>>> import pandas as pd
>>> from fusionlab.utils.geo_utils import augment_series_features
>>> df = pd.DataFrame({
...     'x': [10.0, 12.0, 15.0, 13.0],
...     'y': [100, 110, 105, 115]
... })
>>> # Add Gaussian noise at 5% level to column 'x'
>>> df_aug = augment_series_features(df, ['x'], noise_level=0.05,
...                                  noise_type='gaussian',
...                                  random_seed=42)
>>> df_aug['x']
0    10.248357
1    11.930868
2    15.323844
3    12.761515
Name: x, dtype: float64
Notes
If a feature column has zero variance or NaN range, no noise is added and a debug log is emitted.
Non-numeric columns are skipped with a warning.
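The skip rules in these notes might look roughly like the following standard-library sketch. The helper noise_scale_or_none and the logger name are hypothetical, and dropping NaN entries before computing the spread is an assumption that mirrors the pandas default (skipna=True), not confirmed library behavior.

```python
import logging
import math
import statistics
from numbers import Real

logger = logging.getLogger("augment_sketch")  # hypothetical logger name


def noise_scale_or_none(name, values):
    """Return the column's sample std, or None if the column should be left alone.

    Hypothetical helper for illustration; not part of fusionlab.
    """
    # Non-numeric columns are skipped with a warning.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        logger.warning("Column %r is not numeric; skipping.", name)
        return None
    # Assumption: NaN entries are ignored, as pandas does by default.
    finite = [v for v in values if not math.isnan(v)]
    # Too few usable points (NaN range) or a constant column (zero variance).
    if len(finite) < 2 or statistics.stdev(finite) == 0:
        logger.debug("Column %r has no usable spread; no noise added.", name)
        return None
    return statistics.stdev(finite)


columns = {
    "x": [10.0, 12.0, 15.0, 13.0],        # varies: noise can be scaled
    "flat": [3.0, 3.0, 3.0, 3.0],         # zero variance: skipped
    "gap": [float("nan"), float("nan")],  # NaN range: skipped
    "label": ["a", "b", "c", "d"],        # non-numeric: skipped
}
skipped = [name for name, vals in columns.items()
           if noise_scale_or_none(name, vals) is None]
```

Checking the spread before drawing noise avoids asking the random generator for a zero or undefined scale, and leaves such columns unchanged in the output.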
See also
pandas.DataFrame.sample – Random sampling methods.
sklearn.utils.resample – Resampling utilities for data augmentation.