fusionlab.utils.augment_spatiotemporal_data

fusionlab.utils.augment_spatiotemporal_data(df, mode, group_by_cols=None, time_col=None, value_cols_interpolate=None, feature_cols_augment=None, interpolation_kwargs=None, augmentation_kwargs=None, savefile=None, verbose=False)

Applies temporal interpolation and/or feature augmentation to a spatiotemporal DataFrame.

This function can perform one of three operations on each group of the DataFrame:

  1. ‘interpolate’ only: fill temporal gaps via interpolate_temporal_gaps.

  2. ‘augment_features’ only: add noise to features via augment_series_features.

  3. ‘both’: first interpolate, then augment features.

Let \(G\) be the set of groups defined by group_by_cols. For each group \(g \in G\), if mode includes interpolation, we compute:

\[\text{interpolated\_df}_g = \text{interpolate\_temporal\_gaps}( \text{series\_df}_g,\;\dots )\]

Then, if mode includes augmentation, we compute (with \(\text{series\_df}_g\) as the input instead when interpolation was skipped):

\[\text{augmented\_df}_g = \text{augment\_series\_features}( \text{interpolated\_df}_g,\;\dots )\]

Finally, the processed groups (interpolated, augmented, or both, depending on mode) are concatenated:

\[\text{result} = \bigcup_{g \in G} \text{processed\_df}_g.\]
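
The per-group pipeline described above can be sketched in plain pandas. This is only an illustration of the control flow, not fusionlab's implementation: fill_gaps and add_noise are hypothetical stand-ins for interpolate_temporal_gaps and augment_series_features, and the choices made here (time-based linear interpolation, Gaussian noise scaled by each column's standard deviation) are assumptions.

```python
import numpy as np
import pandas as pd


def fill_gaps(group, time_col, value_cols, freq):
    """Hypothetical stand-in for interpolate_temporal_gaps."""
    g = group.copy()
    g[time_col] = pd.to_datetime(g[time_col])
    g = g.set_index(time_col).sort_index()
    # Reindex onto a regular time grid, then interpolate the value columns.
    grid = pd.date_range(g.index.min(), g.index.max(), freq=freq)
    g = g.reindex(grid).rename_axis(time_col)
    g[value_cols] = g[value_cols].interpolate(method="time")
    # Carry the remaining columns (e.g. the group keys) into the new rows.
    other = [c for c in g.columns if c not in value_cols]
    if other:
        g[other] = g[other].ffill()
    return g.reset_index()


def add_noise(group, feature_cols, noise_level, rng):
    """Hypothetical stand-in for augment_series_features (Gaussian noise)."""
    g = group.copy()
    for col in feature_cols:
        # Assumption: noise is scaled by the column's standard deviation.
        scale = noise_level * float(g[col].std(ddof=0))
        g[col] = g[col] + rng.normal(0.0, scale, size=len(g))
    return g


def process(df, mode, group_by_cols, time_col, value_cols, feature_cols,
            freq="D", noise_level=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pieces = []
    # sort=False keeps groups in order of first appearance.
    for _, g in df.groupby(group_by_cols, sort=False):
        if mode in ("interpolate", "both"):
            g = fill_gaps(g, time_col, value_cols, freq)
        if mode in ("augment_features", "both"):
            g = add_noise(g, feature_cols, noise_level, rng)
        pieces.append(g)
    # Concatenate the processed groups into a single frame.
    return pd.concat(pieces, ignore_index=True)


demo = pd.DataFrame({
    "lon": [0, 0, 1, 1],
    "lat": [0, 0, 1, 1],
    "date": ["2020-01-01", "2020-01-03", "2020-01-01", "2020-01-04"],
    "value": [1.0, 3.0, 2.0, 8.0],
})
filled = process(demo, "interpolate", ["lon", "lat"], "date",
                 ["value"], ["value"])
```

In this sketch the first group gains one row (2020-01-02) and the second gains two, so filled has seven rows in total.
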
Parameters:
  • df (pd.DataFrame) – The input DataFrame (e.g., Zhongshan data).

  • mode (str) –

    The augmentation mode. Options:

    • ‘interpolate’: Applies only interpolate_temporal_gaps().

    • ‘augment_features’: Applies only augment_series_features().

    • ‘both’: Applies interpolate_temporal_gaps() first, then augment_series_features().

  • group_by_cols (list of str or None, default None) – Columns to group by for temporal interpolation (e.g., [‘longitude’, ‘latitude’]). Required if mode includes interpolation.

  • time_col (str or None, default None) – Name of the time column. Required if mode includes interpolation.

  • value_cols_interpolate (list of str or None, default None) – Columns to interpolate. Required if mode includes interpolation.

  • feature_cols_augment (list of str or None, default None) – Columns for noise augmentation. Required if mode includes augmentation.

  • interpolation_kwargs (dict or None, default None) – Keyword arguments passed to interpolate_temporal_gaps() (e.g., {‘freq’: ‘AS’}).

  • augmentation_kwargs (dict or None, default None) – Keyword arguments passed to augment_series_features() (e.g., {‘noise_level’: 0.02}).

  • savefile (str, optional) – If given, the path to which the processed DataFrame is saved (CSV format by default).

  • verbose (bool, default False) – If True, prints progress messages (via print). Otherwise, messages are sent to the logger.

Returns:

The processed DataFrame. Groups are processed in the order produced by the grouping and then concatenated.

Return type:

pd.DataFrame

Raises:

ValueError – If mode is invalid or required parameters for the selected mode are missing.
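
The requirements listed under Parameters suggest a validation step along the following lines. This is a minimal sketch: check_arguments is a hypothetical helper, and the actual checks and error messages in fusionlab may differ.

```python
VALID_MODES = ("interpolate", "augment_features", "both")


def check_arguments(mode, group_by_cols=None, time_col=None,
                    value_cols_interpolate=None, feature_cols_augment=None):
    """Hypothetical helper mirroring the documented requirements."""
    if mode not in VALID_MODES:
        raise ValueError(
            f"Invalid mode {mode!r}; expected one of {VALID_MODES}."
        )

    missing = []
    if mode in ("interpolate", "both"):
        # Interpolation needs grouping columns, a time column and value columns.
        required = {
            "group_by_cols": group_by_cols,
            "time_col": time_col,
            "value_cols_interpolate": value_cols_interpolate,
        }
        missing += [name for name, value in required.items() if not value]
    if mode in ("augment_features", "both") and not feature_cols_augment:
        # Augmentation needs the feature columns to perturb.
        missing.append("feature_cols_augment")

    if missing:
        raise ValueError(
            f"mode={mode!r} requires: {', '.join(missing)}"
        )


# A complete argument set passes silently.
check_arguments("both", ["lon", "lat"], "date", ["value"], ["value"])
```
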

Examples

>>> import pandas as pd
>>> from fusionlab.utils.geo_utils import augment_spatiotemporal_data
>>> df = pd.DataFrame({
...     'lon': [0, 0, 1, 1],
...     'lat': [0, 0, 1, 1],
...     'date': ['2020-01-01', '2020-01-03',
...              '2020-01-01', '2020-01-04'],
...     'value': [1.0, None, 2.0, None]
... })
>>> result = augment_spatiotemporal_data(
...     df,
...     mode='both',
...     group_by_cols=['lon', 'lat'],
...     time_col='date',
...     value_cols_interpolate=['value'],
...     feature_cols_augment=['value'],
...     interpolation_kwargs={'freq': 'D'},
...     augmentation_kwargs={'noise_level': 0.05,
...                          'noise_type': 'gaussian',
...                          'random_seed': 0}
... )
>>> 'value' in result.columns
True

Notes

  • Ensure df contains all columns in group_by_cols if mode includes interpolation.

  • The column named by time_col must be convertible to datetime.

  • Groups are processed independently, then concatenated.
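
The first two notes can be checked before calling the function. The sketch below uses only pandas; preflight is a hypothetical helper and not part of fusionlab.

```python
import pandas as pd


def preflight(df, group_by_cols, time_col):
    """Return the time values that cannot be parsed as datetimes."""
    missing = [c for c in list(group_by_cols) + [time_col]
               if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in df: {missing}")
    # errors="coerce" turns unparseable entries into NaT instead of raising.
    parsed = pd.to_datetime(df[time_col], errors="coerce")
    unparseable = parsed.isna() & df[time_col].notna()
    return df.loc[unparseable, time_col]


frame = pd.DataFrame({
    "lon": [0, 0, 1],
    "lat": [0, 0, 1],
    "date": ["2020-01-01", "2020-13-45", "2020-01-04"],
})
bad = preflight(frame, ["lon", "lat"], "date")
```

Here bad contains the single entry "2020-13-45", which would need to be fixed or dropped before interpolation.
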

See also

interpolate_temporal_gaps

Fill temporal gaps per group.

augment_series_features

Add noise to feature columns.