fusionlab.utils.augment_spatiotemporal_data
- fusionlab.utils.augment_spatiotemporal_data(df, mode, group_by_cols=None, time_col=None, value_cols_interpolate=None, feature_cols_augment=None, interpolation_kwargs=None, augmentation_kwargs=None, savefile=None, verbose=False)
Applies temporal interpolation and/or feature augmentation to a spatiotemporal DataFrame.
This function performs one of three operations on each group of the DataFrame, depending on mode:
- 'interpolate': only fill temporal gaps via interpolate_temporal_gaps.
- 'augment_features': only add noise to features via augment_series_features.
- 'both': first interpolate, then augment features.
Let \(G\) be the set of groups defined by group_by_cols. For each group \(g \in G\), if mode includes interpolation, we compute:
\[\text{interpolated\_df}_g = \text{interpolate\_temporal\_gaps}( \text{series\_df}_g,\;\dots )\]
Then, if mode includes augmentation, we compute:
\[\text{augmented\_df}_g = \text{augment\_series\_features}( \text{interpolated\_df}_g,\;\dots )\]
Finally, all processed groups are concatenated:
\[\text{result} = \bigcup_{g \in G} \text{processed\_df}_g.\]
Here \(\text{processed\_df}_g\) denotes the output of the last step applied to group \(g\).
- Parameters:
df (pd.DataFrame) – The input DataFrame (e.g., Zhongshan data).
mode (str) – The augmentation mode. Options:
- 'interpolate': Applies only interpolate_temporal_gaps().
- 'augment_features': Applies only augment_series_features().
- 'both': Applies interpolate_temporal_gaps() first, then augment_series_features().
group_by_cols (list of str or None, default None) – Columns to group by for temporal interpolation (e.g., ['longitude', 'latitude']). Required if mode includes interpolation.
time_col (str or None, default None) – Name of the time column. Required if mode includes interpolation.
value_cols_interpolate (list of str or None, default None) – Columns to interpolate. Required if mode includes interpolation.
feature_cols_augment (list of str or None, default None) – Columns for noise augmentation. Required if mode includes augmentation.
interpolation_kwargs (dict or None, default None) – Keyword arguments passed to interpolate_temporal_gaps() (e.g., {'freq': 'AS'}).
augmentation_kwargs (dict or None, default None) – Keyword arguments passed to augment_series_features() (e.g., {'noise_level': 0.02}).
savefile (str, optional) – If provided, the processed DataFrame is saved to this file, in CSV format by default.
verbose (bool, default False) – If True, prints progress messages (via print). Otherwise, relies on logger.
- Returns:
The processed DataFrame. Groups are reassembled in original order of grouping and then concatenated.
- Return type:
pd.DataFrame
- Raises:
ValueError – If mode is invalid or required parameters for the selected mode are missing.
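The parameter requirements listed above can be summarized as a small check. The following is only a sketch of those documented rules, not the library's actual validation code; the helper name check_mode_args and the wording of the error messages are hypothetical.

```python
VALID_MODES = ("interpolate", "augment_features", "both")


def check_mode_args(mode, group_by_cols=None, time_col=None,
                    value_cols_interpolate=None, feature_cols_augment=None):
    """Hypothetical helper mirroring the documented per-mode requirements."""
    if mode not in VALID_MODES:
        raise ValueError(
            f"Invalid mode {mode!r}; expected one of {VALID_MODES}."
        )

    missing = []
    # Interpolation needs grouping columns, a time column and value columns.
    if mode in ("interpolate", "both"):
        required = {
            "group_by_cols": group_by_cols,
            "time_col": time_col,
            "value_cols_interpolate": value_cols_interpolate,
        }
        missing += [name for name, value in required.items() if not value]
    # Augmentation needs the feature columns to perturb.
    if mode in ("augment_features", "both") and not feature_cols_augment:
        missing.append("feature_cols_augment")

    if missing:
        raise ValueError(
            f"mode={mode!r} requires: {', '.join(missing)}"
        )
```

Under this sketch, check_mode_args('augment_features', feature_cols_augment=['value']) passes silently, while check_mode_args('both', feature_cols_augment=['value']) raises ValueError because the interpolation arguments are absent.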
Examples
>>> import pandas as pd
>>> from fusionlab.utils.geo_utils import augment_spatiotemporal_data
>>> df = pd.DataFrame({
...     'lon': [0, 0, 1, 1],
...     'lat': [0, 0, 1, 1],
...     'date': ['2020-01-01', '2020-01-03',
...              '2020-01-01', '2020-01-04'],
...     'value': [1.0, None, 2.0, None]
... })
>>> result = augment_spatiotemporal_data(
...     df,
...     mode='both',
...     group_by_cols=['lon', 'lat'],
...     time_col='date',
...     value_cols_interpolate=['value'],
...     feature_cols_augment=['value'],
...     interpolation_kwargs={'freq': 'D'},
...     augmentation_kwargs={'noise_level': 0.05,
...                          'noise_type': 'gaussian',
...                          'random_seed': 0}
... )
>>> 'value' in result.columns
True
Notes
Ensure df contains all columns in group_by_cols if mode includes interpolation.
The time column (time_col) must be convertible to datetime.
Groups are processed independently, then concatenated.
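The group-wise flow described in these notes can be illustrated with plain pandas. This is a minimal sketch under stated assumptions, not the implementation of interpolate_temporal_gaps or augment_series_features: the function name sketch_interpolate_then_augment is hypothetical, gaps are filled by linear interpolation on a regular time grid, and noise_level is assumed to scale Gaussian noise by each column's standard deviation within the group.

```python
import numpy as np
import pandas as pd


def sketch_interpolate_then_augment(df, group_by_cols, time_col, value_cols,
                                    freq="D", noise_level=0.05, seed=0):
    """Hypothetical stand-in for mode='both' using only pandas and NumPy."""
    rng = np.random.default_rng(seed)
    df = df.copy()
    # The time column must be convertible to datetime; this raises otherwise.
    df[time_col] = pd.to_datetime(df[time_col])

    pieces = []
    # Each group is processed independently, in order of first appearance.
    for keys, group in df.groupby(group_by_cols, sort=False):
        if not isinstance(keys, tuple):
            keys = (keys,)

        # Step 1: put the group on a regular time grid and fill the gaps.
        series = (
            group.set_index(time_col)[value_cols]
            .resample(freq)
            .mean()
            .interpolate(method="linear", limit_direction="both")
        )

        # Step 2 (assumption): Gaussian noise scaled by the per-column std.
        scale = noise_level * series.std(ddof=0).to_numpy()
        series = series + rng.normal(0.0, 1.0, size=series.shape) * scale

        # Restore the time column and the group keys dropped above.
        series = series.reset_index()
        for col, key in zip(group_by_cols, keys):
            series[col] = key
        pieces.append(series)

    # Step 3: concatenate all processed groups into one DataFrame.
    return pd.concat(pieces, ignore_index=True)


demo = pd.DataFrame({
    "lon": [0, 0, 1, 1],
    "lat": [0, 0, 1, 1],
    "date": ["2020-01-01", "2020-01-03", "2020-01-01", "2020-01-04"],
    "value": [1.0, 3.0, 2.0, 5.0],
})
demo_result = sketch_interpolate_then_augment(
    demo, ["lon", "lat"], "date", ["value"]
)
```

Creating one random generator outside the loop keeps the noise reproducible for a given seed while still giving each group different draws.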
See also
interpolate_temporal_gaps – Fill temporal gaps per group.
augment_series_features – Add noise to feature columns.