fusionlab.nn.utils.prepare_spatial_future_data

fusionlab.nn.utils.prepare_spatial_future_data(final_processed_data, feature_columns, dynamic_feature_indices, sequence_length=1, dt_col='date', static_feature_names=None, forecast_horizon=None, future_years=None, encoded_cat_columns=None, scaling_params=None, spatial_cols=None, squeeze_last=False, verbosity=0)

Prepare future static and dynamic inputs for making predictions.

This function prepares the necessary static and dynamic inputs required for forecasting future values in time series data. It processes the provided dataset by grouping it by location_id, extracting the last sequence of data points based on the specified sequence_length, and generating future inputs for prediction over the defined forecast_horizon.
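The grouping and tail-extraction step described above can be pictured with plain pandas. This is a minimal sketch, not the library's actual implementation: the helper name `last_sequences` is hypothetical, and it assumes rows should be ordered by the time column within each location and that locations with too few rows are skipped.

```python
import pandas as pd

def last_sequences(df, dt_col, sequence_length):
    """Return {location_id: DataFrame of the last `sequence_length` rows}.

    Hypothetical helper for illustration only; not part of fusionlab.
    """
    out = {}
    # Sort by time so that "last" means "most recent" within each location.
    for loc_id, group in df.sort_values(dt_col).groupby("location_id"):
        # Assumption: locations with fewer rows than requested are skipped.
        if len(group) >= sequence_length:
            out[loc_id] = group.tail(sequence_length)
    return out

data = pd.DataFrame({
    "location_id": [1, 1, 1, 2, 2, 2],
    "year": [2018, 2019, 2020, 2018, 2019, 2020],
    "temperature": [15, 16, 15.5, 20, 21, 20.5],
})
windows = last_sequences(data, dt_col="year", sequence_length=2)
print(windows[1]["year"].tolist())  # [2019, 2020]
```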

The function handles both integer and datetime representations of the dt_col, extracting the year from datetime columns when necessary. It also allows for flexibility in specifying static features and encoded categorical variables.

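Future time values are standardized with the mean \(\mu\) and standard deviation \(\sigma\) of dt_col, taken from scaling_params when provided and computed from the data otherwise:

\[\text{scaled\_time} = \frac{\text{future\_time} - \mu}{\sigma}\]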
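As a rough illustration of the formula, the sketch below reduces a datetime column to its year and standardizes a future year. The helper name `scale_future_time` is hypothetical, and the use of the population standard deviation (`ddof=0`) as the fallback is an assumption made here, not confirmed library behaviour.

```python
import pandas as pd

def scale_future_time(time_values, future_time, mean=None, std=None):
    """Standardize `future_time` as (future_time - mean) / std.

    Hypothetical helper; falls back to statistics of `time_values`
    when `mean` or `std` is not supplied.
    """
    years = pd.Series(time_values)
    # Datetime columns are reduced to their year component first.
    if pd.api.types.is_datetime64_any_dtype(years):
        years = years.dt.year
    mu = years.mean() if mean is None else mean
    sigma = years.std(ddof=0) if std is None else std  # ddof=0 is an assumption
    return (future_time - mu) / sigma

dates = pd.to_datetime(["2018-01-01", "2019-01-01", "2020-01-01"])
# Using the example scaling parameters {'year': {'mean': 2000, 'std': 10}}.
print(scale_future_time(dates, 2021, mean=2000, std=10))  # 2.1
```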
Parameters:
  • final_processed_data (pandas.DataFrame) – The processed DataFrame containing all features and targets. Must include the location_id column and the specified dt_col.

  • feature_columns (List[str]) – List of feature column names to be used for dynamic input preparation.

  • dynamic_feature_indices (List[int]) – Indices of dynamic features in feature_columns. These features are considered time-dependent and are used to prepare dynamic inputs.

  • sequence_length (int, optional) – The number of past time steps to include in each input sequence. Default is 1.

  • dt_col (str, optional) – The name of the time-related column in final_processed_data. Defaults to 'date'.

  • static_feature_names (List[str], optional) – List of static feature column names. If not provided, defaults to ['longitude', 'latitude'] plus any encoded_cat_columns.

  • forecast_horizon (int, optional) – The number of future time steps to predict. If set to None, the function defaults to predicting the next immediate time step.

  • future_years (List[int], optional) – List of future years to predict. If forecast_horizon is provided, the list must contain exactly forecast_horizon entries.

  • encoded_cat_columns (List[str], optional) – List of encoded categorical column names to be treated as static features.

  • scaling_params (Dict[str, Dict[str, float]], optional) – Dictionary containing scaling parameters (mean and standard deviation) for features. Example: {'year': {'mean': 2000, 'std': 10}}. If not provided, the function computes the mean and std for the dt_col.

  • squeeze_last (bool, default False) – If True, squeeze the last axis, which corresponds to the output dimension y, when its size is 1.

  • verbosity (int, optional) – Verbosity level from 0 to 7 for debugging and understanding the process. Higher values produce more detailed logs.

  • spatial_cols (Tuple[str, str], optional) – Pair of spatial coordinate column names. Defaults to None.
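The defaulting rule for static_feature_names and the length constraint on future_years can be summarised in a few lines. This sketch is only an assumption about how such checks might look: `resolve_inputs` is a hypothetical helper, and the exception type is chosen for illustration.

```python
def resolve_inputs(static_feature_names=None, encoded_cat_columns=None,
                   forecast_horizon=None, future_years=None):
    """Hypothetical helper mirroring the documented defaults."""
    encoded_cat_columns = list(encoded_cat_columns or [])
    if static_feature_names is None:
        # Documented default: coordinates plus encoded categorical columns.
        static_feature_names = ["longitude", "latitude"] + encoded_cat_columns
    if forecast_horizon is not None and future_years is not None:
        # Documented constraint: one future year per forecast step.
        if len(future_years) != forecast_horizon:
            raise ValueError(
                f"future_years has {len(future_years)} entries, "
                f"expected {forecast_horizon}"
            )
    return static_feature_names, future_years

names, years = resolve_inputs(encoded_cat_columns=["encoded_cat"],
                              forecast_horizon=1, future_years=[2021])
print(names)  # ['longitude', 'latitude', 'encoded_cat']
```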

Returns:

A tuple containing:

  • future_static_inputs (numpy.ndarray) – Array of future static inputs with shape (num_samples, num_static_vars, 1).

  • future_dynamic_inputs (numpy.ndarray) – Array of future dynamic inputs with shape (num_samples, sequence_length, num_dynamic_vars, 1).

  • future_years_list (List[int]) – List of future time values corresponding to each sample.

  • location_ids_list (List[int]) – List of location IDs corresponding to each sample.

  • longitudes (List[float]) – List of longitude values corresponding to each sample.

  • latitudes (List[float]) – List of latitude values corresponding to each sample.

Return type:

Tuple[np.ndarray, np.ndarray, List[int], List[int], List[float], List[float]]
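The trailing axis of size 1 in the documented shapes is what squeeze_last is described as removing. The effect can be reproduced with NumPy alone. The arrays below are placeholders with the documented shapes, not real output of the function.

```python
import numpy as np

# Placeholder arrays: 2 samples, 3 static variables, sequence_length=2,
# 3 dynamic variables, each with a trailing axis of size 1.
future_static = np.zeros((2, 3, 1))
future_dynamic = np.zeros((2, 2, 3, 1))

# Drop the last axis, as squeeze_last is described to do when its size is 1.
static_2d = np.squeeze(future_static, axis=-1)
dynamic_3d = np.squeeze(future_dynamic, axis=-1)

print(static_2d.shape)   # (2, 3)
print(dynamic_3d.shape)  # (2, 2, 3)
```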

Examples

>>> from fusionlab.nn.utils import prepare_spatial_future_data
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'location_id': [1, 1, 1, 2, 2, 2],
...     'year': [2018, 2019, 2020, 2018, 2019, 2020],
...     'longitude': [10.0, 10.0, 10.0, 20.0, 20.0, 20.0],
...     'latitude': [50.0, 50.0, 50.0, 60.0, 60.0, 60.0],
...     'temperature': [15, 16, 15.5, 20, 21, 20.5],
...     'rainfall': [100, 110, 105, 200, 210, 205],
...     'encoded_cat': [1, 1, 1, 2, 2, 2]
... })
>>> feature_cols = ['year', 'temperature', 'rainfall', 'encoded_cat']
>>> dynamic_indices = [0, 1, 2]
>>> future_static, future_dynamic, future_years, loc_ids, longs, \
...     lats = prepare_spatial_future_data(
...     final_processed_data=data,
...     feature_columns=feature_cols,
...     dynamic_feature_indices=dynamic_indices,
...     sequence_length=2,
...     forecast_horizon=1,
...     future_years=[2021],
...     encoded_cat_columns=['encoded_cat'],
...     verbosity=5,
...     dt_col='year'
... )
>>> print(future_static.shape)
(2, 3, 1)
>>> print(future_dynamic.shape)
(2, 2, 3, 1)
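Because the four list outputs are documented as aligned sample by sample, they can be gathered into one table for later joining with predictions. The values below are placeholders shaped like the example's inputs, not verified output of the call above, and the column names are chosen here for illustration.

```python
import pandas as pd

# Placeholder values; real ones come from prepare_spatial_future_data.
loc_ids = [1, 2]
longs = [10.0, 20.0]
lats = [50.0, 60.0]
future_years = [2021, 2021]

# One row per sample, in the same order as the input arrays.
index_frame = pd.DataFrame({
    "location_id": loc_ids,
    "longitude": longs,
    "latitude": lats,
    "year": future_years,
})
print(index_frame.shape)  # (2, 4)
```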

Notes

  • The function handles both integer and datetime representations of the dt_col. If dt_col is a datetime type, the year is extracted for scaling purposes.

  • If forecast_horizon is set to None, the function defaults to generating data for the next immediate time step based on the last entry in the time column.

  • Ensure that the length of future_years matches forecast_horizon if forecast_horizon is provided.

  • The static_feature_names parameter allows for flexibility in specifying which static features to include. If not provided, it defaults to ['longitude', 'latitude'] plus any encoded_cat_columns.

See also

prepare_future_data

Main function for preparing future data inputs.
