fusionlab.nn.utils.prepare_spatial_future_data

fusionlab.nn.utils.prepare_spatial_future_data(final_processed_data, feature_columns, dynamic_feature_indices, sequence_length=1, dt_col='date', static_feature_names=None, forecast_horizon=None, future_years=None, encoded_cat_columns=None, scaling_params=None, spatial_cols=None, squeeze_last=False, verbosity=0)[source]

Prepare future static and dynamic inputs for making predictions.

This function prepares the necessary static and dynamic inputs required for forecasting future values in time series data. It processes the provided dataset by grouping it by location_id, extracting the last sequence of data points based on the specified sequence_length, and generating future inputs for prediction over the defined forecast_horizon.
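The grouping step described above can be pictured with plain pandas. This is only a minimal sketch of the idea, not the library's internal code: the toy frame, the `windows` dictionary and the choice of `groupby(...).tail(...)` are assumptions made for illustration.

```python
import pandas as pd

# Toy frame; column names mirror the ones used on this page.
df = pd.DataFrame({
    "location_id": [1, 1, 1, 2, 2, 2],
    "year": [2018, 2019, 2020, 2018, 2019, 2020],
    "temperature": [15.0, 16.0, 15.5, 20.0, 21.0, 20.5],
})

sequence_length = 2

# Sort by time within each location, then keep the last
# `sequence_length` rows of every location.
last_rows = (
    df.sort_values(["location_id", "year"])
      .groupby("location_id")
      .tail(sequence_length)
)

# One (sequence_length, n_features) array per location.
windows = {
    int(loc): grp[["year", "temperature"]].to_numpy()
    for loc, grp in last_rows.groupby("location_id")
}
```

Each entry of `windows` holds the most recent `sequence_length` observations for one location, which is the raw material for the dynamic inputs.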

The function handles both integer and datetime representations of the dt_col, extracting the year from datetime columns when necessary. It also allows for flexibility in specifying static features and encoded categorical variables.
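As a rough illustration of handling both representations, the sketch below normalizes a time column to integer years. The helper name `time_as_year` is hypothetical and is not part of fusionlab; it only shows one way such a check could be written with pandas.

```python
import pandas as pd

def time_as_year(series: pd.Series) -> pd.Series:
    """Return integer years for an integer or datetime column (hypothetical helper)."""
    if pd.api.types.is_datetime64_any_dtype(series):
        # Datetime column: keep only the year component.
        return series.dt.year
    # Integer-like column: assume the values are already years.
    return series.astype(int)

dates = pd.Series(pd.to_datetime(["2019-06-01", "2020-06-01"]))
years = pd.Series([2019, 2020])

years_from_dates = time_as_year(dates)
years_from_ints = time_as_year(years)
```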

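\[\text{scaled\_time} = \frac{\text{future\_time} - \mu}{\sigma}\]

where \(\mu\) and \(\sigma\) are the mean and standard deviation of dt_col, taken from scaling_params or computed from the data when scaling_params is not provided.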
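A small worked instance of the scaling formula, using the example values given for scaling_params on this page (mean 2000, std 10). The function name `scale_time` is an assumption for illustration only.

```python
def scale_time(future_time: float, mean: float, std: float) -> float:
    """Standardize a time value: (future_time - mean) / std."""
    return (future_time - mean) / std

# Example parameters in the documented scaling_params layout.
params = {"year": {"mean": 2000, "std": 10}}

scaled = scale_time(2021, params["year"]["mean"], params["year"]["std"])
# (2021 - 2000) / 10 = 2.1
```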

final_processed_data : pandas.DataFrame

The processed DataFrame containing all features and targets. Must include the location_id column and the specified dt_col.

feature_columns : List[str]

List of feature column names to be used for dynamic input preparation.

dynamic_feature_indices : List[int]

Indices of dynamic features in feature_columns. These features are considered time-dependent and are used to prepare dynamic inputs.

sequence_length : int, optional

The number of past time steps to include in each input sequence. Default is 1.

dt_col : str, optional

The name of the time-related column in final_processed_data. Defaults to 'date'.

static_feature_names : List[str], optional

List of static feature column names. If not provided, defaults to ['longitude', 'latitude'] plus any encoded_cat_columns.

forecast_horizon : int, optional

The number of future time steps to predict. If set to None, the function defaults to predicting the next immediate time step.

future_years : List[int], optional

List of future years to predict. If forecast_horizon is provided, the length of this list must equal forecast_horizon.

encoded_cat_columns : List[str], optional

List of encoded categorical column names to be treated as static features.

scaling_params : Dict[str, Dict[str, float]], optional

Dictionary containing scaling parameters (mean and standard deviation) for features. Example: {'year': {'mean': 2000, 'std': 10}}. If not provided, the function computes the mean and std for the dt_col.

squeeze_last : bool, default=False

If True, squeeze the last axis (the output dimension of y) when its size is 1.

verbosity : int, optional

Verbosity level from 0 to 7 for debugging and understanding the process. Higher values produce more detailed logs.
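The defaults described in the parameter list can be summarized in a short sketch. All helper names below are hypothetical and the code is not taken from fusionlab. In particular, raising `ValueError` on a length mismatch and using the population standard deviation (`ddof=0`) are assumptions, since this page does not say how the function behaves in those respects.

```python
from typing import Dict, List, Optional

import pandas as pd

def resolve_static_names(static_feature_names: Optional[List[str]],
                         encoded_cat_columns: Optional[List[str]]) -> List[str]:
    """Documented default: longitude, latitude, plus encoded categorical columns."""
    if static_feature_names is not None:
        return list(static_feature_names)
    return ["longitude", "latitude"] + list(encoded_cat_columns or [])

def check_future_years(future_years: Optional[List[int]],
                       forecast_horizon: Optional[int]) -> None:
    """Assumed check: len(future_years) must equal forecast_horizon."""
    if future_years is not None and forecast_horizon is not None:
        if len(future_years) != forecast_horizon:
            raise ValueError("len(future_years) must equal forecast_horizon")

def resolve_time_scaling(df: pd.DataFrame, dt_col: str,
                         scaling_params: Optional[Dict[str, Dict[str, float]]]
                         ) -> Dict[str, float]:
    """Use the supplied parameters, else compute mean and std from dt_col."""
    if scaling_params and dt_col in scaling_params:
        return scaling_params[dt_col]
    # ddof=0 is an assumption; the page does not state which estimator is used.
    return {"mean": float(df[dt_col].mean()),
            "std": float(df[dt_col].std(ddof=0))}

feature_columns = ["year", "temperature", "rainfall", "encoded_cat"]
dynamic_feature_indices = [0, 1, 2]
# Indices select the time-dependent columns by position.
dynamic_names = [feature_columns[i] for i in dynamic_feature_indices]

static_names = resolve_static_names(None, ["encoded_cat"])
check_future_years([2021], 1)  # lengths agree, so nothing is raised

toy = pd.DataFrame({"year": [2018, 2019, 2020]})
time_scaling = resolve_time_scaling(toy, "year", None)
```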

Tuple[np.ndarray, np.ndarray, List[int], List[int], List[float], List[float]]

A tuple containing:

  • future_static_inputs : numpy.ndarray

    Array of future static inputs with shape (num_samples, num_static_vars, 1).

  • future_dynamic_inputs : numpy.ndarray

    Array of future dynamic inputs with shape (num_samples, sequence_length, num_dynamic_vars, 1).

  • future_years_list : List[int]

    List of future time values corresponding to each sample.

  • location_ids_list : List[int]

    List of location IDs corresponding to each sample.

  • longitudes : List[float]

    List of longitude values corresponding to each sample.

  • latitudes : List[float]

    List of latitude values corresponding to each sample.
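To make the documented shapes concrete, the sketch below builds placeholder arrays with NumPy and shows what dropping a trailing axis of size 1 looks like. It does not call fusionlab. The helper `squeeze_trailing_axis` is hypothetical, and applying it to these input arrays is only one plausible reading of squeeze_last.

```python
import numpy as np

num_samples, sequence_length = 2, 2
num_static_vars, num_dynamic_vars = 3, 3

# Placeholder arrays with the documented shapes.
future_static = np.zeros((num_samples, num_static_vars, 1))
future_dynamic = np.zeros((num_samples, sequence_length, num_dynamic_vars, 1))

def squeeze_trailing_axis(arr: np.ndarray) -> np.ndarray:
    """Drop the last axis only when its size is 1 (hypothetical helper)."""
    return np.squeeze(arr, axis=-1) if arr.shape[-1] == 1 else arr

static_2d = squeeze_trailing_axis(future_static)
dynamic_3d = squeeze_trailing_axis(future_dynamic)
```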

>>> from fusionlab.nn.utils import prepare_spatial_future_data
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'location_id': [1, 1, 1, 2, 2, 2],
...     'year': [2018, 2019, 2020, 2018, 2019, 2020],
...     'longitude': [10.0, 10.0, 10.0, 20.0, 20.0, 20.0],
...     'latitude': [50.0, 50.0, 50.0, 60.0, 60.0, 60.0],
...     'temperature': [15, 16, 15.5, 20, 21, 20.5],
...     'rainfall': [100, 110, 105, 200, 210, 205],
...     'encoded_cat': [1, 1, 1, 2, 2, 2]
... })
>>> feature_cols = ['year', 'temperature', 'rainfall', 'encoded_cat']
>>> dynamic_indices = [0, 1, 2]
>>> future_static, future_dynamic, future_years, loc_ids, longs, lats = prepare_spatial_future_data(
...     final_processed_data=data,
...     feature_columns=feature_cols,
...     dynamic_feature_indices=dynamic_indices,
...     sequence_length=2,
...     forecast_horizon=1,
...     future_years=[2021],
...     encoded_cat_columns=['encoded_cat'],
...     verbosity=5,
...     dt_col='year'
... )
>>> print(future_static.shape)
(2, 3, 1)
>>> print(future_dynamic.shape)
(2, 2, 3, 1)

  • The function handles both integer and datetime representations of the dt_col. If dt_col is a datetime type, the year is extracted for scaling purposes.

  • If forecast_horizon is set to None, the function defaults to generating data for the next immediate time step based on the last entry in the time column.

  • Ensure that the length of future_years matches forecast_horizon if forecast_horizon is provided.

  • The static_feature_names parameter allows for flexibility in specifying which static features to include. If not provided, it defaults to ['longitude', 'latitude'] plus any encoded_cat_columns.

prepare_future_data : Main function for preparing future data inputs.

Parameters:
  • final_processed_data (DataFrame)

  • feature_columns (List[str])

  • dynamic_feature_indices (List[int])

  • sequence_length (int)

  • dt_col (str)

  • static_feature_names (List[str] | None)

  • forecast_horizon (int | None)

  • future_years (List[int] | None)

  • encoded_cat_columns (List[str] | None)

  • scaling_params (Dict[str, Dict[str, float]] | None)

  • spatial_cols (Tuple[str, str])

  • squeeze_last (bool)

  • verbosity (int)

Return type:

Tuple[ndarray, ndarray, List[int], List[int], List[float], List[float]]