Case Histories

This section showcases practical applications of the fusionlab library on real-world-inspired datasets, demonstrating common workflows from data loading and preprocessing to model training and forecasting.

These case studies focus on land subsidence prediction, using datasets from the Nansha and Zhongshan areas in China. They illustrate how to leverage fusionlab utilities and models like XTFT to handle complex spatio-temporal data with static, dynamic, and future features.

Note

The datasets used (nansha_2000.csv, zhongshan_2000.csv) are spatially sampled versions (approx. 2000 points each) derived from larger datasets used in research (e.g., [Liu24]). While suitable for demonstrating workflows, results on these smaller samples may differ from those obtained on the full datasets.

[Liu24] (1,2)

Liu, J., Liu, W., Allechy, F. B., Zheng, Z., Liu, R., & Kouadio, K. L. (2024). Machine learning-based techniques for land subsidence simulation in an urban area. Journal of Environmental Management, 352, 120078.


Nansha Land Subsidence Forecasting (Using XTFT)

This case study demonstrates a multi-step quantile forecasting workflow for land subsidence in the Nansha district using the XTFT model. Research indicates that factors like Groundwater Level (GWL) and Building Concentration (BC) are significant drivers in this region [Liu24]. We will follow a standard pipeline, assuming the goal is to predict future subsidence based on historical data and known future indicators.

(Note: This example uses the smaller sampled dataset `nansha_2000.csv` for demonstration purposes.)

Step 1: Imports and Setup

Import necessary libraries, including pandas, numpy, sklearn utilities, and relevant fusionlab functions for data loading, preprocessing, modeling, and loss calculation.

 1import numpy as np
 2import pandas as pd
 3import tensorflow as tf
 4import os
 5import joblib # For saving scalers/encoders
 6from sklearn.preprocessing import StandardScaler, OneHotEncoder
 7from sklearn.model_selection import train_test_split # For simple splitting later
 8
 9# Fusionlab imports
10from fusionlab.datasets.load import fetch_nansha_data # Specific loader
11from fusionlab.utils.ts_utils import reshape_xtft_data, to_dt
12try:
13    # Assume nan_ops handles missing values appropriately
14    from fusionlab.utils.preprocessing import nan_ops
15except ImportError:
16    def nan_ops(df, **kwargs): # Dummy if not found
17        print("Warning: nan_ops not found, filling NaNs with ffill/bfill.")
18        return df.ffill().bfill()
19from fusionlab.nn.transformers import XTFT
20from fusionlab.nn.losses import combined_quantile_loss
21
22# Suppress warnings and TF logs
23import warnings
24warnings.filterwarnings('ignore')
25tf.get_logger().setLevel('ERROR')
26tf.autograph.set_verbosity(0)
27
28# Configuration
29output_dir_nansha = "./nansha_case_study_output"
30os.makedirs(output_dir_nansha, exist_ok=True)
31print("Nansha Case Study: Setup complete.")

Step 2: Load Raw Nansha Data

Load the sampled Nansha dataset using the dedicated fetch function. We load it as a DataFrame for initial inspection and preprocessing.

 1# Load the nansha_2000.csv dataset
 2try:
 3    nansha_df_raw = fetch_nansha_data(
 4        as_frame=True,
 5        verbose=True,
 6        download_if_missing=True # Allow download if needed
 7    )
 8    print("\nNansha data loaded successfully.")
 9    nansha_df_raw.info()
10except Exception as e:
11    print(f"ERROR: Could not load Nansha data. Please ensure the file"
12          f" 'nansha_2000.csv' is available or downloadable. Error: {e}")
13    # Exit or raise further if data is essential
14    raise

Step 3: Initial Cleaning and Validation

Ensure the time column (‘year’ in this dataset) is treated correctly. Handle any immediate missing values using a defined strategy (e.g., filling).

 1# Assume 'year' is the primary time column based on data description
 2dt_col = 'year'
 3df_clean = nansha_df_raw.copy()
 4
 5# Although 'year' is integer, treat it as the time index marker later
 6# If actual datetime conversion needed:
 7# df_clean = to_dt(df_clean, dt_col=dt_col, ...)
 8
 9# Handle missing values (example: using nan_ops utility)
10df_clean = nan_ops(df_clean, ops='sanitize', action='fill')
11print("\nInitial NaN check passed (or NaNs filled).")

Step 4: Preprocessing (Encoding & Scaling)

Apply preprocessing steps necessary for the model. This typically involves encoding categorical features and scaling numerical features.

 1# 4a. Define Categorical and Numerical Columns for Nansha
 2# Based on user description: 'geology' is categorical
 3categorical_cols_n = ['geology']
 4# Identify remaining numerical columns (excluding target/coords/year)
 5numerical_cols_n = [
 6    'building_concentration', 'GWL', 'rainfall_mm',
 7    'normalized_seismic_risk_score', 'soil_thickness'
 8]
 9target_col_n = 'subsidence'
10coord_cols_n = ['longitude', 'latitude']
11
12# 4b. Encode Categorical Features
13df_processed = df_clean.copy()
14encoder_info_n = {'columns': {}, 'names': {}}
15encoded_cols_generated = []
16
17if categorical_cols_n:
18    print("\nEncoding categorical features...")
19    cols_to_keep_temp = df_processed.columns.difference(
20        categorical_cols_n).tolist()
21    df_encoded_list = [df_processed[cols_to_keep_temp]]
22    for col in categorical_cols_n:
23        if col in df_processed.columns:
24            encoder = OneHotEncoder(sparse_output=False,
25                                    handle_unknown='ignore',
26                                    dtype=np.float32)
27            encoded_data = encoder.fit_transform(df_processed[[col]])
28            new_cols = [f"{col}_{cat}" for cat in encoder.categories_[0]]
29            encoded_df = pd.DataFrame(encoded_data, columns=new_cols,
30                                      index=df_processed.index)
31            df_encoded_list.append(encoded_df)
32            encoder_info_n['columns'][col] = new_cols
33            encoder_info_n['names'][col] = encoder.categories_[0]
34            encoded_cols_generated.extend(new_cols) # Keep track
35            print(f"  Encoded '{col}' -> {len(new_cols)} columns")
36        else:
37            warnings.warn(f"Categorical column '{col}' not found.")
38    df_processed = pd.concat(df_encoded_list, axis=1)
39
40# 4c. Scale Numerical Features (including Target)
41scaler_n = StandardScaler() # Or MinMaxScaler()
42cols_to_scale_n = [c for c in numerical_cols_n if c in df_processed.columns]
43if target_col_n in df_processed.columns:
44    cols_to_scale_n.append(target_col_n)
45
46if cols_to_scale_n:
47    print(f"\nScaling numerical features: {cols_to_scale_n}...")
48    df_processed[cols_to_scale_n] = scaler_n.fit_transform(
49        df_processed[cols_to_scale_n]
50    )
51    # Save the scaler
52    scaler_n_path = os.path.join(output_dir_nansha, "nansha_scaler.joblib")
53    joblib.dump(scaler_n, scaler_n_path)
54    print(f"Scaler saved to {scaler_n_path}")
55else:
56    print("No numerical columns found to scale.")

Step 5: Define Feature Sets for Reshaping

Define the lists of column names corresponding to static, dynamic (past), and future known features after the preprocessing steps (encoding, scaling). This requires mapping original concepts to the potentially new column names.

 1# Define based on available columns in df_processed
 2# Static: Coords + Encoded Categoricals + Base numericals?
 3final_static_cols_n = list(coord_cols_n)
 4final_static_cols_n.extend(encoded_cols_generated) # Add encoded geology
 5# Add original static numericals if applicable (e.g., maybe soil_thickness varies slowly)
 6# final_static_cols_n.append('soil_thickness') # Example if static
 7
 8# Dynamic: Non-static numericals + Time varying categoricals (if encoded differently)
 9# Here: GWL, rainfall_mm, building_concentration, norm_seismic_score?
10# Assuming 'year' is the time index dt_col
11final_dynamic_cols_n = [
12    'GWL', 'rainfall_mm', 'building_concentration',
13    'normalized_seismic_risk_score', 'soil_thickness' # Assume dynamic for now
14    ]
15final_dynamic_cols_n = [c for c in final_dynamic_cols_n if c in df_processed.columns]
16
17
18# Future: Known future events or time features
19# Here: rainfall_mm (assuming known forecast), year?
20final_future_cols_n = ['rainfall_mm'] # Example, needs domain knowledge
21final_future_cols_n = [c for c in final_future_cols_n if c in df_processed.columns]
22
23# Ensure all columns exist
24print("\nFinal Feature Sets for Reshaping:")
25print("  Static:", final_static_cols_n)
26print("  Dynamic:", final_dynamic_cols_n)
27print("  Future:", final_future_cols_n)
28print("  Target:", target_col_n)
29print("  DateTime:", dt_col)
30print("  Spatial:", spatial_cols)

Step 6: Generate Sequences

Use reshape_xtft_data() to create the final sequence arrays needed by the XTFT model.

 1time_steps = 4          # Example lookback (needs tuning)
 2forecast_horizon = 4    # Example horizon (e.g., predict 4 years)
 3
 4print(f"\nReshaping data (T={time_steps}, H={forecast_horizon})...")
 5static_data, dynamic_data, future_data, target_data = reshape_xtft_data(
 6    df=df_processed,
 7    dt_col=dt_col, # 'year' acts as time index here
 8    target_col=target_col_n,
 9    static_cols=final_static_cols_n,
10    dynamic_cols=final_dynamic_cols_n,
11    future_cols=final_future_cols_n,
12    spatial_cols=spatial_cols, # ['longitude', 'latitude']
13    time_steps=time_steps,
14    forecast_horizons=forecast_horizon,
15    verbose=1
16)

Step 7: Data Splitting for Training

Split the generated sequence arrays into training and validation sets. A simple chronological split based on the sequence order is shown.

 1# Split sequences (example: 80% train, 20% validation)
 2val_split_frac = 0.2
 3n_seq_samples = static_data.shape[0]
 4split_seq_idx = int(n_seq_samples * (1 - val_split_frac))
 5
 6X_train_static, X_val_static = static_data[:split_seq_idx], static_data[split_seq_idx:]
 7X_train_dynamic, X_val_dynamic = dynamic_data[:split_seq_idx], dynamic_data[split_seq_idx:]
 8X_train_future, X_val_future = future_data[:split_seq_idx], future_data[split_seq_idx:]
 9y_train, y_val = target_data[:split_seq_idx], target_data[split_seq_idx:]
10
11# Package inputs into lists
12train_inputs = [X_train_static, X_train_dynamic, X_train_future]
13val_inputs = [X_val_static, X_val_dynamic, X_val_future]
14
15print("\nSequence data split into Train/Validation sets.")
16print(f"  Train sequences: {len(y_train)}")
17print(f"  Validation sequences: {len(y_val)}")

Step 8: Define and Train XTFT Model

Instantiate the XTFT model, configure it for quantile forecasting, compile it with the appropriate loss function, and train it on the prepared sequence data.

 1quantiles = [0.1, 0.5, 0.9] # Define target quantiles
 2output_dim = 1             # Predicting single target 'subsidence'
 3
 4# Instantiate XTFT
 5xtft_model = XTFT(
 6    static_input_dim=static_data.shape[-1],
 7    dynamic_input_dim=dynamic_data.shape[-1],
 8    future_input_dim=future_data.shape[-1],
 9    forecast_horizon=forecast_horizon,
10    quantiles=quantiles,
11    output_dim=output_dim,
12    # --- Example Hyperparameters (Tune these) ---
13    hidden_units=32,
14    embed_dim=16,
15    attention_units=16,
16    lstm_units=32,
17    num_heads=4,
18    max_window_size=time_steps, # Typically match lookback
19    memory_size=50,
20    dropout_rate=0.1,
21    # --- Anomaly Detection (Optional) ---
22    # anomaly_detection_strategy=None, # Example: No anomaly det.
23)
24print("\nXTFT model instantiated.")
25
26# Compile with quantile loss
27loss_fn = combined_quantile_loss(quantiles=quantiles)
28xtft_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.003),
29                   loss=loss_fn)
30print("Model compiled.")
31
32# Train the model
33print("Starting model training...")
34history = xtft_model.fit(
35    train_inputs,
36    y_train,
37    validation_data=(val_inputs, y_val),
38    epochs=10, # Increase significantly for real training
39    batch_size=32,
40    verbose=1
41)
42print("Training finished.")

Step 9: Forecasting and Evaluation

Use the trained model to make predictions on the validation set (or new data). Inverse-transform the scaled predictions and evaluate performance using relevant metrics (e.g., quantile loss, R², coverage).

 1print("\nGenerating predictions on validation set...")
 2predictions_scaled = xtft_model.predict(val_inputs)
 3
 4# Inverse transform predictions
 5# (Requires loading/using the 'scaler' saved in Step 4c)
 6# loaded_scaler = joblib.load(scaler_n_path)
 7scaler_target = scalers[target_col_n] # Use scaler from memory
 8
 9num_val_samples = X_val_static.shape[0]
10pred_reshaped = predictions_scaled.reshape(-1, len(quantiles))
11predictions_inv = scaler_target.inverse_transform(pred_reshaped)
12predictions_final = predictions_inv.reshape(
13    num_val_samples, forecast_horizon, len(quantiles)
14    )
15
16# Inverse transform actual validation targets
17y_val_reshaped = y_val.reshape(-1, output_dim)
18y_val_inv = scaler_target.inverse_transform(y_val_reshaped)
19y_val_final = y_val_inv.reshape(
20    num_val_samples, forecast_horizon, output_dim
21    )
22
23print("Predictions inverse transformed.")
24print("Sample inverse predictions (median, q=0.5):")
25print(predictions_final[:2, :, 1]) # Show median for first 2 samples
26
27# Evaluation (Example: R2 score on median forecast)
28# Note: Proper evaluation uses appropriate metrics like pinball loss
29from sklearn.metrics import r2_score
30# Flatten horizon for basic R2
31r2 = r2_score(y_val_final.flatten(), predictions_final[:, :, 1].flatten())
32print(f"\nApprox R2 Score (Median vs Actual): {r2:.4f}")

Step 10: Visualization (Optional)

Visualize the quantile forecasts against actual values for specific items or time periods using matplotlib or visualize_forecasts().

1# (Visualization code would go here, similar to previous examples,
2#  plotting y_val_final against columns of predictions_final)
3print("\nVisualization step placeholder.")

Zhongshan Land Subsidence Forecasting (Using XTFT)

This case study demonstrates a multi-step quantile forecasting workflow for land subsidence in Zhongshan, China, using the XTFT model. Zhongshan, located in the Pearl River Delta, experiences significant land subsidence challenges due to urbanization and groundwater extraction, making it a relevant area for advanced forecasting techniques [Liu25NS].

We will follow a pipeline that includes: 1. Loading and preprocessing the Zhongshan dataset using a dedicated utility. 2. Generating sequences suitable for XTFT. 3. Splitting data for training and validation. 4. Defining, compiling, and training the XTFT model for quantile output. 5. Generating out-of-sample forecasts using a helper function. 6. Visualizing the spatial forecast results.

(Note: This example uses the smaller sampled dataset `zhongshan_2000.csv` for demonstration purposes.)

[Liu25NS]

Liu, R., Kouadio, K. L., et al. (2025). Forecasting Urban Land Subsidence in the Era of Rapid Urbanization and Climate Stress. Nature Sustainability (Submitted).

Step 1: Imports and Setup

Import necessary libraries and the specific fusionlab functions for data loading/processing (load_processed_subsidence_data), modeling (XTFT), loss (combined_quantile_loss), forecasting (generate_forecast), and visualization (visualize_forecasts).

 1import numpy as np
 2import pandas as pd
 3import tensorflow as tf
 4import matplotlib.pyplot as plt
 5import os
 6import joblib # For loading scaler if saved by processing func
 7
 8# Fusionlab imports
 9from fusionlab.datasets.load import load_processed_subsidence_data
10from fusionlab.nn.transformers import XTFT
11from fusionlab.nn.utils import (
12    reshape_xtft_data, # Used internally by load_processed if needed
13    generate_forecast,
14    visualize_forecasts
15)
16from fusionlab.nn.losses import combined_quantile_loss
17from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
18
19# Suppress warnings and TF logs
20import warnings
21warnings.filterwarnings('ignore')
22tf.get_logger().setLevel('ERROR')
23tf.autograph.set_verbosity(0)
24
25# Configuration
26output_dir_zhongshan = "./zhongshan_case_study_output"
27os.makedirs(output_dir_zhongshan, exist_ok=True)
28print("Zhongshan Case Study: Setup complete.")

Step 2: Load and Preprocess Data

We use the load_processed_subsidence_data() function. This helper encapsulates loading the raw zhongshan_2000.csv data, applying the preprocessing steps (feature selection, NaN filling, OneHotEncoding for ‘geology’ and ‘density_tier’, MinMaxScaler), and returns the processed DataFrame. We disable sequence generation at this stage (return_sequences=False).

 1print("Loading and preprocessing Zhongshan data...")
 2try:
 3    # Use the loader to get the processed frame
 4    # It handles raw data fetching, feature selection, nan ops,
 5    # encoding, and scaling internally based on defaults for zhongshan
 6    df_processed = load_processed_subsidence_data(
 7        dataset_name='zhongshan',
 8        return_sequences=False, # We want the processed DataFrame first
 9        as_frame=True,
10        scaler_type='minmax', # Match paper example script
11        use_processed_cache=True, # Use cache if available
12        save_processed_frame=True, # Save if reprocessed
13        cache_suffix="_paper_proc", # Suffix for this specific processing
14        verbose=True
15    )
16    print("\nZhongshan data loaded and processed.")
17    df_processed.info()
18
19except FileNotFoundError:
20    print("\nERROR: Raw data file 'zhongshan_2000.csv' not found and"
21          " could not be downloaded. Cannot proceed.")
22    # Handle error appropriately in a real script
23    raise
24except Exception as e:
25    print(f"\nERROR during data loading/processing: {e}")
26    raise

Step 3: Define Feature Sets for Sequencing

Based on the columns present after preprocessing (including the new one-hot encoded columns), we define the lists required by reshape_xtft_data().

 1target_col = 'subsidence'
 2dt_col = 'year' # Time index column
 3spatial_cols = ['longitude', 'latitude'] # Grouping and also static
 4
 5# Identify encoded columns automatically (example)
 6encoded_cols = [c for c in df_processed.columns if
 7                c.startswith('geology_') or c.startswith('density_tier_')]
 8
 9# Define final feature sets
10final_static_cols = list(spatial_cols) + encoded_cols
11# Numerical columns from paper (excluding target, year) after scaling
12final_dynamic_cols = [
13    'GWL', 'rainfall_mm', 'normalized_density',
14    'normalized_seismic_risk_score'
15    ]
16# Future columns from paper
17final_future_cols = ['rainfall_mm']
18
19# Verify all defined columns exist in the processed dataframe
20all_needed_cols = (
21     [dt_col, target_col] + spatial_cols + final_static_cols +
22     final_dynamic_cols + final_future_cols
23)
24missing_cols = [c for c in set(all_needed_cols) if c not in df_processed.columns]
25if missing_cols:
26    raise ValueError(f"Columns required for sequencing missing from"
27                     f" processed data: {missing_cols}")
28
29print("\nFeature sets defined for sequence reshaping.")
30print(f"  Static : {final_static_cols}")
31print(f"  Dynamic: {final_dynamic_cols}")
32print(f"  Future : {final_future_cols}")

Step 4: Generate Sequences

Now, use reshape_xtft_data() with the processed DataFrame and defined column lists to create the sequence arrays. We use time_steps=4 and forecast_horizon=4 based on the paper’s example script configuration.

 1time_steps = 4          # Lookback window from script
 2forecast_horizon = 4    # Prediction horizon from script (2023-2026)
 3
 4print(f"\nReshaping data (T={time_steps}, H={forecast_horizon})...")
 5static_data, dynamic_data, future_data, target_data = reshape_xtft_data(
 6    df=df_processed,
 7    dt_col=dt_col,
 8    target_col=target_col,
 9    static_cols=final_static_cols,
10    dynamic_cols=final_dynamic_cols,
11    future_cols=final_future_cols,
12    spatial_cols=spatial_cols,
13    time_steps=time_steps,
14    forecast_horizons=forecast_horizon,
15    verbose=1
16)

Step 5: Data Splitting for Training

Split the generated sequences into training and validation sets. The paper used 2015-2022 for training and 2023 for testing. Here, we split the generated sequences (which implicitly capture this time split if reshape_xtft_data processed chronologically per group) using a standard ratio (e.g., 80/20).

 1# Split sequences (example: 80% train, 20% validation)
 2X_static_train, X_static_val, \
 3X_dynamic_train, X_dynamic_val, \
 4X_future_train, X_future_val, \
 5y_train, y_val = train_test_split(
 6    static_data, dynamic_data, future_data, target_data,
 7    test_size=0.2, # 20% for validation
 8    random_state=42 # For reproducible split
 9)
10
11# Package inputs into lists
12train_inputs = [X_static_train, X_dynamic_train, X_train_future]
13val_inputs = [X_static_val, X_dynamic_val, X_val_future]
14
15print("\nSequence data split into Train/Validation sets:")
16print(f"  Train sequences: {len(y_train)}")
17print(f"  Validation sequences: {len(y_val)}")

Step 6: Define and Train XTFT Model

Instantiate the XTFT model (note: the script used SuperXTFT, but we use XTFT as requested). Configure it for quantile forecasting using hyperparameters potentially derived from tuning or the paper’s example. Compile with quantile loss and train using .fit().

 1quantiles = [0.1, 0.5, 0.9]
 2output_dim = 1 # Predicting 'subsidence'
 3
 4# Example Hyperparameters (adjust based on tuning/paper)
 5best_params = {
 6    'embed_dim': 32, 'max_window_size': time_steps, 'memory_size': 100,
 7    'num_heads': 4, 'dropout_rate': 0.1, 'lstm_units': 64,
 8    'attention_units': 64, 'hidden_units': 32, 'multi_scale_agg': 'auto',
 9}
10
11# Instantiate XTFT model
12xtft_model = XTFT(
13    static_input_dim=static_data.shape[-1],
14    dynamic_input_dim=dynamic_data.shape[-1],
15    future_input_dim=future_data.shape[-1],
16    forecast_horizon=forecast_horizon,
17    quantiles=quantiles,
18    output_dim=output_dim,
19    **best_params
20)
21print("\nXTFT model instantiated.")
22
23# Compile with quantile loss
24loss_fn = combined_quantile_loss(quantiles=quantiles)
25xtft_model.compile(
26    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
27    loss=loss_fn
28)
29print("Model compiled.")
30
31# Define Callbacks
32early_stopping = EarlyStopping(monitor='val_loss', patience=5,
33                               restore_best_weights=True)
34model_checkpoint_path = os.path.join(
35    output_dir_zhongshan, 'xtft_zhongshan_best.keras' # Use .keras format
36    )
37model_checkpoint = ModelCheckpoint(
38    model_checkpoint_path, monitor='val_loss', save_best_only=True,
39    save_weights_only=False, verbose=1
40)
41
42# Train the model
43print("Starting model training...")
44history = xtft_model.fit(
45    train_inputs,
46    y_train,
47    validation_data=(val_inputs, y_val),
48    epochs=5, # Increase significantly for real training (e.g., 50)
49    batch_size=32,
50    callbacks=[early_stopping, model_checkpoint],
51    verbose=1
52)
53print("Training finished.")
54
55# Optional: Load the best model saved by checkpoint
56# try:
57#     print("Loading best model from checkpoint...")
58#     best_xtft_model = tf.keras.models.load_model(
59#         model_checkpoint_path,
60#         custom_objects={"combined_quantile_loss": loss_fn} # Pass loss
61#     )
62# except Exception as e:
63#     print(f"Could not load checkpoint, using last model state. Error: {e}")
64#     best_xtft_model = xtft_model # Fallback to last state

Step 7: Generate Forecasts

Use the trained model and the generate_forecast() utility to generate predictions for future years (e.g., 2023-2026, matching the forecast_horizon). This function handles preparing the necessary input sequences from the end of the training data. We also provide the hold-out test data (actual 2023 data) for evaluation within the function.

 1# Need the processed DataFrame BEFORE sequencing for generate_forecast
 2# Also need the original unprocessed test data for evaluation comparison
 3# Re-load/split original data if not kept from Step 2 of the script
 4# For simplicity, assume df_processed contains data up to 2022
 5# and we have a separate test_df_raw for 2023 actuals.
 6
 7# Placeholder for actual 2023 test data (load this properly)
 8# test_data = df_raw[df_raw['year'] == 2023].copy()
 9# Ensure test_data has same columns as needed for eval inside generate_forecast
10# For demonstration, we'll use the validation set derived from sequences
11# NOTE: generate_forecast expects the *training* data to find the last sequence
12# and test data for evaluation. Need df_processed up to end of training.
13df_processed_train = df_processed[df_processed[dt_col] <= 2022] # Example filter
14
15print("\nGenerating forecast using generate_forecast...")
16# Define the forecast years explicitly
17forecast_years = [2023, 2024, 2025, 2026] # Matches horizon=4
18
19# Use generate_forecast
20forecast_df = generate_forecast(
21    xtft_model=xtft_model, # Use the trained model
22    train_data=df_processed_train, # Data model was trained on (for last seq)
23    dt_col=dt_col,
24    time_steps=time_steps,
25    forecast_horizon=forecast_horizon,
26    static_features=final_static_cols,
27    dynamic_features=final_dynamic_cols,
28    future_features=final_future_cols,
29    spatial_cols=spatial_cols,
30    # test_data=test_data, # Provide actual 2023 data for evaluation
31    mode="quantile",
32    q=quantiles, # Pass the quantiles used
33    tname=target_col,
34    forecast_dt=forecast_years, # Specify exact output years
35    scaler=scaler_n, # Pass the scaler used for numerical cols
36    num_cols=cols_to_scale_n, # List of cols that were scaled
37    savefile=os.path.join(output_dir_zhongshan, "zhongshan_forecast.csv"),
38    verbose=1
39)
40
41print("\nForecast DataFrame head:")
42print(forecast_df.head())

Step 8: Visualize Forecasts

Use the visualize_forecasts() utility to plot the spatial distribution of actual (if available) and predicted subsidence for specific forecast periods (years).

 1# Need actual test data (e.g., for 2023) for comparison plots
 2# Assuming test_data DataFrame for 2023 exists from script's Step 6
 3# test_data = df_raw[df_raw['year'] == 2023] # Example
 4
 5print("\nGenerating forecast visualization...")
 6try:
 7    # Visualize forecast for 2023 (requires actual 2023 data in test_data)
 8    # Create dummy test_data if needed for visualization code to run
 9    if 'test_data' not in locals():
10         warnings.warn("Actual test_data for 2023 not available,"
11                       " visualization will only show predictions.")
12         test_data_vis = None
13    else:
14        test_data_vis = test_data
15
16    visualize_forecasts(
17        forecast_df=forecast_df,
18        test_data=test_data_vis, # Provide actuals if available
19        dt_col=dt_col,
20        tname=target_col,
21        eval_periods=[2023], # Evaluate/plot only 2023
22        mode="quantile", # Match forecast mode
23        kind="spatial",
24        x="longitude", # Coordinate columns
25        y="latitude",
26        verbose=1
27    )
28except Exception as e:
29    print(f"Visualization failed. Error: {e}")