Case Histories¶
This section showcases practical applications of the fusionlab
library on real-world-inspired datasets, demonstrating common
workflows from data loading and preprocessing to model training and
forecasting.
These case studies focus on land subsidence prediction, using datasets
from the Nansha and Zhongshan areas in China. They illustrate how
to leverage fusionlab utilities and models like
XTFT to handle complex spatio-temporal data
with static, dynamic, and future features.
Note
The datasets used (nansha_2000.csv, zhongshan_2000.csv) are spatially sampled versions (approx. 2000 points each) derived from larger datasets used in research (e.g., [Liu24]). While suitable for demonstrating workflows, results on these smaller samples may differ from those obtained on the full datasets.
Nansha Land Subsidence Forecasting (Using XTFT)¶
This case study demonstrates a multi-step quantile forecasting workflow
for land subsidence in the Nansha district using the
XTFT model. Research indicates that factors like
Groundwater Level (GWL) and Building Concentration (BC) are significant
drivers in this region [Liu24]. We will follow a standard pipeline,
assuming the goal is to predict future subsidence based on historical
data and known future indicators.
(Note: This example uses the smaller sampled dataset `nansha_2000.csv` for demonstration purposes.)
Step 1: Imports and Setup¶
Import necessary libraries, including pandas, numpy, sklearn utilities, and relevant fusionlab functions for data loading, preprocessing, modeling, and loss calculation.
1import numpy as np
2import pandas as pd
3import tensorflow as tf
4import os
5import joblib # For saving scalers/encoders
6from sklearn.preprocessing import StandardScaler, OneHotEncoder
7from sklearn.model_selection import train_test_split # For simple splitting later
8
9# Fusionlab imports
10from fusionlab.datasets.load import fetch_nansha_data # Specific loader
11from fusionlab.utils.ts_utils import reshape_xtft_data, to_dt
12try:
13 # Assume nan_ops handles missing values appropriately
14 from fusionlab.utils.preprocessing import nan_ops
15except ImportError:
16 def nan_ops(df, **kwargs): # Dummy if not found
17 print("Warning: nan_ops not found, filling NaNs with ffill/bfill.")
18 return df.ffill().bfill()
19from fusionlab.nn.transformers import XTFT
20from fusionlab.nn.losses import combined_quantile_loss
21
22# Suppress warnings and TF logs
23import warnings
24warnings.filterwarnings('ignore')
25tf.get_logger().setLevel('ERROR')
26tf.autograph.set_verbosity(0)
27
28# Configuration
29output_dir_nansha = "./nansha_case_study_output"
30os.makedirs(output_dir_nansha, exist_ok=True)
31print("Nansha Case Study: Setup complete.")
Step 2: Load Raw Nansha Data¶
Load the sampled Nansha dataset using the dedicated fetch function. We load it as a DataFrame for initial inspection and preprocessing.
1# Load the nansha_2000.csv dataset
2try:
3 nansha_df_raw = fetch_nansha_data(
4 as_frame=True,
5 verbose=True,
6 download_if_missing=True # Allow download if needed
7 )
8 print("\nNansha data loaded successfully.")
9 nansha_df_raw.info()
10except Exception as e:
11 print(f"ERROR: Could not load Nansha data. Please ensure the file"
12 f" 'nansha_2000.csv' is available or downloadable. Error: {e}")
13 # Exit or raise further if data is essential
14 raise
Step 3: Initial Cleaning and Validation¶
Ensure the time column (‘year’ in this dataset) is treated correctly. Handle any immediate missing values using a defined strategy (e.g., filling).
1# Assume 'year' is the primary time column based on data description
2dt_col = 'year'
3df_clean = nansha_df_raw.copy()
4
5# Although 'year' is integer, treat it as the time index marker later
6# If actual datetime conversion needed:
7# df_clean = to_dt(df_clean, dt_col=dt_col, ...)
8
9# Handle missing values (example: using nan_ops utility)
10df_clean = nan_ops(df_clean, ops='sanitize', action='fill')
11print("\nInitial NaN check passed (or NaNs filled).")
Step 4: Preprocessing (Encoding & Scaling)¶
Apply preprocessing steps necessary for the model. This typically involves encoding categorical features and scaling numerical features.
1# 4a. Define Categorical and Numerical Columns for Nansha
2# Based on user description: 'geology' is categorical
3categorical_cols_n = ['geology']
4# Identify remaining numerical columns (excluding target/coords/year)
5numerical_cols_n = [
6 'building_concentration', 'GWL', 'rainfall_mm',
7 'normalized_seismic_risk_score', 'soil_thickness'
8]
9target_col_n = 'subsidence'
10coord_cols_n = ['longitude', 'latitude']
11
12# 4b. Encode Categorical Features
13df_processed = df_clean.copy()
14encoder_info_n = {'columns': {}, 'names': {}}
15encoded_cols_generated = []
16
17if categorical_cols_n:
18 print("\nEncoding categorical features...")
19 cols_to_keep_temp = df_processed.columns.difference(
20 categorical_cols_n).tolist()
21 df_encoded_list = [df_processed[cols_to_keep_temp]]
22 for col in categorical_cols_n:
23 if col in df_processed.columns:
24 encoder = OneHotEncoder(sparse_output=False,
25 handle_unknown='ignore',
26 dtype=np.float32)
27 encoded_data = encoder.fit_transform(df_processed[[col]])
28 new_cols = [f"{col}_{cat}" for cat in encoder.categories_[0]]
29 encoded_df = pd.DataFrame(encoded_data, columns=new_cols,
30 index=df_processed.index)
31 df_encoded_list.append(encoded_df)
32 encoder_info_n['columns'][col] = new_cols
33 encoder_info_n['names'][col] = encoder.categories_[0]
34 encoded_cols_generated.extend(new_cols) # Keep track
35 print(f" Encoded '{col}' -> {len(new_cols)} columns")
36 else:
37 warnings.warn(f"Categorical column '{col}' not found.")
38 df_processed = pd.concat(df_encoded_list, axis=1)
39
40# 4c. Scale Numerical Features (including Target)
41scaler_n = StandardScaler() # Or MinMaxScaler()
42cols_to_scale_n = [c for c in numerical_cols_n if c in df_processed.columns]
43if target_col_n in df_processed.columns:
44 cols_to_scale_n.append(target_col_n)
45
46if cols_to_scale_n:
47 print(f"\nScaling numerical features: {cols_to_scale_n}...")
48 df_processed[cols_to_scale_n] = scaler_n.fit_transform(
49 df_processed[cols_to_scale_n]
50 )
51 # Save the scaler
52 scaler_n_path = os.path.join(output_dir_nansha, "nansha_scaler.joblib")
53 joblib.dump(scaler_n, scaler_n_path)
54 print(f"Scaler saved to {scaler_n_path}")
55else:
56 print("No numerical columns found to scale.")
Step 5: Define Feature Sets for Reshaping¶
Define the lists of column names corresponding to static, dynamic (past), and future known features after the preprocessing steps (encoding, scaling). This requires mapping original concepts to the potentially new column names.
1# Define based on available columns in df_processed
2# Static: Coords + Encoded Categoricals + Base numericals?
3final_static_cols_n = list(coord_cols_n)
4final_static_cols_n.extend(encoded_cols_generated) # Add encoded geology
5# Add original static numericals if applicable (e.g., maybe soil_thickness varies slowly)
6# final_static_cols_n.append('soil_thickness') # Example if static
7
8# Dynamic: Non-static numericals + Time varying categoricals (if encoded differently)
9# Here: GWL, rainfall_mm, building_concentration, norm_seismic_score?
10# Assuming 'year' is the time index dt_col
11final_dynamic_cols_n = [
12 'GWL', 'rainfall_mm', 'building_concentration',
13 'normalized_seismic_risk_score', 'soil_thickness' # Assume dynamic for now
14 ]
15final_dynamic_cols_n = [c for c in final_dynamic_cols_n if c in df_processed.columns]
16
17
18# Future: Known future events or time features
19# Here: rainfall_mm (assuming known forecast), year?
20final_future_cols_n = ['rainfall_mm'] # Example, needs domain knowledge
21final_future_cols_n = [c for c in final_future_cols_n if c in df_processed.columns]
22
23# Ensure all columns exist
24print("\nFinal Feature Sets for Reshaping:")
25print(" Static:", final_static_cols_n)
26print(" Dynamic:", final_dynamic_cols_n)
27print(" Future:", final_future_cols_n)
28print(" Target:", target_col_n)
29print(" DateTime:", dt_col)
30print(" Spatial:", spatial_cols)
Step 6: Generate Sequences¶
Use reshape_xtft_data() to create the final
sequence arrays needed by the XTFT model.
1time_steps = 4 # Example lookback (needs tuning)
2forecast_horizon = 4 # Example horizon (e.g., predict 4 years)
3
4print(f"\nReshaping data (T={time_steps}, H={forecast_horizon})...")
5static_data, dynamic_data, future_data, target_data = reshape_xtft_data(
6 df=df_processed,
7 dt_col=dt_col, # 'year' acts as time index here
8 target_col=target_col_n,
9 static_cols=final_static_cols_n,
10 dynamic_cols=final_dynamic_cols_n,
11 future_cols=final_future_cols_n,
12 spatial_cols=spatial_cols, # ['longitude', 'latitude']
13 time_steps=time_steps,
14 forecast_horizons=forecast_horizon,
15 verbose=1
16)
Step 7: Data Splitting for Training¶
Split the generated sequence arrays into training and validation sets. A simple chronological split based on the sequence order is shown.
1# Split sequences (example: 80% train, 20% validation)
2val_split_frac = 0.2
3n_seq_samples = static_data.shape[0]
4split_seq_idx = int(n_seq_samples * (1 - val_split_frac))
5
6X_train_static, X_val_static = static_data[:split_seq_idx], static_data[split_seq_idx:]
7X_train_dynamic, X_val_dynamic = dynamic_data[:split_seq_idx], dynamic_data[split_seq_idx:]
8X_train_future, X_val_future = future_data[:split_seq_idx], future_data[split_seq_idx:]
9y_train, y_val = target_data[:split_seq_idx], target_data[split_seq_idx:]
10
11# Package inputs into lists
12train_inputs = [X_train_static, X_train_dynamic, X_train_future]
13val_inputs = [X_val_static, X_val_dynamic, X_val_future]
14
15print("\nSequence data split into Train/Validation sets.")
16print(f" Train sequences: {len(y_train)}")
17print(f" Validation sequences: {len(y_val)}")
Step 8: Define and Train XTFT Model¶
Instantiate the XTFT model, configure it for quantile forecasting, compile it with the appropriate loss function, and train it on the prepared sequence data.
1quantiles = [0.1, 0.5, 0.9] # Define target quantiles
2output_dim = 1 # Predicting single target 'subsidence'
3
4# Instantiate XTFT
5xtft_model = XTFT(
6 static_input_dim=static_data.shape[-1],
7 dynamic_input_dim=dynamic_data.shape[-1],
8 future_input_dim=future_data.shape[-1],
9 forecast_horizon=forecast_horizon,
10 quantiles=quantiles,
11 output_dim=output_dim,
12 # --- Example Hyperparameters (Tune these) ---
13 hidden_units=32,
14 embed_dim=16,
15 attention_units=16,
16 lstm_units=32,
17 num_heads=4,
18 max_window_size=time_steps, # Typically match lookback
19 memory_size=50,
20 dropout_rate=0.1,
21 # --- Anomaly Detection (Optional) ---
22 # anomaly_detection_strategy=None, # Example: No anomaly det.
23)
24print("\nXTFT model instantiated.")
25
26# Compile with quantile loss
27loss_fn = combined_quantile_loss(quantiles=quantiles)
28xtft_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.003),
29 loss=loss_fn)
30print("Model compiled.")
31
32# Train the model
33print("Starting model training...")
34history = xtft_model.fit(
35 train_inputs,
36 y_train,
37 validation_data=(val_inputs, y_val),
38 epochs=10, # Increase significantly for real training
39 batch_size=32,
40 verbose=1
41)
42print("Training finished.")
Step 9: Forecasting and Evaluation¶
Use the trained model to make predictions on the validation set (or new data). Inverse-transform the scaled predictions and evaluate performance using relevant metrics (e.g., quantile loss, R², coverage).
1print("\nGenerating predictions on validation set...")
2predictions_scaled = xtft_model.predict(val_inputs)
3
4# Inverse transform predictions
5# (Requires loading/using the 'scaler' saved in Step 4c)
6# loaded_scaler = joblib.load(scaler_n_path)
7scaler_target = scalers[target_col_n] # Use scaler from memory
8
9num_val_samples = X_val_static.shape[0]
10pred_reshaped = predictions_scaled.reshape(-1, len(quantiles))
11predictions_inv = scaler_target.inverse_transform(pred_reshaped)
12predictions_final = predictions_inv.reshape(
13 num_val_samples, forecast_horizon, len(quantiles)
14 )
15
16# Inverse transform actual validation targets
17y_val_reshaped = y_val.reshape(-1, output_dim)
18y_val_inv = scaler_target.inverse_transform(y_val_reshaped)
19y_val_final = y_val_inv.reshape(
20 num_val_samples, forecast_horizon, output_dim
21 )
22
23print("Predictions inverse transformed.")
24print("Sample inverse predictions (median, q=0.5):")
25print(predictions_final[:2, :, 1]) # Show median for first 2 samples
26
27# Evaluation (Example: R2 score on median forecast)
28# Note: Proper evaluation uses appropriate metrics like pinball loss
29from sklearn.metrics import r2_score
30# Flatten horizon for basic R2
31r2 = r2_score(y_val_final.flatten(), predictions_final[:, :, 1].flatten())
32print(f"\nApprox R2 Score (Median vs Actual): {r2:.4f}")
Step 10: Visualization (Optional)¶
Visualize the quantile forecasts against actual values for specific
items or time periods using matplotlib or visualize_forecasts().
1# (Visualization code would go here, similar to previous examples,
2# plotting y_val_final against columns of predictions_final)
3print("\nVisualization step placeholder.")
Zhongshan Land Subsidence Forecasting (Using XTFT)¶
This case study demonstrates a multi-step quantile forecasting workflow
for land subsidence in Zhongshan, China, using the
XTFT model. Zhongshan, located in the Pearl
River Delta, experiences significant land subsidence challenges due to
urbanization and groundwater extraction, making it a relevant area for
advanced forecasting techniques [Liu25NS].
We will follow a pipeline that includes: 1. Loading and preprocessing the Zhongshan dataset using a dedicated utility. 2. Generating sequences suitable for XTFT. 3. Splitting data for training and validation. 4. Defining, compiling, and training the XTFT model for quantile output. 5. Generating out-of-sample forecasts using a helper function. 6. Visualizing the spatial forecast results.
(Note: This example uses the smaller sampled dataset `zhongshan_2000.csv` for demonstration purposes.)
Liu, R., Kouadio, K. L., et al. (2025). Forecasting Urban Land Subsidence in the Era of Rapid Urbanization and Climate Stress. Nature Sustainability (Submitted).
Step 1: Imports and Setup¶
Import necessary libraries and the specific fusionlab functions for
data loading/processing (load_processed_subsidence_data), modeling
(XTFT), loss (combined_quantile_loss), forecasting
(generate_forecast), and visualization (visualize_forecasts).
1import numpy as np
2import pandas as pd
3import tensorflow as tf
4import matplotlib.pyplot as plt
5import os
6import joblib # For loading scaler if saved by processing func
7
8# Fusionlab imports
9from fusionlab.datasets.load import load_processed_subsidence_data
10from fusionlab.nn.transformers import XTFT
11from fusionlab.nn.utils import (
12 reshape_xtft_data, # Used internally by load_processed if needed
13 generate_forecast,
14 visualize_forecasts
15)
16from fusionlab.nn.losses import combined_quantile_loss
17from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
18
19# Suppress warnings and TF logs
20import warnings
21warnings.filterwarnings('ignore')
22tf.get_logger().setLevel('ERROR')
23tf.autograph.set_verbosity(0)
24
25# Configuration
26output_dir_zhongshan = "./zhongshan_case_study_output"
27os.makedirs(output_dir_zhongshan, exist_ok=True)
28print("Zhongshan Case Study: Setup complete.")
Step 2: Load and Preprocess Data¶
We use the load_processed_subsidence_data()
function. This helper encapsulates loading the raw zhongshan_2000.csv
data, applying the preprocessing steps (feature selection, NaN filling,
OneHotEncoding for ‘geology’ and ‘density_tier’, MinMaxScaler), and
returns the processed DataFrame. We disable sequence generation at this
stage (return_sequences=False).
1print("Loading and preprocessing Zhongshan data...")
2try:
3 # Use the loader to get the processed frame
4 # It handles raw data fetching, feature selection, nan ops,
5 # encoding, and scaling internally based on defaults for zhongshan
6 df_processed = load_processed_subsidence_data(
7 dataset_name='zhongshan',
8 return_sequences=False, # We want the processed DataFrame first
9 as_frame=True,
10 scaler_type='minmax', # Match paper example script
11 use_processed_cache=True, # Use cache if available
12 save_processed_frame=True, # Save if reprocessed
13 cache_suffix="_paper_proc", # Suffix for this specific processing
14 verbose=True
15 )
16 print("\nZhongshan data loaded and processed.")
17 df_processed.info()
18
19except FileNotFoundError:
20 print("\nERROR: Raw data file 'zhongshan_2000.csv' not found and"
21 " could not be downloaded. Cannot proceed.")
22 # Handle error appropriately in a real script
23 raise
24except Exception as e:
25 print(f"\nERROR during data loading/processing: {e}")
26 raise
Step 3: Define Feature Sets for Sequencing¶
Based on the columns present after preprocessing (including the new
one-hot encoded columns), we define the lists required by
reshape_xtft_data().
1target_col = 'subsidence'
2dt_col = 'year' # Time index column
3spatial_cols = ['longitude', 'latitude'] # Grouping and also static
4
5# Identify encoded columns automatically (example)
6encoded_cols = [c for c in df_processed.columns if
7 c.startswith('geology_') or c.startswith('density_tier_')]
8
9# Define final feature sets
10final_static_cols = list(spatial_cols) + encoded_cols
11# Numerical columns from paper (excluding target, year) after scaling
12final_dynamic_cols = [
13 'GWL', 'rainfall_mm', 'normalized_density',
14 'normalized_seismic_risk_score'
15 ]
16# Future columns from paper
17final_future_cols = ['rainfall_mm']
18
19# Verify all defined columns exist in the processed dataframe
20all_needed_cols = (
21 [dt_col, target_col] + spatial_cols + final_static_cols +
22 final_dynamic_cols + final_future_cols
23)
24missing_cols = [c for c in set(all_needed_cols) if c not in df_processed.columns]
25if missing_cols:
26 raise ValueError(f"Columns required for sequencing missing from"
27 f" processed data: {missing_cols}")
28
29print("\nFeature sets defined for sequence reshaping.")
30print(f" Static : {final_static_cols}")
31print(f" Dynamic: {final_dynamic_cols}")
32print(f" Future : {final_future_cols}")
Step 4: Generate Sequences¶
Now, use reshape_xtft_data() with the
processed DataFrame and defined column lists to create the sequence
arrays. We use time_steps=4 and forecast_horizon=4 based on the
paper’s example script configuration.
1time_steps = 4 # Lookback window from script
2forecast_horizon = 4 # Prediction horizon from script (2023-2026)
3
4print(f"\nReshaping data (T={time_steps}, H={forecast_horizon})...")
5static_data, dynamic_data, future_data, target_data = reshape_xtft_data(
6 df=df_processed,
7 dt_col=dt_col,
8 target_col=target_col,
9 static_cols=final_static_cols,
10 dynamic_cols=final_dynamic_cols,
11 future_cols=final_future_cols,
12 spatial_cols=spatial_cols,
13 time_steps=time_steps,
14 forecast_horizons=forecast_horizon,
15 verbose=1
16)
Step 5: Data Splitting for Training¶
Split the generated sequences into training and validation sets. The paper used 2015-2022 for training and 2023 for testing. Here, we split the generated sequences (which implicitly capture this time split if reshape_xtft_data processed chronologically per group) using a standard ratio (e.g., 80/20).
1# Split sequences (example: 80% train, 20% validation)
2X_static_train, X_static_val, \
3X_dynamic_train, X_dynamic_val, \
4X_future_train, X_future_val, \
5y_train, y_val = train_test_split(
6 static_data, dynamic_data, future_data, target_data,
7 test_size=0.2, # 20% for validation
8 random_state=42 # For reproducible split
9)
10
11# Package inputs into lists
12train_inputs = [X_static_train, X_dynamic_train, X_train_future]
13val_inputs = [X_static_val, X_dynamic_val, X_val_future]
14
15print("\nSequence data split into Train/Validation sets:")
16print(f" Train sequences: {len(y_train)}")
17print(f" Validation sequences: {len(y_val)}")
Step 6: Define and Train XTFT Model¶
Instantiate the XTFT model (note: the script used
SuperXTFT, but we use XTFT as requested). Configure it for quantile
forecasting using hyperparameters potentially derived from tuning or
the paper’s example. Compile with quantile loss and train using .fit().
1quantiles = [0.1, 0.5, 0.9]
2output_dim = 1 # Predicting 'subsidence'
3
4# Example Hyperparameters (adjust based on tuning/paper)
5best_params = {
6 'embed_dim': 32, 'max_window_size': time_steps, 'memory_size': 100,
7 'num_heads': 4, 'dropout_rate': 0.1, 'lstm_units': 64,
8 'attention_units': 64, 'hidden_units': 32, 'multi_scale_agg': 'auto',
9}
10
11# Instantiate XTFT model
12xtft_model = XTFT(
13 static_input_dim=static_data.shape[-1],
14 dynamic_input_dim=dynamic_data.shape[-1],
15 future_input_dim=future_data.shape[-1],
16 forecast_horizon=forecast_horizon,
17 quantiles=quantiles,
18 output_dim=output_dim,
19 **best_params
20)
21print("\nXTFT model instantiated.")
22
23# Compile with quantile loss
24loss_fn = combined_quantile_loss(quantiles=quantiles)
25xtft_model.compile(
26 optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
27 loss=loss_fn
28)
29print("Model compiled.")
30
31# Define Callbacks
32early_stopping = EarlyStopping(monitor='val_loss', patience=5,
33 restore_best_weights=True)
34model_checkpoint_path = os.path.join(
35 output_dir_zhongshan, 'xtft_zhongshan_best.keras' # Use .keras format
36 )
37model_checkpoint = ModelCheckpoint(
38 model_checkpoint_path, monitor='val_loss', save_best_only=True,
39 save_weights_only=False, verbose=1
40)
41
42# Train the model
43print("Starting model training...")
44history = xtft_model.fit(
45 train_inputs,
46 y_train,
47 validation_data=(val_inputs, y_val),
48 epochs=5, # Increase significantly for real training (e.g., 50)
49 batch_size=32,
50 callbacks=[early_stopping, model_checkpoint],
51 verbose=1
52)
53print("Training finished.")
54
55# Optional: Load the best model saved by checkpoint
56# try:
57# print("Loading best model from checkpoint...")
58# best_xtft_model = tf.keras.models.load_model(
59# model_checkpoint_path,
60# custom_objects={"combined_quantile_loss": loss_fn} # Pass loss
61# )
62# except Exception as e:
63# print(f"Could not load checkpoint, using last model state. Error: {e}")
64# best_xtft_model = xtft_model # Fallback to last state
Step 7: Generate Forecasts¶
Use the trained model and the generate_forecast()
utility to generate predictions for future years (e.g., 2023-2026, matching
the forecast_horizon). This function handles preparing the necessary
input sequences from the end of the training data. We also provide the
hold-out test data (actual 2023 data) for evaluation within the function.
1# Need the processed DataFrame BEFORE sequencing for generate_forecast
2# Also need the original unprocessed test data for evaluation comparison
3# Re-load/split original data if not kept from Step 2 of the script
4# For simplicity, assume df_processed contains data up to 2022
5# and we have a separate test_df_raw for 2023 actuals.
6
7# Placeholder for actual 2023 test data (load this properly)
8# test_data = df_raw[df_raw['year'] == 2023].copy()
9# Ensure test_data has same columns as needed for eval inside generate_forecast
10# For demonstration, we'll use the validation set derived from sequences
11# NOTE: generate_forecast expects the *training* data to find the last sequence
12# and test data for evaluation. Need df_processed up to end of training.
13df_processed_train = df_processed[df_processed[dt_col] <= 2022] # Example filter
14
15print("\nGenerating forecast using generate_forecast...")
16# Define the forecast years explicitly
17forecast_years = [2023, 2024, 2025, 2026] # Matches horizon=4
18
19# Use generate_forecast
20forecast_df = generate_forecast(
21 xtft_model=xtft_model, # Use the trained model
22 train_data=df_processed_train, # Data model was trained on (for last seq)
23 dt_col=dt_col,
24 time_steps=time_steps,
25 forecast_horizon=forecast_horizon,
26 static_features=final_static_cols,
27 dynamic_features=final_dynamic_cols,
28 future_features=final_future_cols,
29 spatial_cols=spatial_cols,
30 # test_data=test_data, # Provide actual 2023 data for evaluation
31 mode="quantile",
32 q=quantiles, # Pass the quantiles used
33 tname=target_col,
34 forecast_dt=forecast_years, # Specify exact output years
35 scaler=scaler_n, # Pass the scaler used for numerical cols
36 num_cols=cols_to_scale_n, # List of cols that were scaled
37 savefile=os.path.join(output_dir_zhongshan, "zhongshan_forecast.csv"),
38 verbose=1
39)
40
41print("\nForecast DataFrame head:")
42print(forecast_df.head())
Step 8: Visualize Forecasts¶
Use the visualize_forecasts() utility to plot
the spatial distribution of actual (if available) and predicted
subsidence for specific forecast periods (years).
1# Need actual test data (e.g., for 2023) for comparison plots
2# Assuming test_data DataFrame for 2023 exists from script's Step 6
3# test_data = df_raw[df_raw['year'] == 2023] # Example
4
5print("\nGenerating forecast visualization...")
6try:
7 # Visualize forecast for 2023 (requires actual 2023 data in test_data)
8 # Create dummy test_data if needed for visualization code to run
9 if 'test_data' not in locals():
10 warnings.warn("Actual test_data for 2023 not available,"
11 " visualization will only show predictions.")
12 test_data_vis = None
13 else:
14 test_data_vis = test_data
15
16 visualize_forecasts(
17 forecast_df=forecast_df,
18 test_data=test_data_vis, # Provide actuals if available
19 dt_col=dt_col,
20 tname=target_col,
21 eval_periods=[2023], # Evaluate/plot only 2023
22 mode="quantile", # Match forecast mode
23 kind="spatial",
24 x="longitude", # Coordinate columns
25 y="latitude",
26 verbose=1
27 )
28except Exception as e:
29 print(f"Visualization failed. Error: {e}")