Exercise: Hyperparameter Tuning with HydroTuner

Welcome to this hands-on exercise on using the HydroTuner. This guide will walk you through the complete, end-to-end process of finding an optimal set of hyperparameters for a sophisticated physics-informed model, PIHALNet.

This tutorial shows how to define a custom search space and how to launch the tuning process using either the high-level convenience methods or the lower-level core API.

Learning Objectives:

  • Generate a synthetic hybrid dataset with features and coordinates suitable for a PINN model.

  • Define a custom hyperparameter search_space, including architectural, physical, and optimization parameters.

  • Use the recommended HydroTuner.create() factory method to automatically infer data dimensions and configure the tuner.

  • Launch the tuning process with the high-level .run() method, which accepts NumPy arrays directly.

  • (Advanced) Understand how to manually prepare a tf.data.Dataset and use the lower-level .search() method.

  • Retrieve the best hyperparameters and the best-performing model after the search is complete.

Let’s begin!

Prerequisites

Ensure you have fusionlab-learn and its dependencies, including keras-tuner, installed.

pip install fusionlab-learn "keras-tuner>=1.4.0" matplotlib scikit-learn

Step 1: Imports and Setup

First, we import all necessary libraries and set up our environment.

import os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# FusionLab imports
from fusionlab.nn.forecast_tuner import HydroTuner
from fusionlab.nn.pinn.models import PIHALNet # The model we'll tune

# Suppress warnings and TF logs for cleaner output
import warnings
warnings.filterwarnings('ignore')
tf.get_logger().setLevel('ERROR')

# Directory for saving tuner results and plots
EXERCISE_OUTPUT_DIR = "./hydrotuner_exercise_outputs"
os.makedirs(EXERCISE_OUTPUT_DIR, exist_ok=True)

print("Libraries imported and setup complete.")

Expected Output:

Libraries imported and setup complete.

Step 2: Generate Synthetic Hybrid Data

We will create a synthetic dataset suitable for PIHALNet, which requires both feature-based inputs (static, dynamic, future) and a coords tensor for the physics module. We will keep these as raw NumPy arrays, as this is the expected format for the tuner’s .run() method.

# Configuration
N_SAMPLES = 800
PAST_STEPS = 12
HORIZON = 5
STATIC_DIM, DYNAMIC_DIM, FUTURE_DIM = 4, 6, 3
SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)

# --- Generate Data Arrays ---
inputs = {
    "coords": np.random.rand(N_SAMPLES, HORIZON, 3).astype(np.float32),
    "static_features": np.random.rand(N_SAMPLES, STATIC_DIM).astype(np.float32),
    "dynamic_features": np.random.rand(N_SAMPLES, PAST_STEPS, DYNAMIC_DIM).astype(np.float32),
    "future_features": np.random.rand(N_SAMPLES, HORIZON, FUTURE_DIM).astype(np.float32),
}
targets = {
    "subsidence": np.random.rand(N_SAMPLES, HORIZON, 1).astype(np.float32),
    "gwl": np.random.rand(N_SAMPLES, HORIZON, 1).astype(np.float32)
}

# Create a validation split: the last 100 samples are held out
val_split = -100
train_inputs = {k: v[:val_split] for k, v in inputs.items()}
val_inputs = {k: v[val_split:] for k, v in inputs.items()}
train_targets = {k: v[:val_split] for k, v in targets.items()}
val_targets = {k: v[val_split:] for k, v in targets.items()}

print(f"Generated {len(train_inputs['static_features'])} training and "
      f"{len(val_inputs['static_features'])} validation samples.")

Expected Output:

Generated 700 training and 100 validation samples.
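
The negative split index above holds out the last 100 samples of every array. The same idea can be sketched as a reusable helper. Note that split_tail is a hypothetical name for illustration and not part of fusionlab, and that plain lists stand in for the NumPy arrays here (slicing along the first axis behaves the same way for both).

```python
def split_tail(arrays, n_val):
    """Hold out the last n_val samples of every array in a dict.

    split_tail is a hypothetical helper for illustration, not a fusionlab API.
    """
    # All arrays must agree on the number of samples (first axis).
    lengths = {len(v) for v in arrays.values()}
    if len(lengths) != 1:
        raise ValueError(f"Arrays disagree on sample count: {sorted(lengths)}")
    n_samples = lengths.pop()
    if not 0 < n_val < n_samples:
        raise ValueError("n_val must be between 1 and n_samples - 1")
    train = {k: v[:-n_val] for k, v in arrays.items()}
    val = {k: v[-n_val:] for k, v in arrays.items()}
    return train, val


# Plain lists stand in for the (800, ...) NumPy arrays of Step 2.
data = {"static_features": list(range(800)), "coords": list(range(800))}
train, val = split_tail(data, n_val=100)
print(len(train["coords"]), len(val["coords"]))  # 700 100
```

Because the split is positional, no shuffling takes place. For the random data used here that makes no difference, but it is worth remembering when working with ordered real-world data.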

Step 3: Define the Tuning Experiment

This is where we tell the tuner what to optimize. We define a search_space dictionary containing all the hyperparameters we want to explore.

search_space = {
    # --- Architectural Hyperparameters ---
    "embed_dim": [16, 32, 64], # Discrete choice
    "num_heads": [2, 4],
    "dropout_rate": {"type": "float", "min_value": 0.05, "max_value": 0.3},

    # --- Physics-Informed Hyperparameters ---
    # Tune whether the coefficient is fixed or learned
    "pinn_coefficient_C": ["learnable", 1e-3, 5e-3],
    # The lambda weight for the physics loss
    "lambda_physics": {"type": "float", "min_value": 0.05, "max_value": 0.5},

    # --- Optimization Hyperparameters ---
    "learning_rate": {"type": "choice", "values": [1e-3, 5e-4, 1e-4]}
}
print("Hyperparameter search space defined.")
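
To make the three kinds of entries in search_space concrete, here is a minimal sketch of how such a dictionary could be turned into one trial configuration. It is not HydroTuner's implementation: sample_config is a hypothetical helper, and the assumptions are that a bare list means a discrete choice, {"type": "choice"} means the same with explicit values, and {"type": "float"} means a continuous range sampled uniformly (the real tuner may sample differently).

```python
import random


def sample_config(search_space, rng):
    """Draw one configuration from a search-space dict (illustrative only)."""
    config = {}
    for name, spec in search_space.items():
        if isinstance(spec, list):
            # Bare list: pick one of the listed values.
            config[name] = rng.choice(spec)
        elif spec.get("type") == "choice":
            config[name] = rng.choice(spec["values"])
        elif spec.get("type") == "float":
            # Assumption: uniform sampling between min_value and max_value.
            config[name] = rng.uniform(spec["min_value"], spec["max_value"])
        else:
            raise ValueError(f"Unsupported spec for {name!r}: {spec!r}")
    return config


search_space = {
    "embed_dim": [16, 32, 64],
    "num_heads": [2, 4],
    "dropout_rate": {"type": "float", "min_value": 0.05, "max_value": 0.3},
    "pinn_coefficient_C": ["learnable", 1e-3, 5e-3],
    "lambda_physics": {"type": "float", "min_value": 0.05, "max_value": 0.5},
    "learning_rate": {"type": "choice", "values": [1e-3, 5e-4, 1e-4]},
}

rng = random.Random(42)
for _ in range(3):
    print(sample_config(search_space, rng))
```

Each printed dictionary corresponds to what a single trial would receive: one value per hyperparameter name.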

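It can help to know how large the search space is before choosing a trial budget. The sketch below counts only the discrete entries of the search space above (the two float ranges are continuous and make the true space infinite) and compares that count with the max_trials=5 budget used in Step 4.

```python
from math import prod

# Discrete entries copied from the search space defined above.
discrete_choices = {
    "embed_dim": [16, 32, 64],
    "num_heads": [2, 4],
    "pinn_coefficient_C": ["learnable", 1e-3, 5e-3],
    "learning_rate": [1e-3, 5e-4, 1e-4],
}

n_combinations = prod(len(values) for values in discrete_choices.values())
max_trials = 5  # the budget used in Step 4

print(n_combinations)                        # 54
print(f"{max_trials / n_combinations:.1%}")  # 9.3%
```

Five trials therefore visit under a tenth of the discrete combinations alone, which is fine for a quick exercise but too few for a serious search.
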
Step 4: Launch the Search with the High-Level .run() Method

This is the recommended workflow. We use the HydroTuner.create() factory method, which automatically inspects our data to determine fixed parameters like input/output dimensions. We then call .run(), which handles the conversion of our NumPy arrays into tf.data.Dataset objects internally.

# 1. Create the tuner using the factory method
tuner = HydroTuner.create(
    model_name_or_cls=PIHALNet,
    inputs_data=train_inputs,
    targets_data=train_targets,
    search_space=search_space,
    # Keras Tuner configuration
    objective="val_loss",
    max_trials=5, # Keep low for this example
    project_name="PIHALNet_Tuning_Exercise_Run",
    directory=EXERCISE_OUTPUT_DIR,
    overwrite=True
)

# 2. Start the search by calling .run()
print("\nStarting hyperparameter search with the .run() method...")
best_model, best_hps, tuner_instance = tuner.run(
    inputs=train_inputs,
    y=train_targets,
    validation_data=(val_inputs, val_targets),
    epochs=5, # Train each trial for 5 epochs
    batch_size=64,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)]
)

print("\n--- Search via .run() Complete ---")
if best_hps:
    print("Best learning rate found:", best_hps.get('learning_rate'))
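
The dimension inference attributed to HydroTuner.create() can be pictured as reading the trailing axis of each array. The sketch below is an assumption about the general idea, not the library's code: infer_fixed_params is a hypothetical helper, and the output keys simply mirror the fixed parameters that Step 5 lists by hand.

```python
import numpy as np


def infer_fixed_params(inputs, targets):
    """Derive dimension parameters from array shapes (hypothetical helper)."""
    return {
        # Feature dimensions are the last axis of each input array.
        "static_input_dim": inputs["static_features"].shape[-1],
        "dynamic_input_dim": inputs["dynamic_features"].shape[-1],
        "future_input_dim": inputs["future_features"].shape[-1],
        # Targets are shaped (samples, horizon, output_dim).
        "output_subsidence_dim": targets["subsidence"].shape[-1],
        "output_gwl_dim": targets["gwl"].shape[-1],
        "forecast_horizon": targets["subsidence"].shape[1],
    }


# Small zero-filled arrays with the same layout as the Step 2 data.
demo_inputs = {
    "coords": np.zeros((8, 5, 3), dtype=np.float32),
    "static_features": np.zeros((8, 4), dtype=np.float32),
    "dynamic_features": np.zeros((8, 12, 6), dtype=np.float32),
    "future_features": np.zeros((8, 5, 3), dtype=np.float32),
}
demo_targets = {
    "subsidence": np.zeros((8, 5, 1), dtype=np.float32),
    "gwl": np.zeros((8, 5, 1), dtype=np.float32),
}

params = infer_fixed_params(demo_inputs, demo_targets)
print(params)
```

Only the shapes matter for this kind of inference, which is why a handful of zero-filled samples is enough to reproduce the dimensions of the full dataset.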

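The EarlyStopping callback in Step 4 uses patience=3 while each trial trains for at most 5 epochs. The following simplified simulation of patience-based stopping (a sketch of the rule, not the Keras source; the loss values are made up for illustration) shows when such a callback would actually cut a trial short.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs run before patience-based early stopping triggers."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # training stops after this epoch
    return len(val_losses)


# Validation loss never improves after the first epoch: stops early.
print(epochs_run([0.50, 0.60, 0.70, 0.80, 0.90], patience=3))  # 4
# Validation loss keeps improving: all five epochs run.
print(epochs_run([0.50, 0.45, 0.47, 0.44, 0.43], patience=3))  # 5
```

With only 5 epochs, stopping can fire only when the best validation loss occurs in the first or second epoch, so the callback matters mostly once you raise epochs for a real search.
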
Step 5 (Advanced): Using the Low-Level .search() Method

This section demonstrates the alternative workflow for users who prefer to manage their data pipelines manually. Here, we first create tf.data.Dataset objects ourselves and then instantiate the tuner directly through its __init__, which requires us to provide the complete set of fixed_params manually. Note that the example below still launches the search through .run(), which wraps the lower-level .search() call and builds its own datasets from the NumPy arrays; the manually built datasets show what you would pass if you called .search() yourself.

# 1. Manually prepare tf.data.Dataset objects
# Note: The tuner's internal logic renames target keys automatically
# These datasets illustrate the manual pipeline; the .run() call below
# does not consume them and converts the NumPy arrays itself.
train_dataset = tf.data.Dataset.from_tensor_slices(
    (train_inputs, train_targets)).batch(64)
val_dataset = tf.data.Dataset.from_tensor_slices(
    (val_inputs, val_targets)).batch(64)

# 2. Manually define ALL fixed parameters (what .create() does for us)
manual_fixed_params = {
    "static_input_dim": STATIC_DIM,
    "dynamic_input_dim": DYNAMIC_DIM,
    "future_input_dim": FUTURE_DIM,
    "output_subsidence_dim": 1,
    "output_gwl_dim": 1,
    "forecast_horizon": HORIZON,
    "mode": 'pihal_like' # An example of another fixed param
}

# 3. Instantiate the tuner directly
tuner_adv = HydroTuner(
    model_name_or_cls=PIHALNet,
    fixed_params=manual_fixed_params,
    search_space=search_space,
    objective="val_loss",
    max_trials=5,
    project_name="PIHALNet_Tuning_Exercise_Search",
    directory=EXERCISE_OUTPUT_DIR,
    overwrite=True
)

# 4. Start the search
print("\nStarting hyperparameter search with the .search() method...")
# Note: this call goes through the .run() wrapper, which in turn
# invokes the base .search() method
_, _, tuner_instance_adv = tuner_adv.run(
    inputs=train_inputs, # Still needed for case info
    y=train_targets,
    validation_data=(val_inputs, val_targets), # passed to search()
    epochs=5,
    batch_size=64 # used by the wrapper
)
print("\n--- Search via .search() Complete ---")
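
When batching manually, it helps to know how many batches each dataset yields. tf.data.Dataset.batch keeps a smaller final batch by default (drop_remainder=False), so the arithmetic for the 700 training and 100 validation samples at batch size 64 can be sketched in plain Python; batch_sizes is a hypothetical helper for illustration.

```python
def batch_sizes(n_samples, batch_size, drop_remainder=False):
    """List the size of every batch produced when batching n_samples items."""
    full, rest = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_remainder:
        sizes.append(rest)  # smaller final batch
    return sizes


train_batches = batch_sizes(700, 64)
val_batches = batch_sizes(100, 64)
print(len(train_batches), train_batches[-1])  # 11 60
print(len(val_batches), val_batches[-1])      # 2 36
```

Each training epoch therefore runs 11 steps, the last of which sees 60 samples instead of 64.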

Step 6: Analyze Results and Retrieve the Best Model

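After the search completes (either way), the tuner object contains the results, and you can retrieve the best hyperparameters and the best model from it. Note that in stock Keras Tuner, get_best_models() returns models loaded with the weights from their best checkpoint during the search rather than models retrained on the full dataset, so check how the returned model was trained before relying on it.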

print("\n--- Summary of Best Hyperparameters ---")
# Get the best hyperparameters from the first tuner run.
# tuner_instance is the tuner object returned by .run() in Step 4.
best_hps_found = tuner_instance.get_best_hyperparameters(num_trials=1)[0]
for hp, value in best_hps_found.values.items():
    print(f"- {hp}: {value}")

# Get the best model instance
best_pihalnet_model = tuner_instance.get_best_models(num_models=1)[0]

# You can now use this model for prediction
print("\nBest model summary:")
best_pihalnet_model.summary(line_length=100)
Discussion of Exercise

Congratulations! You have successfully performed a full hyperparameter tuning workflow for an advanced physics-informed model. In this exercise, you have learned to:

  • Define a flexible search_space to control which parameters are tuned.

  • Use the high-level HydroTuner.create() and .run() methods for a convenient, automated workflow with NumPy data.

  • (Advanced) Understand the lower-level process of manually creating datasets and using the core .search() method.

  • Retrieve the final, optimized model and its hyperparameters from the tuner instance.

This process is fundamental to achieving peak performance with complex deep learning architectures and allows you to systematically find the best configuration for your specific problem.