Hyperparameter Tuning with HydroTuner

API Reference: HydroTuner

Finding the optimal set of hyperparameters for complex models like PIHALNet and TransFlowSubsNet can be a challenging and time-consuming task. The HydroTuner class is a powerful utility designed to automate this process, enabling you to efficiently search for the best model architecture and training configuration for your specific dataset.

Built on top of the industry-standard Keras Tuner library, HydroTuner provides a streamlined interface tailored for the unique requirements of hybrid physics-informed models.

Core Concepts

The design of HydroTuner is centered on flexibility and a clear separation of concerns between the problem definition and the tuning experiment.

Model-Agnostic Design

The tuner is not hardcoded to a single model. By simply passing a model class (e.g., PIHALNet or TransFlowSubsNet) to its constructor, it dynamically adapts its internal build process to construct and tune that specific model. This makes the tuner extensible to new models in the future.
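
As a rough illustration of this idea, the sketch below resolves either a string identifier or a class object to a model class. The `MODEL_REGISTRY` dictionary, the `resolve_model_class` helper, and the two empty stand-in classes are hypothetical names introduced only for this example; the actual lookup inside HydroTuner may work differently.

```python
from typing import Dict, Union


class PIHALNet:  # empty stand-in for the real model class
    pass


class TransFlowSubsNet:  # empty stand-in for the real model class
    pass


# Hypothetical registry mapping string identifiers to model classes.
MODEL_REGISTRY: Dict[str, type] = {
    "PIHALNet": PIHALNet,
    "TransFlowSubsNet": TransFlowSubsNet,
}


def resolve_model_class(model_name_or_cls: Union[str, type]) -> type:
    """Return a model class given either its name or the class itself."""
    if isinstance(model_name_or_cls, str):
        try:
            return MODEL_REGISTRY[model_name_or_cls]
        except KeyError:
            known = ", ".join(sorted(MODEL_REGISTRY))
            raise ValueError(
                f"Unknown model {model_name_or_cls!r}; expected one of: {known}"
            ) from None
    if isinstance(model_name_or_cls, type):
        return model_name_or_cls
    raise TypeError("model_name_or_cls must be a string or a class")
```

Under this design, supporting a new model would only require adding one entry to the registry (or passing the class directly).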

Separation of Concerns: `fixed_params` vs. `search_space`

Understanding the difference between these two key parameters is essential for using the tuner effectively:

`fixed_params` (The Problem Definition): This dictionary holds all parameters that are not tuned and remain constant for an entire tuning job. It primarily contains data-dependent dimensions that define the model’s static structure, such as:

* static_input_dim, dynamic_input_dim, future_input_dim
* output_subsidence_dim, output_gwl_dim
* forecast_horizon

These values define the specific problem you are trying to solve.

`search_space` (The Experiment Definition): This dictionary defines the universe of all hyperparameters you want to optimize. This includes everything from network architecture (embed_dim, num_heads, dropout_rate) to physics parameters (K, Ss, C) and training parameters (learning_rate, lambda_gw).

This separation allows you to define a single problem (via fixed_params) and run many different tuning experiments on it by simply changing the search_space.
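
The following sketch shows what "one problem, many experiments" might look like in practice: a single `fixed_params` dictionary paired with two alternative search spaces. The dimension values are illustrative, and the `check_disjoint` helper is a hypothetical sanity check added for this example (the rule that a name cannot be both fixed and tuned is an assumption of the sketch, not documented behavior).

```python
from typing import Any, Dict

# One problem definition, reused across experiments (values are illustrative).
fixed_params: Dict[str, Any] = {
    "static_input_dim": 3,
    "dynamic_input_dim": 5,
    "future_input_dim": 2,
    "output_subsidence_dim": 1,
    "output_gwl_dim": 1,
    "forecast_horizon": 7,
}

# Two experiments on the same problem: architecture first, then physics.
architecture_space: Dict[str, Any] = {
    "embed_dim": [32, 64],
    "num_heads": [2, 4, 8],
}
physics_space: Dict[str, Any] = {
    "K": ["learnable", 1e-5, 1e-4],
    "lambda_gw": {"type": "float", "min_value": 0.5, "max_value": 1.5},
}


def check_disjoint(fixed: Dict[str, Any], space: Dict[str, Any]) -> None:
    """Raise if a name is both fixed and tuned (a rule assumed by this sketch)."""
    overlap = sorted(set(fixed) & set(space))
    if overlap:
        raise ValueError(f"Parameters both fixed and tuned: {overlap}")


# Both experiments are compatible with the same problem definition.
for space in (architecture_space, physics_space):
    check_disjoint(fixed_params, space)
```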

Defining the Search Space

The search_space dictionary is where you define what to tune. The tuner supports several flexible formats for defining the range of each hyperparameter.

List for Discrete Choices:

Provides a list of explicit values to test.
```
{'num_heads': [2, 4, 8], 'activation': ['relu', 'gelu']}
```

Dictionary for Ranges:

For integers or floats, you can specify a range. The dictionary must include a type key.

```
# Integer range with a step
{'lstm_units': {'type': 'int', 'min_value': 32, 'max_value': 128, 'step': 32}}

# Float range with logarithmic sampling
{'learning_rate': {'type': 'float', 'min_value': 1e-4, 'max_value': 1e-2, 'sampling': 'log'}}
```

Dictionary for Booleans:

To tune whether a feature should be enabled or disabled.
```
{'use_residuals': {'type': 'bool'}}
```

This flexible format gives you complete control over the scope of the hyperparameter search.
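
To make the three formats more concrete, here is a minimal sketch of how a search-space entry could be dispatched to sampling calls. The method names `Choice`, `Int`, `Float`, and `Boolean` mirror those of `keras_tuner.HyperParameters`, but `RandomHP` is a small stand-in written for this example so the snippet runs without Keras Tuner, and `sample_from_spec` is a hypothetical helper, not the tuner's actual code.

```python
import math
import random
from typing import Any


class RandomHP:
    """Stand-in that mimics the method names of keras_tuner.HyperParameters."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def Choice(self, name, values):
        return self._rng.choice(list(values))

    def Int(self, name, min_value, max_value, step=1):
        # Inclusive upper bound, matching the range shown in the docs.
        return self._rng.randrange(min_value, max_value + 1, step)

    def Float(self, name, min_value, max_value, sampling="linear"):
        if sampling == "log":
            low, high = math.log(min_value), math.log(max_value)
            return math.exp(self._rng.uniform(low, high))
        return self._rng.uniform(min_value, max_value)

    def Boolean(self, name):
        return self._rng.random() < 0.5


def sample_from_spec(hp: Any, name: str, spec: Any) -> Any:
    """Dispatch one search-space entry to the matching sampling call."""
    if isinstance(spec, list):  # discrete choices
        return hp.Choice(name, spec)
    if isinstance(spec, dict):
        kind = spec.get("type")
        if kind == "int":
            return hp.Int(name, spec["min_value"], spec["max_value"],
                          step=spec.get("step", 1))
        if kind == "float":
            return hp.Float(name, spec["min_value"], spec["max_value"],
                            sampling=spec.get("sampling", "linear"))
        if kind == "bool":
            return hp.Boolean(name)
        raise ValueError(f"Unsupported type {kind!r} for {name!r}")
    raise TypeError(f"Spec for {name!r} must be a list or a dict")


space = {
    "num_heads": [2, 4, 8],
    "lstm_units": {"type": "int", "min_value": 32, "max_value": 128, "step": 32},
    "learning_rate": {"type": "float", "min_value": 1e-4, "max_value": 1e-2,
                      "sampling": "log"},
    "use_residuals": {"type": "bool"},
}
hp = RandomHP(seed=42)
sampled = {name: sample_from_spec(hp, name, spec) for name, spec in space.items()}
```

Note that Keras Tuner's own `Choice` expects all values to share one type, so a mixed list would presumably need extra handling that this sketch leaves out.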

End-to-End Workflow

The recommended workflow involves using the HydroTuner.create() factory method, which simplifies the setup process significantly.

Step 1: Prepare Data

First, load your input features and target variables as dictionaries of NumPy arrays.

import numpy as np

# Example dummy data
B, T, H = 128, 15, 7
S_DIM, D_DIM, F_DIM = 3, 5, 2

inputs = {
    "coords": np.random.rand(B, H, 3),
    "static_features": np.random.rand(B, S_DIM),
    "dynamic_features": np.random.rand(B, T, D_DIM),
    "future_features": np.random.rand(B, H, F_DIM),
}
targets = {
    "subsidence": np.random.rand(B, H, 1),
    "gwl": np.random.rand(B, H, 1)
}

Step 2: Define the Search Space

Create the dictionary that defines your tuning experiment. Here, we’ll define a space for tuning a TransFlowSubsNet model, including its physics parameters.

search_space = {
    # Architectural HPs
    "embed_dim": [32, 64],
    "dropout_rate": {"type": "float", "min_value": 0.0, "max_value": 0.4},

    # Physics HPs for TransFlowSubsNet
    "K": ["learnable", 1e-5, 1e-4], # Tune between learnable or fixed values
    "Ss": {"type": "float", "min_value": 1e-6, "max_value": 1e-4, "sampling": "log"},

    # Compile-time HPs
    "learning_rate": [1e-3, 5e-4, 1e-4],
    "lambda_gw": {"type": "float", "min_value": 0.5, "max_value": 1.5},
    "lambda_cons": {"type": "float", "min_value": 0.1, "max_value": 1.0}
}

Step 3: Create the Tuner with `.create()`

Use the factory method to instantiate the tuner. It will automatically infer the data dimensions from your NumPy arrays. You can still provide manual fixed_params to override any defaults or inferred values.

from fusionlab.nn.forecast_tuner import HydroTuner
from fusionlab.nn.pinn import TransFlowSubsNet

tuner = HydroTuner.create(
    model_name_or_cls=TransFlowSubsNet,
    inputs_data=inputs,
    targets_data=targets,
    search_space=search_space,
    fixed_params={"quantiles": None}, # Manually specify no quantiles

    # Keras Tuner settings
    objective="val_loss",
    max_trials=25,
    project_name="TransFlowSubsNet_Optimization"
)

print(f"Tuner created for model: {tuner.model_class.__name__}")
print(f"Inferred forecast horizon: {tuner.fixed_params['forecast_horizon']}")
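
For intuition about what "inferring the data dimensions" could involve, the sketch below derives the fixed parameters from array shapes alone. The `infer_fixed_params` helper is hypothetical, and the choices it makes (feature dimension from the last axis, forecast horizon from the second axis of the subsidence target) are assumptions of this example rather than the library's documented logic.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def infer_fixed_params(
    input_shapes: Dict[str, Shape],
    target_shapes: Dict[str, Shape],
) -> Dict[str, int]:
    """Derive data-dependent dimensions from array shapes (assumed layout)."""
    return {
        # Feature dimension is assumed to be the last axis of each array.
        "static_input_dim": input_shapes["static_features"][-1],
        "dynamic_input_dim": input_shapes["dynamic_features"][-1],
        "future_input_dim": input_shapes["future_features"][-1],
        "output_subsidence_dim": target_shapes["subsidence"][-1],
        "output_gwl_dim": target_shapes["gwl"][-1],
        # Horizon is assumed to be the time axis of the target: (B, H, dim).
        "forecast_horizon": target_shapes["subsidence"][1],
    }


# Shapes matching the dummy data from Step 1 (B=128, T=15, H=7).
input_shapes = {
    "coords": (128, 7, 3),
    "static_features": (128, 3),
    "dynamic_features": (128, 15, 5),
    "future_features": (128, 7, 2),
}
target_shapes = {"subsidence": (128, 7, 1), "gwl": (128, 7, 1)}

inferred = infer_fixed_params(input_shapes, target_shapes)
```

With real NumPy arrays, the shape dictionaries can be built with `{k: v.shape for k, v in inputs.items()}`.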

Step 4: Run the Search

Call the .run() method to start the hyperparameter search. You will typically want to include an EarlyStopping callback.

from tensorflow.keras.callbacks import EarlyStopping

# Note: The 'run' method is an alias for the base 'fit' method.
best_model, best_hps, tuner_instance = tuner.run(
    inputs=inputs,
    y=targets,
    validation_data=(inputs, targets), # Use same data for example
    epochs=50,
    batch_size=32,
    callbacks=[EarlyStopping(patience=5, monitor='val_loss')]
)
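
The example above reuses the training data for validation only to keep it short. In a real experiment you would normally hold out separate samples. One possible way to do that for dictionaries of arrays is sketched below; `train_val_split` is a hypothetical helper that slices along the first axis without shuffling, which may or may not suit your data.

```python
from typing import Any, Dict, Tuple

ArrayDict = Dict[str, Any]


def train_val_split(
    inputs: ArrayDict,
    targets: ArrayDict,
    val_fraction: float = 0.2,
) -> Tuple[Tuple[ArrayDict, ArrayDict], Tuple[ArrayDict, ArrayDict]]:
    """Split every array along its first axis; the last samples go to validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")

    n_samples = len(next(iter(inputs.values())))
    for name, arr in list(inputs.items()) + list(targets.items()):
        if len(arr) != n_samples:
            raise ValueError(f"{name!r} has {len(arr)} samples, expected {n_samples}")

    n_val = max(1, int(n_samples * val_fraction))
    n_train = n_samples - n_val
    if n_train < 1:
        raise ValueError("Not enough samples to split")

    def take(data: ArrayDict, sl: slice) -> ArrayDict:
        return {key: value[sl] for key, value in data.items()}

    train = (take(inputs, slice(0, n_train)), take(targets, slice(0, n_train)))
    val = (take(inputs, slice(n_train, None)), take(targets, slice(n_train, None)))
    return train, val
```

The result can then be unpacked as `(train_inputs, train_targets), validation_data = train_val_split(inputs, targets)` and passed to `tuner.run(inputs=train_inputs, y=train_targets, validation_data=validation_data, ...)`.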

Step 5: Retrieve and Use the Best Model

After the search is complete, the tuner object holds the best hyperparameters and a retrained model ready for use.

print("\n--- Best Hyperparameters Found ---")
for hp, value in best_hps.values.items():
    print(f"{hp}: {value}")

# The best_model is already retrained and ready for prediction or saving.
# best_model.save("my_best_hydro_model.keras")
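
If you want to keep the winning configuration alongside the saved model, the `values` dictionary printed above can be serialized to JSON. The sketch below uses a plain dictionary as a stand-in for `best_hps.values`; the numbers are made up for illustration and are not results from an actual search.

```python
import json
from typing import Any, Dict


def hps_to_json(values: Dict[str, Any]) -> str:
    """Serialize a hyperparameter dictionary to a readable JSON string."""
    return json.dumps(values, indent=2, sort_keys=True)


# Stand-in for `best_hps.values` (illustrative values only).
example_values = {
    "embed_dim": 64,
    "dropout_rate": 0.15,
    "K": "learnable",
    "learning_rate": 0.0005,
}

payload = hps_to_json(example_values)
restored = json.loads(payload)  # round-trips back to the same dictionary
```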

API Reference

class fusionlab.nn.forecast_tuner.HydroTuner

Bases: PINNTunerBase

A robust and flexible hyperparameter tuner for hydrogeological PINN models.

This class provides a unified interface to perform hyperparameter optimization for complex physics-informed models like PIHALNet and TransFlowSubsNet, using keras-tuner as the backend.

It is designed to be highly flexible, separating the fixed, data-dependent parameters from the tunable hyperparameters, which are defined in a user-provided search space. The tuner dynamically constructs and compiles the specified model for each trial based on these configurations.

The recommended way to instantiate this class is through the .create() factory method, which can automatically infer data dimensions.

Parameters:

model_name_or_cls (str or Type[Model]) – The specific model to be tuned. This can be provided as a string identifier (e.g., 'PIHALNet', 'TransFlowSubsNet') or as the model class object itself.
fixed_params (dict) – A dictionary containing all parameters that are not to be tuned. This must include all data-dependent dimensions required by the model’s constructor, such as:
- static_input_dim
- dynamic_input_dim
- future_input_dim
- output_subsidence_dim
- output_gwl_dim
- forecast_horizon
It can also include other fixed settings like quantiles or mode.
search_space (dict, optional) – A dictionary defining the hyperparameter search space. The keys are the names of the parameters to tune, and the values define their search range or choices.
- For discrete choices, provide a list: {'num_heads': [2, 4, 8]}
- For ranges, provide a dictionary with a type key: {'dropout_rate': {'type': 'float', 'min_value': 0.1, 'max_value': 0.4}}
Supported types are 'int', 'float', 'bool', and 'choice'.
objective (str or keras_tuner.Objective, default 'val_loss') – The metric to optimize. The direction (min/max) is inferred automatically if the name contains “loss”.
max_trials (int, default 10) – The total number of hyperparameter combinations to test.
project_name (str, default "HydroTuner_Project") – The name for the tuning project. Results for each trial are stored in a subdirectory with this name inside the root directory given by the directory parameter.
directory (str, default "hydrotuner_results") – The root directory where tuning results are saved.
executions_per_trial (int, default 1) – The number of times to train a model with the same set of hyperparameters. The final score is averaged.
tuner_type ({'randomsearch', 'bayesianoptimization', 'hyperband'}, default 'randomsearch') – The search algorithm to use.
seed (int, optional) – A random seed for reproducibility of the search process.
overwrite (bool, default True) – If True, any existing results in the project_name directory will be overwritten. Set to False to resume a previous search.
param_space (dict, optional) – [DEPRECATED] Use search_space instead. This parameter is maintained for backward compatibility and will issue a FutureWarning if used.
**kwargs – Additional keyword arguments passed to the parent PINNTunerBase constructor.
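
As a small illustration of the objective rule described in the parameter list, the sketch below returns a direction only when the metric name contains "loss". The `infer_direction` helper is hypothetical and simply restates that rule; it is not the tuner's actual implementation.

```python
from typing import Optional


def infer_direction(objective_name: str) -> Optional[str]:
    """Return 'min' for loss-like names; otherwise None (direction unknown)."""
    if "loss" in objective_name.lower():
        return "min"
    return None


loss_direction = infer_direction("val_loss")   # loss-like name
metric_direction = infer_direction("val_mae")  # no "loss" in the name
```

For a metric whose name does not contain "loss", it is safer to pass an explicit objective such as `keras_tuner.Objective("val_mae", direction="min")` instead of a bare string.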

Notes

The HydroTuner separates the model’s configuration into two distinct parts passed at initialization:

`fixed_params`: Defines the static properties of a given modeling problem, primarily the shapes of the data. These do not change during the tuning process.
`search_space`: Defines the experiment. This dictionary contains all the architectural, physical, and optimization parameters that you want to find the optimal values for.

This design allows the build method to be completely generic. It constructs the model_params by combining the fixed_params with values sampled from the search_space for each trial, making the tuner easily adaptable to new models in the future.

See also

HydroTuner.create: The recommended factory method for creating a tuner instance by inferring dimensions from data.
HydroTuner.build: The method that constructs a model for a single trial.
HydroTuner.run: The main method to start the hyperparameter search.
fusionlab.nn.pinn.models.PIHALNet: One of the target models for this tuner.
fusionlab.nn.pinn.models.TransFlowSubsNet: The other primary target model.

Examples

>>> from fusionlab.nn.forecast_tuner import HydroTuner
>>> from fusionlab.nn.pinn.models import TransFlowSubsNet

>>> # 1. Define fixed parameters (e.g., from data shapes)
>>> fixed_params = {
...     "static_input_dim": 5,
...     "dynamic_input_dim": 8,
...     "future_input_dim": 3,
...     "output_subsidence_dim": 1,
...     "output_gwl_dim": 1,
...     "forecast_horizon": 7,
...     "quantiles": [0.1, 0.5, 0.9]
... }

>>> # 2. Define the hyperparameter search space
>>> search_space = {
...     # Architectural HPs
...     "embed_dim": [32, 64],
...     "num_heads": [2, 4],
...     "dropout_rate": {"type": "float", "min_value": 0.1, "max_value": 0.3},
...     # Physics HPs for TransFlowSubsNet
...     "K": ["learnable", 1e-4],
...     # Compile-time HPs
...     "learning_rate": [1e-3, 5e-4],
...     "lambda_gw": {"type": "float", "min_value": 0.5, "max_value": 1.5}
... }

>>> # 3. Instantiate the tuner
>>> tuner = HydroTuner(
...     model_name_or_cls=TransFlowSubsNet,
...     fixed_params=fixed_params,
...     search_space=search_space,
...     objective="val_loss",
...     max_trials=20,
...     project_name="TransFlowSubsNet_Optimization"
... )
>>> print(f"Tuner configured for model: {tuner.model_class.__name__}")
Tuner configured for model: TransFlowSubsNet

__init__(model_name_or_cls, fixed_params, search_space=None, objective='val_loss', max_trials=10, project_name='HydroTuner_Project', directory='hydrotuner_results', executions_per_trial=1, tuner_type='randomsearch', seed=None, overwrite=True, param_space=None, **kwargs)

Initialize the base class.

Parameters:

verbose (int, optional) – Verbosity level controlling logging (0 to 3). Defaults to 0.
model_name_or_cls (str | Type[Model])
fixed_params (Dict[str, Any])
search_space (Dict[str, Any] | None)
objective (str | Objective)
max_trials (int)
project_name (str)
directory (str)
executions_per_trial (int)
tuner_type (str)
seed (int | None)
overwrite (bool)
param_space (Dict[str, Any] | None)

classmethod create(model_name_or_cls, inputs_data, targets_data, search_space, fixed_params=None, **tuner_kwargs)

Primary factory method to create and configure a HydroTuner.

This classmethod simplifies the tuner setup by automatically inferring essential data-dependent parameters (like input/output dimensions and forecast horizon) directly from the provided NumPy data arrays. It then intelligently merges these inferred parameters with model-specific defaults and any user-provided overrides.

Parameters:

model_name_or_cls (str or Model Class) – The model to be tuned. Can be a string identifier like 'PIHALNet' or 'TransFlowSubsNet', or the model class object itself.
inputs_data (dict of np.ndarray) – A dictionary of input NumPy arrays, keyed by feature type (e.g., {'static_features': ..., 'dynamic_features': ...}).
targets_data (dict of np.ndarray) – A dictionary of target NumPy arrays, keyed by target name (e.g., {'subsidence': ..., 'gwl': ...}). The keys will be automatically standardized.
search_space (dict) – The hyperparameter search space definition. Keys are parameter names, and values are their search definitions (lists or dictionaries).
fixed_params (dict, optional) – A dictionary of parameters to manually set. These will override any inferred values or defaults, giving the user final control.
**tuner_kwargs – Standard arguments for the Keras Tuner backend, such as objective, max_trials, and project_name.

Returns:

An initialized HydroTuner instance, fully configured and ready for the .run() method to be called.

Return type:

HydroTuner

See also

_infer_and_merge_params: The internal static method that handles the parameter inference and merging logic.

Examples

>>> # Example for tuning a TransFlowSubsNet
>>> search_space_tf = {
...     "learning_rate": [1e-3, 5e-4],
...     "dropout_rate": {"type": "float", "min_value": 0.1, "max_value": 0.4},
...     "K": ["learnable", 1e-4]
... }
>>> tuner = HydroTuner.create(
...     model_name_or_cls="TransFlowSubsNet",
...     inputs_data=my_inputs,
...     targets_data=my_targets,
...     search_space=search_space_tf,
...     fixed_params={"forecast_horizon": 7}, # Override inferred horizon
...     max_trials=20,
...     project_name="MyTransFlowTuning"
... )
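
The merging behavior described for `create()` can be pictured as a layered dictionary update. The sketch below assumes the precedence "model defaults, then inferred values, then user-provided `fixed_params`"; only the last layer (user values win) is stated in the documentation, so the relative order of the first two is an assumption, and `merge_params` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def merge_params(
    defaults: Dict[str, Any],
    inferred: Dict[str, Any],
    user_fixed: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge parameter layers; later layers override earlier ones."""
    merged = dict(defaults)          # lowest priority
    merged.update(inferred)          # values derived from the data
    merged.update(user_fixed or {})  # highest priority: explicit user choices
    return merged


defaults = {"quantiles": None, "forecast_horizon": 1}  # illustrative defaults
inferred = {"forecast_horizon": 5, "static_input_dim": 3}
user_fixed = {"forecast_horizon": 7}

final_params = merge_params(defaults, inferred, user_fixed)
```

Here `final_params["forecast_horizon"]` ends up as 7, mirroring the "Override inferred horizon" comment in the example above.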

build(hp)

Builds and compiles a model for a single hyperparameter trial.

This method is called internally by the Keras Tuner for each trial in the search process. It dynamically constructs a model instance (e.g., PIHALNet or TransFlowSubsNet) using the combination of fixed parameters and the hyperparameters sampled for the current trial.

The process involves:

1. Iterating through the user-defined search_space.
2. Sampling a value for each hyperparameter using the hp object.
3. Separating model initialization arguments from model compilation arguments.
4. Instantiating the model specified by self.model_class.
5. Compiling the model with the appropriate optimizer, losses, and physics-loss weights (\(\lambda\)).

Parameters:

hp (keras_tuner.HyperParameters) – The HyperParameters object for the current trial, provided by the Keras Tuner framework. It is used to sample values for each hyperparameter defined in the search_space.

Returns:

A fully compiled Keras model instance, ready to be trained for the current trial.

Return type:

tf.keras.Model

Notes

This method should not be called directly by the user. It is part of the Keras Tuner HyperModel API and is invoked by the tuner’s search() loop. Its generic design allows it to build any model whose parameters are defined in the search_space.
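
The separation of initialization and compilation arguments in the build process could look roughly like the sketch below. The set `COMPILE_KEYS` is an assumption based on the names labeled "Compile-time HPs" in the examples on this page, and `split_trial_params` is a hypothetical helper; the real method may classify parameters differently.

```python
from typing import Any, Dict, Tuple

# Assumed compile-time names, taken from the "Compile-time HPs" in the examples.
COMPILE_KEYS = frozenset({"learning_rate", "lambda_gw", "lambda_cons"})


def split_trial_params(
    fixed_params: Dict[str, Any],
    sampled: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Combine fixed and sampled values, then split init from compile kwargs."""
    init_kwargs = dict(fixed_params)  # fixed values always go to the constructor
    compile_kwargs: Dict[str, Any] = {}
    for name, value in sampled.items():
        if name in COMPILE_KEYS:
            compile_kwargs[name] = value
        else:
            init_kwargs[name] = value
    return init_kwargs, compile_kwargs


fixed = {"static_input_dim": 5, "forecast_horizon": 7}
trial_sample = {"embed_dim": 64, "K": "learnable", "learning_rate": 1e-3,
                "lambda_gw": 1.2}

init_kwargs, compile_kwargs = split_trial_params(fixed, trial_sample)
# A model would then be created with model_class(**init_kwargs) and compiled
# using the values in compile_kwargs.
```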

run(inputs, y, validation_data=None, epochs=10, batch_size=32, callbacks=None, case_info=None, verbose=1, **search_kwargs)

Executes the end-to-end hyperparameter search workflow.

This is the primary method to start the tuning process. It orchestrates data preparation, dimension inference (if needed), and the Keras Tuner search loop.

Parameters:

inputs (dict of np.ndarray) – A dictionary of NumPy arrays for model inputs. Must contain keys like "static_features", "dynamic_features", etc., matching the model’s requirements.
y (dict of np.ndarray) – A dictionary of NumPy arrays for targets. It expects keys like "subsidence" and "gwl", which are automatically renamed to match the model’s output layers.
validation_data (tuple, optional) – A tuple (val_inputs, val_targets) with the same dictionary structure as inputs and y. Used for the validation step during tuning. Default is None.
epochs (int, default 10) – The number of epochs to train each model trial.
batch_size (int, default 32) – The batch size for creating tf.data.Dataset objects from the input NumPy arrays.
callbacks (list of keras.callbacks.Callback, optional) – A list of Keras callbacks to use during the search. A default EarlyStopping callback is added automatically if not provided. Default is None.
case_info (dict, optional) – A dictionary of additional metadata to save in the final tuning summary JSON file.
verbose (int, default 1) – Verbosity level for logging during the tuning process.
**search_kwargs – Additional keyword arguments forwarded directly to the Keras Tuner search() method.

Returns:

A tuple containing:
- The best model found, retrained on the full dataset.
- The best hyperparameter configuration object.
- The Keras Tuner instance containing the search history.

Return type:

tuple of (Model | None, HyperParameters | None, Tuner | None)

Raises:

ValueError – If required keys are missing from the inputs or y dictionaries.
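
To catch the ValueError case before starting a long search, a pre-flight check along the following lines may help. Which keys are actually required depends on the model, so `REQUIRED_INPUT_KEYS` and `REQUIRED_TARGET_KEYS` below are assumptions chosen from the key names used on this page, and `check_required_keys` is a hypothetical helper.

```python
from typing import Any, Dict, Sequence

# Assumed required keys; adjust to the model being tuned.
REQUIRED_INPUT_KEYS = ("coords", "dynamic_features")
REQUIRED_TARGET_KEYS = ("subsidence", "gwl")


def check_required_keys(
    inputs: Dict[str, Any],
    y: Dict[str, Any],
    input_keys: Sequence[str] = REQUIRED_INPUT_KEYS,
    target_keys: Sequence[str] = REQUIRED_TARGET_KEYS,
) -> None:
    """Raise ValueError listing any keys missing from inputs or targets."""
    missing_inputs = [key for key in input_keys if key not in inputs]
    missing_targets = [key for key in target_keys if key not in y]
    if missing_inputs or missing_targets:
        raise ValueError(
            f"Missing input keys: {missing_inputs}; "
            f"missing target keys: {missing_targets}"
        )
```

Calling `check_required_keys(inputs, targets)` just before `tuner.run(...)` surfaces a missing key immediately rather than partway through the search setup.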

help(**kwargs)

my_params = HydroTuner(model_name_or_cls, fixed_params, search_space=None, objective='val_loss', max_trials=10, project_name='HydroTuner_Project', directory='hydrotuner_results', executions_per_trial=1, tuner_type='randomsearch', seed=None, overwrite=True, param_space=None)

best_hps_: HyperParameters | None

best_model_: Model | None

tuner_: Tuner | None

tuning_summary_: Dict[str, Any]

fixed_model_params: Dict[str, Any]

param_space_config: Dict[str, Any]