Preprocess tab (Stage-1)

The Preprocess tab is the dedicated workspace for Stage-1 of the GeoPrior pipeline. Stage-1 prepares everything Stage-2 needs: normalized tensors, scalers, and a manifest that records what was built and how.

This tab is UI-only: it builds widgets, binds options to the configuration store, and displays Stage-1 artifacts. The actual Stage-1 execution is handled by the main controller.

Preprocess tab (Stage-1) with inputs, options, status and workspace.

Preprocess tab layout: paths row, three top cards (Inputs/Options/Status), the Stage-1 workspace, and the Run button.

What you do here

You typically use this tab to:

  • verify the active city + dataset that will be preprocessed,

  • decide whether to reuse an existing compatible Stage-1 run,

  • run Stage-1 and then inspect the resulting manifest, scaling audit, and diagnostics in the workspace.

The top paths row

At the very top, the tab shows two read-only path fields:

  • City root: the folder where Stage-1 artifacts for the active city/model live.

  • Results root: the global results directory the GUI writes under.

The row also includes icon-only actions:

  • Open city folder (disabled until the folder exists),

  • Browse results root…,

  • Refresh Stage-1 status. :contentReference[oaicite:1]{index=1}

City root is computed as:

<results_root>/<city>_<model>_stage1

so each city/model gets a predictable Stage-1 home.

Inputs (City + Dataset)

The Inputs (City + Dataset) card is a quick confirmation of what Stage-1 will operate on. It displays:

  • City: <name>

  • Dataset: <path>

and provides two actions:

  • Open dataset… (select or change the dataset),

  • Feature config… (review feature roles before running).

Stage-1 options

The Stage-1 options card controls reuse/build behavior. Each checkbox is stored in the central configuration store and is therefore saved alongside the run outputs. :contentReference[oaicite:4]{index=4}

The options are:

  • Clean Stage-1 run dir before build → clears the Stage-1 run directory before rebuilding (store key: clean_stage1_dir).

  • Auto-reuse compatible Stage-1 run → if a prior Stage-1 run is compatible with the current configuration, reuse it instead of rebuilding (store key: stage1_auto_reuse_if_match).

  • Force rebuild if mismatch → if an existing Stage-1 run is found but its configuration does not match the current setup, rebuild automatically (store key: stage1_force_rebuild_if_mismatch).

  • Build future NPZ → additionally produce the “future-known” NPZ payloads used by downstream forecasting workflows (store key: build_future_npz).

Stage-1 status (what “OK/MATCH” means)

The Stage-1 status card summarizes what the GUI currently detects for the active city:

  • a state line (for example: OK / MATCH or INCOMPLETE / MISMATCH),

  • a manifest path line (Manifest: <path>),

  • action buttons to open artifacts.

When you press Refresh (or when the tab updates), the tab discovers Stage-1 runs and selects the best candidate. It reports:

  • OK vs INCOMPLETE depending on whether the run looks complete,

  • MATCH vs MISMATCH depending on whether the run’s configuration matches the current Stage-1 configuration snapshot,

  • plus n_train and n_val counts.

If a usable run is found, the tab also loads:

  • the manifest JSON (from the detected manifest path),

  • the scaling audit JSON (stage1_scaling_audit.json) when present,

and pushes them into the workspace panels.

The action buttons are enabled only when their targets exist:

  • Open manifest: opens the manifest file,

  • Open folder: opens the Stage-1 run directory,

  • Use as default for city: marks this Stage-1 run as the preferred one for the active city in the GUI.

Stage-1 workspace (inspection and diagnostics)

The large Stage-1 workspace area hosts the dedicated Stage-1 inspector (Stage1Workspace). :contentReference[oaicite:18]{index=18}

This workspace is where you review what Stage-1 produced and whether it is ready for Stage-2. It is populated using a shared context containing the city, dataset path, results root, active Stage-1 directory, and model name.

In v3.2, the workspace exposes multiple subpanels (tabs), including:

  • Quicklook: compact context + run summary preview,

  • Readiness: compatibility and “can we reuse?” checks,

  • Feature scaling: scaling audit and feature normalization summaries,

  • Visual checks: quick diagnostic plots for sanity checks,

  • Run history: recently detected Stage-1 runs for the city,

  • Artifacts: direct links to files produced by Stage-1.

Run Stage-1 preprocessing

At the bottom-right, the tab provides the primary action:

  • Run Stage-1 preprocessing (the “play” run button).

Pressing it triggers the Stage-1 job using the current store-backed configuration (including the Stage-1 options above), and progress/log output is streamed to the GUI log panel.

Note

If you switch cities or change setup parameters that affect Stage-1, always press Refresh Stage-1 status to confirm whether the current Stage-1 run is still a MATCH, or whether a rebuild is required.