fusionlab.datasets.fetch_zhongshan_data
- fusionlab.datasets.fetch_zhongshan_data(*, n_samples=None, as_frame=False, include_coords=True, include_target=True, data_home=None, download_if_missing=True, force_download=False, random_state=None, verbose=True)
Fetch the Zhongshan land subsidence dataset (~2000 spatially sampled points).
Loads the zhongshan_2000.csv file, which contains land-subsidence features for ~2000 points spatially sampled from a larger dataset [Liu24]. The columns cover coordinates, year, hydrogeological factors, geological properties, risk scores, and measured subsidence (the target).
Optionally, the data can be further sub-sampled with the n_samples parameter, which uses spatial_sampling().
Column details: ‘longitude’, ‘latitude’, ‘year’, ‘GWL’, ‘seismic_risk_score’, ‘rainfall_mm’, ‘subsidence’, ‘geological_category’, ‘normalized_density’, ‘density_tier’, ‘subsidence_intensity’, ‘density_concentration’, ‘normalized_seismic_risk_score’, ‘rainfall_category’.
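The exact stratification scheme used by spatial_sampling() is not described on this page. As a rough illustration only, the sketch below assumes a simple grid-based scheme: it bins (longitude, latitude) pairs into an n_bins × n_bins grid over the bounding box and allocates the sample budget to each cell in proportion to the number of points it holds. The helper grid_stratified_sample and its n_bins parameter are hypothetical and are not part of fusionlab.

```python
import random
from collections import defaultdict


def grid_stratified_sample(points, n_samples, n_bins=10, random_state=None):
    """Pick ``n_samples`` row indices from (lon, lat) pairs, stratified by grid cell.

    Hypothetical helper: it illustrates the idea of spatial stratification
    and is NOT fusionlab's spatial_sampling().
    """
    if not 0 <= n_samples <= len(points):
        raise ValueError("n_samples must be between 0 and len(points)")
    rng = random.Random(random_state)  # seeded for reproducibility

    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    lon_min, lon_span = min(lons), (max(lons) - min(lons)) or 1.0
    lat_min, lat_span = min(lats), (max(lats) - min(lats)) or 1.0

    # Assign each point to a cell of an n_bins x n_bins grid over the bounding box.
    cells = defaultdict(list)
    for i, (lon, lat) in enumerate(points):
        cx = min(int((lon - lon_min) / lon_span * n_bins), n_bins - 1)
        cy = min(int((lat - lat_min) / lat_span * n_bins), n_bins - 1)
        cells[(cx, cy)].append(i)

    # Give each cell a quota proportional to its size, using largest-remainder
    # rounding so that the quotas sum to n_samples exactly.
    total = len(points)
    raw = {c: len(idx) * n_samples / total for c, idx in cells.items()}
    quota = {c: int(r) for c, r in raw.items()}
    leftover = n_samples - sum(quota.values())
    for c in sorted(raw, key=lambda c: raw[c] - quota[c], reverse=True)[:leftover]:
        quota[c] += 1

    # Draw the quota from each cell without replacement.
    chosen = []
    for c in sorted(cells):
        chosen.extend(rng.sample(cells[c], quota[c]))
    return sorted(chosen)
```

Allocating a quota per cell keeps the sample's spatial distribution close to that of the full dataset, which a uniform random draw only achieves on average.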
- Parameters:
n_samples (int, str or None, default None) – Number of samples to load.
  - If None or '*': load the full sampled dataset (~2000 rows).
  - If int: sub-sample the specified number of rows using spatial stratification via spatial_sampling(). Must be <= the number of rows in the full file. Requires spatial_sampling to be available.
as_frame (bool, default False) – Return type: False for a Bunch object, True for a pandas DataFrame.
include_coords (bool, default True) – Include the ‘longitude’ and ‘latitude’ columns.
include_target (bool, default True) – Include the ‘subsidence’ column.
data_home (str, optional) – Path to the cache directory. Defaults to ~/fusionlab_data.
download_if_missing (bool, default True) – Attempt a download if the file is not found locally.
force_download (bool, default False) – Force a download attempt even if the file exists locally.
random_state (int, optional) – Seed for the random number generator used during sub-sampling when n_samples is an integer. Ensures reproducibility.
verbose (bool, default True) – Print status messages during file fetching and sampling.
- Returns:
data – The loaded (or sub-sampled) data. With as_frame=False, a Bunch object with the attributes frame, data, feature_names, target_names, target, coords, and DESCR; with as_frame=True, a pandas DataFrame.
- Return type:
Bunch or pandas.DataFrame
- Raises:
ValueError – If n_samples is invalid (e.g., non-integer, negative, or larger than available rows when sampling).
FileNotFoundError – If the dataset file cannot be found or downloaded.
OSError – If there is an error reading the dataset file.
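The parameter rules documented above can be summarized in plain Python. The helpers below (resolve_n_samples, plan_fetch, default_data_home, select_columns) are hypothetical illustrations of the described behavior, not fusionlab's internal code. In particular, the precedence between force_download, an existing cached copy, and download_if_missing is an assumption, as is accepting n_samples=0.

```python
import os

# Column names as listed in the column details above.
COLUMNS = [
    "longitude", "latitude", "year", "GWL", "seismic_risk_score",
    "rainfall_mm", "subsidence", "geological_category", "normalized_density",
    "density_tier", "subsidence_intensity", "density_concentration",
    "normalized_seismic_risk_score", "rainfall_category",
]


def resolve_n_samples(n_samples, n_rows):
    """Return None for 'load everything', else a validated sample size (hypothetical)."""
    if n_samples is None or n_samples == "*":
        return None  # full sampled file (~2000 rows)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n_samples, bool) or not isinstance(n_samples, int):
        raise ValueError(f"n_samples must be an int, '*' or None, got {n_samples!r}")
    # Negative or oversized requests are invalid; 0 is accepted here by assumption.
    if n_samples < 0 or n_samples > n_rows:
        raise ValueError(f"n_samples must be in [0, {n_rows}], got {n_samples}")
    return n_samples


def default_data_home(data_home=None):
    """Cache directory; the documented default is ~/fusionlab_data."""
    if data_home is not None:
        return data_home
    return os.path.join(os.path.expanduser("~"), "fusionlab_data")


def plan_fetch(local_exists, download_if_missing=True, force_download=False):
    """Decide between reading the cached file and attempting a download (hypothetical).

    Assumed precedence: force_download first, then an existing local copy,
    then download_if_missing; otherwise the file is reported as missing.
    """
    if force_download:
        return "download"
    if local_exists:
        return "local"
    if download_if_missing:
        return "download"
    raise FileNotFoundError("dataset file not found locally and download_if_missing=False")


def select_columns(columns, include_coords=True, include_target=True):
    """Drop coordinate and/or target columns according to the include_* flags."""
    drop = set()
    if not include_coords:
        drop |= {"longitude", "latitude"}
    if not include_target:
        drop.add("subsidence")
    return [c for c in columns if c not in drop]
```

Keeping each rule in its own small function makes the documented ValueError and FileNotFoundError conditions easy to check in isolation.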
References
[Liu24] Liu, J., et al. (2024). Machine learning-based techniques… Journal of Environmental Management, 352, 120078.