fusionlab.datasets.fetch_nansha_data
- fusionlab.datasets.fetch_nansha_data(*, n_samples=None, as_frame=False, include_coords=True, include_target=True, data_home=None, download_if_missing=True, force_download=False, random_state=None, verbose=True)
Fetch the sampled Nansha land subsidence dataset (2000 points).
Loads the nansha_2000.csv file, which contains features related to land subsidence in Nansha, China, spatially sampled down to 2000 representative data points. It includes geographical coordinates, temporal information (year), geological factors, hydrogeological factors (groundwater level (GWL), rainfall), building concentration, a normalized seismic risk score, soil thickness, and the measured land subsidence (target).
Optionally allows further sub-sampling using the n_samples parameter via spatial_sampling().
Column details: ‘longitude’, ‘latitude’, ‘year’, ‘building_concentration’, ‘geology’, ‘GWL’, ‘rainfall_mm’, ‘normalized_seismic_risk_score’, ‘soil_thickness’, ‘subsidence’.
The function searches for the data file (nansha_2000.csv) using the logic in download_file_if() (Cache > Package > Download).
- Parameters:
n_samples (int, str or None, default None) – Number of samples to load.
  - If None or '*': Load the full sampled dataset (~2000 rows).
  - If int: Sub-sample the specified number using spatial stratification via spatial_sampling(). Must be <= number of rows in the full file. Requires spatial_sampling to be available.
as_frame (bool, default False) – Return type: False for Bunch object, True for DataFrame.
include_coords (bool, default True) – Include ‘longitude’ and ‘latitude’ columns.
include_target (bool, default True) – Include the ‘subsidence’ column.
data_home (str, optional) – Path to cache directory. Defaults to ~/fusionlab_data.
download_if_missing (bool, default True) – Attempt download if file is not found locally.
force_download (bool, default False) – Force download attempt even if file exists locally.
random_state (int, optional) – Seed for the random number generator used during sub-sampling.
verbose (bool, default True) – Print status messages during file fetching and sampling.
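The accepted values for n_samples can be illustrated with a small, self-contained sketch. The helper name normalize_n_samples is hypothetical and is not part of fusionlab; the lower bound of 1 is also an assumption, since the documentation only states the upper bound.

```python
from typing import Optional, Union


def normalize_n_samples(
    n_samples: Union[int, str, None], n_rows: int
) -> Optional[int]:
    """Return None for 'load everything', or a validated row count.

    Hypothetical helper: the name and the exact checks are assumptions,
    not fusionlab's actual implementation.
    """
    # None and '*' both mean "keep the full sampled file".
    if n_samples is None or n_samples == "*":
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n_samples, bool) or not isinstance(n_samples, int):
        raise ValueError(
            f"n_samples must be an int, '*' or None; got {n_samples!r}"
        )
    # Upper bound comes from the documentation; the lower bound is assumed.
    if not 1 <= n_samples <= n_rows:
        raise ValueError(
            f"n_samples must be between 1 and {n_rows}; got {n_samples}"
        )
    return n_samples
```

Under these assumptions, normalize_n_samples(500, 2000) passes the request through unchanged, while a request larger than the file raises ValueError, which matches the ValueError documented for invalid n_samples.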
- Returns:
data – Loaded or sampled data. With as_frame=False, the returned Bunch object includes frame, data, feature_names, target_names, target, coords, and DESCR.
- Return type:
Bunch or pandas.DataFrame
- Raises:
ValueError – If n_samples is invalid.
FileNotFoundError – If the dataset file cannot be found or downloaded.
OSError – If there is an error reading the dataset file.
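A minimal usage sketch, based only on the signature and the documented exceptions above. It assumes fusionlab is installed and that the data file can be found or downloaded; if either is not the case, the example leaves frame as None rather than failing.

```python
# Guard the import so the example degrades gracefully without fusionlab.
try:
    from fusionlab.datasets import fetch_nansha_data
except ImportError:
    fetch_nansha_data = None

frame = None
if fetch_nansha_data is not None:
    try:
        # Request a DataFrame with 500 spatially sub-sampled rows.
        # A fixed random_state keeps the sub-sample reproducible.
        frame = fetch_nansha_data(
            as_frame=True,
            n_samples=500,
            random_state=42,
            verbose=False,
        )
    except (ValueError, FileNotFoundError, OSError) as exc:
        # These are the exception types listed in the Raises section.
        print(f"Could not load the Nansha dataset: {exc}")

if frame is not None:
    print(frame.shape)
    print(frame[["longitude", "latitude", "subsidence"]].head())
```

Passing as_frame=False instead would return the Bunch object, whose documented attributes (for example target and coords) hold the same information split into separate fields.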