fusionlab.datasets.make_anomaly_data
- fusionlab.datasets.make_anomaly_data(n_sequences=200, sequence_length=50, n_features=1, anomaly_fraction=0.1, anomaly_type='spike', anomaly_magnitude=5.0, noise_level=0.2, as_frame=False, seed=None)
Generate sequence data with injected anomalies.
Creates a dataset of time series sequences, where a specified fraction contains synthetically generated anomalies (spikes or level shifts). It returns the sequences and corresponding binary labels (0 for normal, 1 for anomaly).
This data is useful for testing and evaluating anomaly detection algorithms such as LSTMAutoencoderAnomaly, or anomaly-aware training strategies.
- Parameters:
  - n_sequences (int, default 200) – Total number of sequences to generate.
  - sequence_length (int, default 50) – Number of time steps in each sequence.
  - n_features (int, default 1) – Number of features for each time step. Currently only 1 is supported.
  - anomaly_fraction (float, default 0.1) – Fraction of sequences that should contain anomalies (between 0 and 1).
  - anomaly_type ({'spike', 'level_shift'}, default 'spike') – Type of anomaly to inject:
    - 'spike': Adds or subtracts anomaly_magnitude at a single random point.
    - 'level_shift': Adds or subtracts anomaly_magnitude to all points after a random point in the sequence.
  - anomaly_magnitude (float, default 5.0) – The magnitude (absolute value) of the injected anomaly. The sign (add or subtract) is chosen randomly.
  - noise_level (float, default 0.2) – Standard deviation of the Gaussian noise added to the base signal.
  - as_frame (bool, default False) – Determines the return type:
    - If False (default): Returns a tuple (sequences, labels), where sequences is a NumPy array of shape (N, T, F) and labels has shape (N,).
    - If True: Attempts to create a DataFrame and returns a Bunch object (less standard for sequence data).
  - seed (int, optional) – Seed for NumPy's random number generator, for reproducibility. Default is None.
- Returns:
  data –
    - If as_frame=False (default): A tuple (sequences, labels):
      - sequences : ndarray of shape (n_sequences, sequence_length, n_features)
      - labels : ndarray of shape (n_sequences,) with 0 (normal) or 1 (anomaly).
    - If as_frame=True: A Bunch object containing a DataFrame (frame, potentially very wide if the sequences are flattened), labels, feature_names, etc., or just the DataFrame if preferred (structure TBD). Note: returning sequences as a DataFrame can be awkward.
- Return type:
  tuple or Bunch or pandas.DataFrame
- Raises:
  ValueError – If n_features is not 1 (only univariate sequences are currently supported), if anomaly_fraction is not between 0 and 1, or if anomaly_type is invalid.
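The two anomaly types and the invalid-type error described above can be illustrated with a minimal NumPy sketch. This is not fusionlab's implementation: the helper name inject_anomaly, the way the random position is sampled, and whether a level shift includes the chosen point itself are all assumptions made for illustration.

```python
import numpy as np


def inject_anomaly(seq, anomaly_type="spike", magnitude=5.0, rng=None):
    """Return (modified copy, position) for a 1-D sequence. Hypothetical helper."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.asarray(seq, dtype=float).copy()
    # The sign (add or subtract) is chosen at random, as the docs describe.
    signed = magnitude * rng.choice([-1.0, 1.0])
    # Assumption: the position is drawn uniformly over the whole sequence.
    idx = int(rng.integers(0, out.shape[0]))
    if anomaly_type == "spike":
        out[idx] += signed      # deviation at a single point
    elif anomaly_type == "level_shift":
        out[idx:] += signed     # offset from idx onward (inclusion of idx is an assumption)
    else:
        raise ValueError(f"Unknown anomaly_type: {anomaly_type!r}")
    return out, idx


rng = np.random.default_rng(0)
base = np.zeros(10)  # flat base signal, so the injected anomaly is easy to see
spiked, i = inject_anomaly(base, "spike", 5.0, rng)
shifted, j = inject_anomaly(base, "level_shift", 5.0, rng)
print(spiked)
print(shifted)
```

On a flat base signal, the spike changes exactly one value, while the level shift changes every value from the chosen position to the end of the sequence.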
Examples
>>> import numpy as np
>>> from fusionlab.datasets import make_anomaly_data
>>> # Generate sequences and labels as NumPy arrays
>>> sequences, labels = make_anomaly_data(n_sequences=50, anomaly_fraction=0.2, seed=42)
>>> print(f"Generated sequences shape: {sequences.shape}")
>>> print(f"Generated labels shape: {labels.shape}")
>>> print(f"Number of anomalies: {np.sum(labels)}")
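As a rough illustration of how the (sequences, labels) pair might be used to check a detector, the sketch below scores each sequence by its largest deviation from its own median and compares thresholded scores with the labels. The arrays are stand-ins built with NumPy in the documented shapes so the block runs without fusionlab. The helper max_deviation_scores and the threshold of 2.5 are assumptions for illustration, not part of the library.

```python
import numpy as np


def max_deviation_scores(sequences):
    """Score each sequence by its largest absolute deviation from its own median."""
    # sequences has shape (n_sequences, sequence_length, n_features)
    centered = sequences - np.median(sequences, axis=1, keepdims=True)
    return np.abs(centered).max(axis=(1, 2))


# Stand-in data (NOT produced by fusionlab): Gaussian noise with std 0.2,
# plus a +5.0 spike in the first 10 sequences, which are labelled 1.
rng = np.random.default_rng(42)
n_sequences, sequence_length = 50, 50
sequences = rng.normal(0.0, 0.2, size=(n_sequences, sequence_length, 1))
labels = np.zeros(n_sequences, dtype=int)
labels[:10] = 1
for k in range(10):
    sequences[k, rng.integers(sequence_length), 0] += 5.0

scores = max_deviation_scores(sequences)
# Assumed threshold: half the spike magnitude, far above the noise level.
predicted = (scores > 2.5).astype(int)
accuracy = float((predicted == labels).mean())
print(f"Accuracy of the max-deviation baseline: {accuracy:.2f}")
```

With real output, sequences and labels would come from make_anomaly_data(...) instead of the stand-in arrays. A score based on the maximum deviation suits spikes. Level shifts would likely need a different statistic, such as the difference between the means of the two halves of the sequence.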