fusionlab.datasets.make_anomaly_data

fusionlab.datasets.make_anomaly_data(n_sequences=200, sequence_length=50, n_features=1, anomaly_fraction=0.1, anomaly_type='spike', anomaly_magnitude=5.0, noise_level=0.2, as_frame=False, seed=None)

Generate sequence data with injected anomalies.

Creates a dataset of time series sequences, a specified fraction of which contain synthetically generated anomalies (spikes or level shifts). It returns the sequences together with binary labels (0 for normal, 1 for anomaly).

This data is useful for testing and evaluating anomaly detection algorithms like LSTMAutoencoderAnomaly or anomaly-aware training strategies.
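As an illustration of that kind of evaluation, the sketch below scores each sequence with a deliberately simple baseline (its largest absolute z-score) and compares the thresholded scores against the labels. The helpers `max_zscore_scores` and `precision_recall` are hypothetical and not part of fusionlab; the baseline merely stands in for a real detector such as LSTMAutoencoderAnomaly, and the toy arrays only mimic the documented (N, T, F) sequences and (N,) labels.

```python
import numpy as np


def max_zscore_scores(sequences: np.ndarray) -> np.ndarray:
    """Score each (T, F) sequence by its largest absolute z-score; returns shape (N,)."""
    mean = sequences.mean(axis=1, keepdims=True)
    std = sequences.std(axis=1, keepdims=True) + 1e-8  # guard against zero variance
    return np.abs((sequences - mean) / std).max(axis=(1, 2))


def precision_recall(scores: np.ndarray, labels: np.ndarray, threshold: float):
    """Precision and recall of the rule 'score > threshold means anomaly'."""
    pred = scores > threshold
    truth = labels == 1
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data with the documented shapes: 4 noisy sequences, one with a +5 spike.
rng = np.random.default_rng(0)
sequences = rng.normal(0.0, 0.2, size=(4, 50, 1))
sequences[2, 10, 0] += 5.0
labels = np.array([0, 0, 1, 0])

scores = max_zscore_scores(sequences)
print(precision_recall(scores, labels, threshold=4.5))
```

In practice the toy arrays would be replaced by the (sequences, labels) tuple returned by make_anomaly_data, and the baseline by the detector under test.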

Parameters:
  • n_sequences (int, default 200) – Total number of sequences to generate.

  • sequence_length (int, default 50) – Number of time steps in each sequence.

  • n_features (int, default 1) – Number of features per time step. Only 1 is currently supported.

  • anomaly_fraction (float, default 0.1) – Fraction of sequences that should contain anomalies (between 0 and 1).

  • anomaly_type ({'spike', 'level_shift'}, default 'spike') – Type of anomaly to inject:

    • 'spike': adds or subtracts anomaly_magnitude at a single random point.

    • 'level_shift': adds or subtracts anomaly_magnitude to all points after a random point in the sequence.

  • anomaly_magnitude (float, default 5.0) – The magnitude (absolute value) of the injected anomaly. The sign (add or subtract) is chosen randomly.

  • noise_level (float, default 0.2) – Standard deviation of Gaussian noise added to the base signal.

  • as_frame (bool, default False) – Determines the return type:

    • If False (default): returns a tuple (sequences, labels), where sequences is a NumPy array of shape (N, T, F) and labels has shape (N,).

    • If True: attempts to create a DataFrame and returns a Bunch object (less standard for sequence data).

  • seed (int, optional) – Seed for NumPy’s random number generator for reproducibility. Default is None.
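To make the parameter semantics concrete, here is a minimal NumPy sketch of how such a generator could work. It is not fusionlab's implementation: the sine-wave base signal, the rule for how many sequences receive an anomaly (rounding anomaly_fraction × n_sequences), and whether a level shift starts at or just after the chosen point are all assumptions, and `inject_anomaly` and `sketch_anomaly_data` are hypothetical names.

```python
import numpy as np


def inject_anomaly(seq, anomaly_type, magnitude, rng):
    """Return a copy of a 1-D sequence with one anomaly injected (hypothetical helper)."""
    out = np.asarray(seq, dtype=float).copy()
    sign = rng.choice([-1.0, 1.0])               # add or subtract, chosen at random
    start = int(rng.integers(0, out.shape[0]))   # random position in the sequence
    if anomaly_type == "spike":
        out[start] += sign * magnitude           # only one time step is displaced
    elif anomaly_type == "level_shift":
        out[start:] += sign * magnitude          # assumption: shift includes `start` itself
    else:
        raise ValueError(f"Invalid anomaly_type: {anomaly_type!r}")
    return out


def sketch_anomaly_data(n_sequences=200, sequence_length=50, anomaly_fraction=0.1,
                        anomaly_type="spike", anomaly_magnitude=5.0,
                        noise_level=0.2, seed=None):
    """Hypothetical univariate generator returning (N, T, 1) sequences and (N,) labels."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 4.0 * np.pi, sequence_length)
    # Assumed base signal: a sine wave plus Gaussian noise with std `noise_level`.
    base = np.sin(t) + rng.normal(0.0, noise_level, size=(n_sequences, sequence_length))
    labels = np.zeros(n_sequences, dtype=int)
    n_anom = int(round(anomaly_fraction * n_sequences))  # assumed rounding rule
    for i in rng.choice(n_sequences, size=n_anom, replace=False):
        base[i] = inject_anomaly(base[i], anomaly_type, anomaly_magnitude, rng)
        labels[i] = 1
    return base[..., np.newaxis], labels
```

Under these assumptions, `sketch_anomaly_data(n_sequences=50, anomaly_fraction=0.2, seed=42)` returns arrays of shape (50, 50, 1) and (50,), with exactly 10 sequences labelled 1.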

Returns:

data – If as_frame=False (default): Tuple (sequences, labels):

  • sequences : ndarray of shape (n_sequences, sequence_length, n_features)

  • labels : ndarray of shape (n_sequences,) with 0 (normal) or 1 (anomaly).

If as_frame=True: a Bunch object containing a DataFrame (frame; potentially very wide if the sequences are flattened), labels, feature_names, etc., or possibly just the DataFrame (the exact structure is still TBD). Note that representing sequences as a DataFrame can be awkward.
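The width issue comes from flattening each (T, F) sequence into a single row. The sketch below shows that flattening with plain NumPy; the helper `flatten_sequences` and the `t{step}_f{feature}` column names are hypothetical, and fusionlab's actual as_frame layout may differ.

```python
import numpy as np


def flatten_sequences(sequences: np.ndarray):
    """Flatten (N, T, F) sequences into an (N, T*F) table plus matching column names."""
    n, t, f = sequences.shape
    # Row-major reshape: all features of step 0, then all features of step 1, ...
    flat = sequences.reshape(n, t * f)
    names = [f"t{step}_f{feat}" for step in range(t) for feat in range(f)]
    return flat, names


flat, names = flatten_sequences(np.zeros((200, 50, 1)))
print(flat.shape, names[:3])
```

Passing the result to `pandas.DataFrame(flat, columns=names)` would give one row per sequence; with the default arguments that is 200 rows by 50 columns, and the width grows as sequence_length × n_features.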

Return type:

tuple or Bunch or pandas.DataFrame

Raises:

ValueError – If n_features is not 1 (only univariate data is currently supported), if anomaly_fraction is not between 0 and 1, or if anomaly_type is invalid.
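The documented error conditions translate into checks along these lines. This is a hypothetical sketch: the function name and messages are invented, and whether the endpoints 0 and 1 are accepted for anomaly_fraction is an assumption (both are allowed here).

```python
def validate_anomaly_params(n_features, anomaly_fraction, anomaly_type):
    """Hypothetical checks mirroring the documented ValueError conditions."""
    if n_features != 1:
        raise ValueError("Only univariate data (n_features=1) is currently supported.")
    # Assumption: both endpoints are valid fractions.
    if not 0.0 <= anomaly_fraction <= 1.0:
        raise ValueError("anomaly_fraction must be between 0 and 1.")
    if anomaly_type not in {"spike", "level_shift"}:
        raise ValueError(f"Invalid anomaly_type: {anomaly_type!r}")


validate_anomaly_params(1, 0.1, "spike")  # valid arguments: no exception
```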

Examples

>>> import numpy as np
>>> from fusionlab.datasets import make_anomaly_data
>>> # Generate sequences and labels as NumPy arrays
>>> sequences, labels = make_anomaly_data(n_sequences=50, anomaly_fraction=0.2, seed=42)
>>> print(f"Generated sequences shape: {sequences.shape}")
>>> print(f"Generated labels shape: {labels.shape}")
>>> print(f"Number of anomalies: {np.sum(labels)}")
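A common next step with autoencoder-style detectors is to train only on normal sequences and keep the anomalous ones for evaluation. The sketch below shows that split with boolean indexing; to stay self-contained it uses random stand-in arrays with the shapes the example above would produce, which is an assumption about the exact anomaly count (10 of 50).

```python
import numpy as np

# Stand-in arrays with the documented shapes; in practice these would come from
# make_anomaly_data(n_sequences=50, anomaly_fraction=0.2, seed=42).
rng = np.random.default_rng(42)
sequences = rng.normal(size=(50, 50, 1))
labels = np.zeros(50, dtype=int)
labels[rng.choice(50, size=10, replace=False)] = 1

normal_seqs = sequences[labels == 0]     # e.g. training data for a reconstruction model
anomalous_seqs = sequences[labels == 1]  # held out to check that they get flagged
print(normal_seqs.shape, anomalous_seqs.shape)  # prints (40, 50, 1) (10, 50, 1)
```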