Glossary

This glossary defines key terms, abbreviations, and concepts used throughout the fusionlab-learn library documentation and the broader domain of time series forecasting and deep learning.

ACF (Autocorrelation Function)

A function measuring the correlation between a time series and lagged versions of itself. Used to identify seasonality, memory, and autoregressive patterns. See ts_corr_analysis().
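
As a rough illustration, the sample autocorrelation at lag k can be computed in pure Python as below. This is a sketch only, not the implementation behind ts_corr_analysis(); the helper name `acf` and the toy series are assumptions made for this example.

```python
def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)  # lag-0 autocovariance times n
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

# Toy series that repeats every 4 steps.
series = [1.0, 2.0, 3.0, 2.0] * 6
acf_values = acf(series, max_lag=4)
# acf_values[0] is 1.0 by construction; the value at lag 4 is high
# because the series lines up with itself after one full period.
```

A peak at a lag equal to the seasonal period, as at lag 4 here, is the kind of pattern the ACF is used to reveal.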

Additive Decomposition

A model for time series decomposition where the components are summed: \(Y_t = T_t + S_t + R_t\), where \(T_t\), \(S_t\), and \(R_t\) are the trend, seasonal, and residual components. Often suitable when the seasonal and residual magnitudes do not scale with the trend. See decompose_ts().

ADF Test (Augmented Dickey-Fuller Test)

A statistical test for Stationarity. The null hypothesis assumes the series has a unit root (is non-stationary). A low p-value suggests stationarity. See trend_analysis().

Anomaly Detection

The process of identifying data points, events, or patterns that deviate significantly from expected or normal behavior. See the Anomaly Detection guide.

Anomaly Score

A numerical value indicating the degree to which a data point or sequence is considered anomalous. Higher scores typically represent greater abnormality. See compute_anomaly_scores().

Attention Mechanism

A technique allowing neural networks to dynamically weigh the importance of different input parts (e.g., time steps, features) when forming representations or predictions. Includes variants like Self-Attention, Cross-Attention, Multi-Head Attention. See the Attention Mechanisms section.

Autocorrelation

The correlation of a time series with lagged versions of itself. See ACF (Autocorrelation Function).

Autoencoder

A neural network trained for unsupervised reconstruction of its input, often via a compressed Latent Space. High reconstruction error can indicate anomalies. See LSTMAutoencoderAnomaly.

Autoregressive Model (AR Model)

A time series model where the current value is predicted based on a linear combination of its own past values (lags).
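
A minimal sketch of a one-step AR prediction, assuming the coefficients are already known (in practice they are estimated from data). The function name `ar_predict` and the numbers are hypothetical.

```python
def ar_predict(history, coeffs, intercept=0.0):
    """One-step forecast: intercept + sum_i coeffs[i] * y_{t-1-i}.

    coeffs[0] multiplies the most recent value, coeffs[1] the one before it.
    """
    p = len(coeffs)
    recent = history[-p:][::-1]  # most recent value first
    return intercept + sum(phi * y for phi, y in zip(coeffs, recent))

history = [1.0, 2.0, 4.0]
next_value = ar_predict(history, coeffs=[0.5, 0.25], intercept=1.0)
# 1.0 + 0.5 * 4.0 + 0.25 * 2.0 = 3.5
```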

Backtesting

Evaluating a forecasting model’s performance on historical data by simulating its past usage, often with rolling or expanding windows to respect temporal order. See Time Series Cross-Validation.

Batch Normalization

A normalization technique applied across the batch dimension, typically after a linear layer and before activation, to stabilize training by normalizing layer inputs. Compare with Layer Normalization.

Batch Size

The number of samples processed together in one forward/backward pass during model training or inference.

Bunch

A utility class (similar to a dictionary but allowing attribute-style access) used by fusionlab-learn dataset loaders to return data and metadata. See Bunch.

Categorical Feature

A feature whose values belong to a finite set of discrete categories (e.g., ‘product_type’, ‘day_of_week’). Often requires One-Hot Encoding or Embedding before use in models.

CLI (Command-Line Interface)

A text-based interface for interacting with software via commands entered in a terminal. See Command-Line Tools.

Continuous Feature

A feature whose values can take any number within a range (e.g., ‘temperature’, ‘sales_amount’). Often requires Scaling.

Coverage Score

A metric evaluating probabilistic forecasts (prediction intervals). It measures the proportion of actual values falling within the predicted interval (e.g., between the 0.1 and 0.9 quantiles).
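
The definition above can be sketched as follows. The function name `coverage_score` and the sample values are assumptions for illustration, not the library's API.

```python
def coverage_score(y_true, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)

y_true = [10.0, 12.0, 9.0, 15.0, 11.0]
q10 = [8.0, 10.0, 9.5, 12.0, 10.0]   # predicted 0.1 quantile
q90 = [12.0, 13.0, 11.0, 14.0, 12.0]  # predicted 0.9 quantile
coverage = coverage_score(y_true, q10, q90)  # 3 of 5 inside -> 0.6
```

For an interval between the 0.1 and 0.9 quantiles the nominal coverage is 0.8, so an observed value of 0.6 would suggest the intervals are too narrow.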

Cross-Attention

Attention where one sequence (query) attends to a different sequence (key/value), modeling interactions between distinct inputs. See CrossAttention.

Cross-Validation (CV)

See Time Series Cross-Validation.

Decomposition

Separating a time series into constituent components like Trend, Seasonality, and Residuals. See decompose_ts().

Detrending

Removing the trend component from a time series. See transform_stationarity().

Differencing

Transforming a time series by subtracting previous values, e.g., \(Y'_t = Y_t - Y_{t-d}\). Used to achieve Stationarity. See transform_stationarity().
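
A small sketch of the formula above, with the lag `d` as a parameter. The helper `difference` is hypothetical and is not the signature of transform_stationarity().

```python
def difference(series, d=1):
    """Return Y_t - Y_{t-d} for every t where both values exist."""
    return [series[t] - series[t - d] for t in range(d, len(series))]

# First-order differencing removes a linear trend.
trend = [3.0, 5.0, 7.0, 9.0, 11.0]
first_diff = difference(trend)  # [2.0, 2.0, 2.0, 2.0]

# Differencing at the seasonal lag removes a repeating pattern of period 2.
seasonal = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
seasonal_diff = difference(seasonal, d=2)  # [1.0, 1.0, 1.0, 1.0]
```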

Dynamic Features

Features whose values change over time (e.g., past sales, weather, day-of-week). Used as historical inputs in TFT/XTFT.

Dynamic Time Window

A component that adaptively selects or weights the most recent time steps from a sequence. See DynamicTimeWindow.

Early Stopping

A regularization technique stopping training early if validation performance plateaus or degrades, preventing overfitting.
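
The stopping rule can be sketched in a few lines. The function below is a hypothetical illustration of patience-based stopping; in practice Keras offers this behavior through its EarlyStopping callback, whose `patience` and `min_delta` arguments inspired the parameter names here.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the epoch index at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs

val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]
stop_epoch = early_stopping_epoch(val_losses, patience=2)  # stops at epoch 4
```

Note that with `patience=2` the later improvement at epoch 5 is never seen, which is the usual trade-off when choosing a patience value.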

Embedding

A learned, dense vector representation, typically of lower dimension, used to represent discrete inputs (like categories) or project continuous inputs. See MultiModalEmbedding.

Epoch

One complete pass through the entire training dataset.

Exogenous Variables

External variables influencing the target variable but not influenced by it (e.g., weather affecting sales). Often used as Future Features if known in advance.

Feature Engineering

The process of creating new input features from raw data to improve model performance. See ts_engineering().

Forecast Horizon

The number of future time steps (\(H\)) for which predictions are generated.

Fourier Features / Transform

Features derived from the Discrete Fourier Transform (DFT or FFT), representing the magnitude/phase of different frequency components. Useful for capturing complex periodicities. See ts_engineering().
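
To make the idea concrete, the sketch below computes DFT magnitudes directly from the definition and reads off the dominant period. It is a pure-Python illustration with hypothetical names, not how ts_engineering() computes its features (an FFT routine would normally be used).

```python
import cmath
import math

def dft_magnitudes(series):
    """Magnitudes of DFT coefficients for frequency bins 0..n//2."""
    n = len(series)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(series)
        )
        mags.append(abs(coeff))
    return mags

n = 16
series = [math.sin(2 * math.pi * t / 4) for t in range(n)]  # period of 4 steps
mags = dft_magnitudes(series)

# Skip bin 0 (the mean) and pick the strongest frequency bin.
dominant_bin = max(range(1, len(mags)), key=mags.__getitem__)
dominant_period = n / dominant_bin  # series length divided by bin index
```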

Future Features (Known Covariates)

Features whose values are known in advance for future time steps at the time of prediction (e.g., holidays, promotions, day-of-week). Leveraged by TFT/XTFT.

Gate / Gating Mechanism

A component in neural networks (often using sigmoid activation) that controls the flow of information through a layer, allowing the network to dynamically adjust computations. See GLU and GRN.

GLU (Gated Linear Unit)

A specific gating mechanism, often \(a \odot \sigma(b)\), where \(a\) and \(b\) are linear transformations of an input, \(\odot\) is element-wise multiplication, and \(\sigma\) is sigmoid. Used within GRN.
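
The formula above can be sketched element-wise as follows. In a real layer, \(a\) and \(b\) would come from learned linear transformations; here they are supplied directly as an assumption to keep the example small.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def glu(a, b):
    """Element-wise a * sigmoid(b): b decides how much of a passes through."""
    return [a_i * sigmoid(b_i) for a_i, b_i in zip(a, b)]

a = [2.0, -1.0, 4.0]
b = [0.0, 50.0, -50.0]
gated = glu(a, b)
# b = 0    -> gate 0.5, half of a passes: 1.0
# b = 50   -> gate ~1, a passes almost unchanged: ~ -1.0
# b = -50  -> gate ~0, a is almost fully suppressed: ~ 0.0
```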

GRN (Gated Residual Network)

A core component combining linear transformations, non-linear activation, gating (GLU), and a residual connection with layer normalization. Enables complex, stable transformations. See GatedResidualNetwork.

Heuristic

A practical rule or method, often based on experience, used when an optimal algorithm is impractical (e.g., heuristic choice of decomposition model).

Hierarchical Attention

An attention mechanism designed to process inputs at multiple levels or scales, potentially capturing relationships within and between different temporal resolutions or feature groups. See HierarchicalAttention.

Hyperparameter

A parameter set before training begins, controlling model architecture or the learning process (e.g., learning rate, hidden_units). Contrast with model weights learned during training.

Hyperparameter Tuning / Optimization

The process of searching for the optimal set of hyperparameters to maximize model performance. See the forecast tuner user guide (/user_guide/forecast_tuner).

Interpretability

The degree to which a model’s predictions and internal workings can be understood by humans. TFT/XTFT incorporate components like VSNs and attention to enhance interpretability.

IQR (Interquartile Range)

A measure of statistical dispersion: the difference between the third and first quartiles (\(Q_3 - Q_1\)). Used in robust outlier detection. See ts_outlier_detector().
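
A minimal sketch of IQR-based outlier flagging, assuming the common rule of marking points more than 1.5 IQRs beyond the quartiles. The function name and the multiplier `k` are illustrative assumptions; ts_outlier_detector() may use different defaults.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

readings = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 40.0]
outliers = iqr_outliers(readings)  # [40.0]
```

Because quartiles are barely affected by the extreme reading, the fences stay tight around the bulk of the data, which is what makes the rule robust.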

Keras

A high-level API for building and training neural networks, commonly used with backends like TensorFlow, JAX, or PyTorch. fusionlab-learn currently uses the Keras API provided by TensorFlow.

Keras Tuner

A library for automating hyperparameter tuning for Keras models. See the forecast tuner user guide (/user_guide/forecast_tuner).

KPSS Test (Kwiatkowski-Phillips-Schmidt-Shin Test)

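A statistical test for Stationarity. The null hypothesis is that the series is stationary, either around a constant level or around a deterministic trend. A low p-value suggests non-stationarity. Note that this is the reverse of the ADF Test, whose null hypothesis is non-stationarity. See trend_analysis().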

Lag Features

Features created by shifting a time series (\(Y_{t-k}\)). See create_lag_features().
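
As an illustration, lag features can be built by pairing each value with the values k steps before it. The helper `make_lag_features` below is hypothetical and does not reflect the signature of create_lag_features().

```python
def make_lag_features(series, lags):
    """Build one row per time step that has all requested lags available."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        row["target"] = series[t]
        rows.append(row)
    return rows

sales = [5, 7, 6, 9, 8]
rows = make_lag_features(sales, lags=[1, 2])
# rows[0] == {"lag_1": 7, "lag_2": 5, "target": 6}
```

The first `max(lags)` time steps are dropped because their lagged values do not exist.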

Latent Space / Representation

A typically lower-dimensional space capturing salient features, learned by encoding high-dimensional data. Used in Autoencoder.

Layer Normalization

Normalization applied across features for a single sample, often used in Transformers and GRNs. Contrast with Batch Normalization.
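
The contrast with Batch Normalization comes down to the axis over which statistics are computed. The sketch below omits the learned scale and shift parameters that real layers include; all names are assumptions for this example.

```python
import math

def standardize(values, eps=1e-6):
    """Shift to zero mean and scale to approximately unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def layer_norm(batch):
    # Statistics are computed per sample, across that sample's features.
    return [standardize(sample) for sample in batch]

def batch_norm(batch):
    # Statistics are computed per feature, across the samples in the batch.
    columns = [standardize(col) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]]  # 2 samples, 3 features
ln = layer_norm(batch)  # each row now has zero mean
bn = batch_norm(batch)  # each column now has zero mean
```

Layer normalization gives the same result for a sample regardless of what else is in the batch, which is one reason it is common in Transformers.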

Learned Normalization

Normalization using learned scale and shift parameters instead of pre-calculated statistics. See LearnedNormalization.

LOESS (Locally Estimated Scatterplot Smoothing)

A non-parametric regression method fitting smooth curves locally. Used internally by STL.

Lookback Period / Window

The number of past time steps (\(T\) or time_steps) used as input to predict the future.

Loss Function

A function measuring the discrepancy between model predictions and true values, guiding model training via optimization. See Loss Functions.

LSTM (Long Short-Term Memory)

A type of Recurrent Neural Network (RNN) adept at learning long-range dependencies in sequences. See MultiScaleLSTM.

MAE (Mean Absolute Error)

An evaluation metric: mean of absolute differences between predictions and actuals.

Memory-Augmented Attention

Attention mechanism incorporating an external, trainable memory matrix, allowing the model to potentially access longer-term or learned contextual information. See MemoryAugmentedAttention.

MinMaxScaler

A scikit-learn scaler that transforms features to a specific range, typically [0, 1].

MSE (Mean Squared Error)

A common loss function/metric: mean of squared differences between predictions and actuals.
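
MSE and MAE differ only in how each error is penalized, as this small sketch shows (function names and data are illustrative assumptions).

```python
def mae(y_true, y_pred):
    """Mean of absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 4.0]
mae_value = mae(y_true, y_pred)  # (1 + 0 + 2 + 3) / 4 = 1.5
mse_value = mse(y_true, y_pred)  # (1 + 0 + 4 + 9) / 4 = 3.5
```

Squaring makes MSE more sensitive to the single large error of 3 than MAE is.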

Multi-Head Attention

Attention performed multiple times in parallel using different projections (heads), allowing focus on different representation subspaces. See Attention Mechanisms.

Multi-Horizon Forecasting

Predicting multiple future time steps simultaneously. Requires forecast_horizon > 1.

Multi-Modal Embedding

A layer that projects multiple input sequences (modalities) into a common embedding space before combining them. See MultiModalEmbedding.

Multi-Resolution Attention Fusion

A self-attention layer applied to features combined from various sources (e.g., multi-scale LSTMs, different attention outputs) to create a unified representation. See MultiResolutionAttentionFusion.

Multi-Scale Processing

Analyzing data at different temporal resolutions simultaneously. See MultiScaleLSTM.

Multi-Target Forecasting

Predicting multiple related target variables simultaneously. See make_multivariate_target_data().

Multiplicative Decomposition

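Time series decomposition where the components are multiplied: \(Y_t = T_t \times S_t \times R_t\), where \(T_t\), \(S_t\), and \(R_t\) are the trend, seasonal, and residual components. Often suitable when seasonal fluctuations scale with the trend. See decompose_ts().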

Multivariate Time Series

A time series consisting of observations on multiple variables over time.

NTemporalFusionTransformer

A fusionlab-learn variant of TFT requiring static and dynamic inputs, currently focused on point forecasts. See NTemporalFusionTransformer.

NumPy Style Docstrings

A convention for formatting Python docstrings using specific sections (Parameters, Returns, etc.). Used by fusionlab-learn and parsed by sphinx.ext.napoleon.

One-Hot Encoding

Converting a categorical feature (often integer-coded) into binary vectors with one element per category, where only the element corresponding to the observed category is 1 and all others are 0.
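
A minimal sketch of the encoding, assuming categories are ordered alphabetically when none are given. The helper `one_hot` is hypothetical; libraries such as scikit-learn provide production-ready encoders.

```python
def one_hot(values, categories=None):
    """Return (categories, encoded), one binary vector per input value."""
    if categories is None:
        categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1  # only the matching category is switched on
        encoded.append(vec)
    return categories, encoded

days = ["mon", "tue", "mon", "sun"]
categories, encoded = one_hot(days)
# categories == ["mon", "sun", "tue"]
# encoded == [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```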

Outlier

A data point significantly different from other observations. See ts_outlier_detector().

PACF (Partial Autocorrelation Function)

Measures correlation between a series and its lag, after removing effects of intermediate lags. Helps identify AR order. See ts_corr_analysis().

Pinball Loss

See Quantile Loss.

Point Forecast

A single value prediction for each future time step. Contrast with Quantile Forecast.

Positional Encoding

Technique to inject sequence order information into models like Transformers that don’t inherently process order. See PositionalEncoding.

Probabilistic Forecasting

Forecasting that provides uncertainty estimates, typically via quantiles or a full predictive distribution. See Quantile Forecast.

Quantile

A value below which a specified fraction (the quantile level) of the data falls (e.g., the 0.5 quantile is the median).

Quantile Distribution Modeling

The final output component in XTFT that maps decoder features to specific quantile predictions (or a point forecast). See QuantileDistributionModeling.

Quantile Forecast

Predicting specific quantiles (e.g., 0.1, 0.5, 0.9) of the target variable’s future distribution to represent uncertainty.

Quantile Loss (Pinball Loss)

Loss function for training quantile forecasting models, penalizing errors asymmetrically based on the quantile level. See combined_quantile_loss().
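
The asymmetry can be seen in a short sketch of the standard pinball loss for a single quantile level `q`. This is an illustration with hypothetical names; combined_quantile_loss() may aggregate over several quantiles and horizons differently.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss: q * err if err >= 0, else (q - 1) * err."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        err = y - p
        total += max(q * err, (q - 1) * err)
    return total / len(y_true)

y_true = [10.0, 10.0]
under = [8.0, 8.0]    # predictions below the actual values
over = [12.0, 12.0]   # predictions above the actual values

loss_under = pinball_loss(y_true, under, q=0.9)  # about 1.8
loss_over = pinball_loss(y_true, over, q=0.9)    # about 0.2
```

For the 0.9 quantile, under-prediction costs roughly nine times more than over-prediction of the same size, which pushes the prediction toward the upper part of the distribution.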

Recurrent Neural Network (RNN)

A class of neural networks designed for sequential data, containing feedback loops that carry a hidden state from one time step to the next. Common variants include LSTM and GRU.

Residual

The component of a time series remaining after Trend and Seasonality have been removed, or the error between predictions and actuals.

Rolling Statistics / Window

Statistics (mean, std dev) calculated over a sliding window. See ts_engineering().
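
A sliding-window statistic can be sketched generically as below. The helper `rolling` is an assumption for illustration and is not part of ts_engineering().

```python
import statistics

def rolling(series, window, func):
    """Apply func to each full window of the given length."""
    return [
        func(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]

series = [2.0, 4.0, 6.0, 8.0, 10.0]
rolling_mean = rolling(series, 3, statistics.fmean)  # [4.0, 6.0, 8.0]
rolling_std = rolling(series, 3, statistics.stdev)   # sample std of each window
```

Only full windows are returned here, so the output is shorter than the input by `window - 1` values.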

R² Score (Coefficient of Determination)

Statistical measure (\(R^2\)) of the proportion of variance in the dependent variable predictable from independent variables.
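
In the regression setting this is commonly computed as one minus the ratio of residual to total sum of squares. A minimal sketch, with illustrative names and data:

```python
def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
good = r2_score(y_true, [1.1, 1.9, 3.2, 3.8])      # close to 1
baseline = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])  # predicting the mean gives 0.0
```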

Scaler

Tool (e.g., StandardScaler, MinMaxScaler) for feature Scaling.

Scaling

Transforming numerical features to a common scale (e.g., [0, 1] or mean 0, std 1) for better model training.
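
The two scales mentioned above correspond to min-max scaling and standardization. The sketch below is a pure-Python illustration (scikit-learn's MinMaxScaler and StandardScaler are the usual tools) and assumes the input is not constant, since a constant feature would cause a division by zero.

```python
import statistics

def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to mean 0 and scale to standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

temps = [10.0, 20.0, 30.0, 40.0]
scaled = min_max_scale(temps)  # first value 0.0, last value 1.0
z = standardize(temps)         # symmetric around 0
```

In a forecasting workflow the scaling statistics should be computed on the training portion only and then reused for validation and test data, to avoid leaking future information.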

Scikit-learn

A popular Python library for machine learning, providing tools for preprocessing, model selection, evaluation, and various algorithms. Used by some fusionlab-learn utilities.

SDT (Seasonal Decomposition of Time series)

Classical time series decomposition method (additive/multiplicative). See decompose_ts().

Seasonality

Patterns repeating over a fixed period (daily, weekly, yearly).

Self-Attention

Attention mechanism where a sequence attends to itself to model internal relationships.

Sequence Length

See Lookback Period / Window.

Sequence-to-Sequence (Seq2Seq) Model

Architecture mapping an input sequence to an output sequence.

Spatiotemporal Data

Data that has both spatial (location) and temporal (time) dimensions.

StandardScaler

Scikit-learn scaler standardizing features to zero mean and unit variance (Z-score).

Static Features

Time-invariant features associated with a series (e.g., sensor ID, location category). Used as context by TFT/XTFT.

Stationarity

Property where a time series’ statistical properties (mean, variance, autocorrelation) are constant over time.

Statsmodels

A Python library providing classes and functions for estimating many different statistical models, as well as statistical tests and data exploration. Used by some fusionlab-learn utilities.

STL (Seasonal-Trend decomposition using LOESS)

Robust time series decomposition method. See decompose_ts().

Supervised Learning

Machine learning where a model learns a mapping from inputs to outputs using labeled examples. Forecasting is often framed this way.

SuperXTFT

An experimental, enhanced version of XTFT with input VSNs and extra GRNs. See SuperXTFT.

Taylor Diagram

A diagram that graphically summarizes how closely a set of patterns matches a reference in terms of correlation, standard deviation, and centered RMSE. (Mentioned in relation to k-diagram.)

TensorFlow

Open-source machine learning framework used as the backend for fusionlab-learn neural network models.

TFT (Temporal Fusion Transformer)

Baseline interpretable deep learning architecture for multi-horizon time series forecasting. See TemporalFusionTransformer and TFT.

Time Series

A sequence of data points indexed in time order.

Time Series Cross-Validation

Cross-validation respecting temporal order, typically using expanding or rolling forecast origins. See ts_split() (split_type='cv').
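
An expanding-window scheme can be sketched as follows. The function and its parameters are hypothetical and are not the signature of ts_split(); the point is only that every training index precedes every test index in each fold.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Return (train_indices, test_indices) pairs with a growing train set."""
    splits = []
    for i in range(n_splits):
        test_end = n_samples - (n_splits - 1 - i) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train = list(range(0, test_start))
        test = list(range(test_start, test_end))
        splits.append((train, test))
    return splits

splits = expanding_window_splits(n_samples=10, n_splits=3, test_size=2)
# Fold 0: train 0..3, test 4..5
# Fold 1: train 0..5, test 6..7
# Fold 2: train 0..7, test 8..9
```

A rolling variant would instead keep the training window at a fixed length by dropping the oldest indices as the forecast origin moves forward.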

Trend

The long-term increase or decrease in a time series.

Uncertainty Quantification (UQ)

The process of estimating and characterizing the uncertainty associated with model predictions, often via prediction intervals or full distributions. Quantile forecasting is a method for UQ.

Univariate Time Series

A time series with observations on only a single variable.

VSN (Variable Selection Network)

Component that learns importance weights for input features. See VariableSelectionNetwork.

XTFT (Extreme Temporal Fusion Transformer)

Enhanced TFT variant with multi-scale processing, advanced attention, and anomaly detection. See XTFT.

Z-Score

Statistical measure of a value’s deviation from the mean in units of standard deviations. See ts_outlier_detector().