Glossary¶

This glossary defines key terms, abbreviations, and concepts used throughout the fusionlab-learn library documentation and the broader domain of time series forecasting and deep learning.

ACF (Autocorrelation Function)¶: A function measuring the correlation between a time series and lagged versions of itself. Used to identify seasonality, memory, and autoregressive patterns. See ts_corr_analysis().
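As a minimal sketch of what the ACF measures, the sample autocorrelation at lag \(k\) can be computed in plain Python. The helper name `autocorrelation` is hypothetical and is not part of fusionlab-learn; ts_corr_analysis() may compute and plot the ACF differently.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Numerator: products of deviations that are `lag` steps apart.
    num = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return num / denom


# Illustrative data: a pattern that repeats every 4 steps.
seasonal = [1.0, 2.0, 3.0, 2.0] * 6
acf = [autocorrelation(seasonal, k) for k in range(6)]
```

In this toy series the value at lag 4 is strongly positive and the value at lag 2 is strongly negative, which is the kind of signature used to spot a seasonal period.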
Additive Decomposition¶: A model for time series decomposition where the components are summed: \(Y_t = Trend_t + Seasonal_t + Residual_t\). Often suitable when seasonality/residuals do not scale with the trend. See decompose_ts().
ADF Test (Augmented Dickey-Fuller Test)¶: A statistical test for Stationarity. The null hypothesis assumes the series has a unit root (is non-stationary). A low p-value suggests stationarity. See trend_analysis().
Anomaly Detection¶: The process of identifying data points, events, or patterns that deviate significantly from expected or normal behavior. See the Anomaly Detection guide.
Anomaly Score¶: A numerical value indicating the degree to which a data point or sequence is considered anomalous. Higher scores typically represent greater abnormality. See compute_anomaly_scores().
Attention Mechanism¶: A technique allowing neural networks to dynamically weigh the importance of different input parts (e.g., time steps, features) when forming representations or predictions. Includes variants like Self-Attention, Cross-Attention, Multi-Head Attention. See the Attention Mechanisms section.
Autocorrelation¶: The correlation of a time series with lagged versions of itself. See ACF (Autocorrelation Function).
Autoencoder¶: A neural network trained for unsupervised reconstruction of its input, often via a compressed Latent Space. High reconstruction error can indicate anomalies. See LSTMAutoencoderAnomaly.
Autoregressive Model (AR Model)¶: A time series model where the current value is predicted based on a linear combination of its own past values (lags).
Backtesting¶: Evaluating a forecasting model’s performance on historical data by simulating its past usage, often with rolling or expanding windows to respect temporal order. See Time Series Cross-Validation.
Batch Normalization¶: A normalization technique applied across the batch dimension, typically after a linear layer and before activation, to stabilize training by normalizing layer inputs. Compare with Layer Normalization.
Batch Size¶: The number of samples processed together in one forward/backward pass during model training or inference.
Bunch¶: A utility class (similar to a dictionary but allowing attribute-style access) used by fusionlab-learn dataset loaders to return data and metadata. See Bunch.
Categorical Feature¶: A feature whose values belong to a finite set of discrete categories (e.g., ‘product_type’, ‘day_of_week’). Often requires One-Hot Encoding or Embedding before use in models.
CLI (Command-Line Interface)¶: A text-based interface for interacting with software via commands entered in a terminal. See Command-Line Tools.
Continuous Feature¶: A feature whose values can take any number within a range (e.g., ‘temperature’, ‘sales_amount’). Often requires Scaling.
Coverage Score¶: A metric evaluating probabilistic forecasts (prediction intervals). It measures the proportion of actual values falling within the predicted interval (e.g., between the 0.1 and 0.9 quantiles).
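The coverage computation can be sketched as follows. The function name `coverage_score` and the sample numbers are illustrative assumptions, not the library's API. For an interval between the 0.1 and 0.9 quantiles, a well-calibrated model would be expected to cover roughly 80% of the actual values.

```python
def coverage_score(y_true, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("all inputs must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


# Illustrative actuals with predicted 0.1 and 0.9 quantiles.
y_true = [10.0, 12.0, 9.0, 15.0]
q10 = [8.0, 11.0, 9.5, 13.0]
q90 = [11.0, 14.0, 12.0, 16.0]
score = coverage_score(y_true, q10, q90)  # third value lies below its interval
```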
Cross-Attention¶: Attention where one sequence (query) attends to a different sequence (key/value), modeling interactions between distinct inputs. See CrossAttention.
Cross-Validation (CV)¶: See Time Series Cross-Validation.
Decomposition¶: Separating a time series into constituent components like Trend, Seasonality, and Residuals. See decompose_ts().
Detrending¶: Removing the trend component from a time series. See transform_stationarity().
Differencing¶: Transforming a time series by subtracting a previous value from each observation, e.g., \(Y'_t = Y_t - Y_{t-d}\), where \(d\) is the lag. Used to achieve Stationarity. See transform_stationarity().
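A minimal sketch of the differencing formula, reading \(d\) as the lag between the two subtracted values. The helper `difference` is hypothetical; transform_stationarity() may expose different options.

```python
def difference(series, d=1):
    """Return Y_t - Y_{t-d} for every t where both values exist."""
    if d < 1 or d >= len(series):
        raise ValueError("d must satisfy 1 <= d < len(series)")
    return [series[t] - series[t - d] for t in range(d, len(series))]


# A linear trend becomes a constant series after one difference.
linear = [3, 5, 7, 9, 11]
first_diff = difference(linear)
```

The output is shorter than the input by `d` values, because the first `d` observations have no earlier value to subtract.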
Dynamic Features¶: Features whose values change over time (e.g., past sales, weather, day-of-week). Used as historical inputs in TFT/XTFT.
Dynamic Time Window¶: A component that adaptively selects or weights the most recent time steps from a sequence. See DynamicTimeWindow.
Early Stopping¶: A regularization technique stopping training early if validation performance plateaus or degrades, preventing overfitting.
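The stopping rule can be illustrated with a small patience counter. This is a sketch under assumed names (`early_stopping_epoch`, `patience`, `min_delta`); in practice the Keras `EarlyStopping` callback is the usual way to get this behavior during training.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Validation loss improves for three epochs, then stalls.
stop_at = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71], patience=2)
```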
Embedding¶: A learned, dense vector representation, typically of lower dimension, used to represent discrete inputs (like categories) or project continuous inputs. See MultiModalEmbedding.
Epoch¶: One complete pass through the entire training dataset.
Exogenous Variables¶: External variables influencing the target variable but not influenced by it (e.g., weather affecting sales). Often used as Future Features if known in advance.
Feature Engineering¶: The process of creating new input features from raw data to improve model performance. See ts_engineering().
Forecast Horizon¶: The number of future time steps (\(H\)) for which predictions are generated.
Fourier Features / Transform¶: Features derived from the Discrete Fourier Transform (DFT), usually computed with the Fast Fourier Transform (FFT) algorithm, representing the magnitude/phase of different frequency components. Useful for capturing complex periodicities. See ts_engineering().
Future Features (Known Covariates)¶: Features whose values are known in advance for future time steps at the time of prediction (e.g., holidays, promotions, day-of-week). Leveraged by TFT/XTFT.
Gate / Gating Mechanism¶: A component in neural networks (often using sigmoid activation) that controls the flow of information through a layer, allowing the network to dynamically adjust computations. See GLU and GRN.
GLU (Gated Linear Unit)¶: A specific gating mechanism, often \(a \odot \sigma(b)\), where \(a\) and \(b\) are linear transformations of an input, \(\odot\) is element-wise multiplication, and \(\sigma\) is sigmoid. Used within GRN.
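The gating formula \(a \odot \sigma(b)\) can be sketched element-wise in plain Python. Here `a` and `b` stand in for the two linear transformations of the input, which are assumed to be computed elsewhere; the function names are illustrative only.

```python
import math


def sigmoid(x):
    """Logistic function; adequate for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(a, b):
    """Element-wise a * sigmoid(b): `b` decides how much of `a` passes."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


# Gate half open, almost fully open, and almost closed.
gated = glu([2.0, 2.0, 2.0], [0.0, 50.0, -50.0])
```

A gate input near zero passes half of the signal, a large positive value passes nearly all of it, and a large negative value suppresses it.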
GRN (Gated Residual Network)¶: A core component combining linear transformations, non-linear activation, gating (GLU), and a residual connection with layer normalization. Enables complex, stable transformations. See GatedResidualNetwork.
Heuristic¶: A practical rule or method, often based on experience, used when an optimal algorithm is impractical (e.g., heuristic choice of decomposition model).
Hierarchical Attention¶: An attention mechanism designed to process inputs at multiple levels or scales, potentially capturing relationships within and between different temporal resolutions or feature groups. See HierarchicalAttention.
Hyperparameter¶: A parameter set before training begins, controlling model architecture or the learning process (e.g., learning rate, hidden_units). Contrast with model weights learned during training.
Hyperparameter Tuning / Optimization¶: The process of searching for the optimal set of hyperparameters to maximize model performance. See the forecast tuner user guide (/user_guide/forecast_tuner).
Interpretability¶: The degree to which a model’s predictions and internal workings can be understood by humans. TFT/XTFT incorporate components like VSNs and attention to enhance interpretability.
IQR (Interquartile Range)¶: A measure of statistical dispersion (\(Q3 - Q1\)). Used in robust outlier detection. See ts_outlier_detector().
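One common IQR-based rule (Tukey's fences, with a multiplier of 1.5) is sketched below. The multiplier, the quantile method, and the helper name `iqr_outliers` are assumptions for illustration; ts_outlier_detector() may use different settings.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Illustrative readings with one obvious spike.
readings = [10, 12, 11, 13, 12, 11, 14, 12, 100]
outliers = iqr_outliers(readings)
```

Because quartiles are barely affected by the spike itself, the fences stay tight around the bulk of the data, which is why the method is described as robust.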
Keras¶: A high-level API for building and training neural networks, commonly used with backends like TensorFlow, JAX, or PyTorch. fusionlab currently uses the Keras API provided by TensorFlow.
Keras Tuner¶: A library for automating hyperparameter tuning for Keras models. See the forecast tuner user guide (/user_guide/forecast_tuner).
KPSS Test (Kwiatkowski-Phillips-Schmidt-Shin Test)¶: A statistical test for Stationarity. The null hypothesis is stationarity around a deterministic trend. A low p-value suggests non-stationarity. See trend_analysis().
Lag Features¶: Features created by shifting a time series (\(Y_{t-k}\)). See create_lag_features().
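A minimal sketch of building lag features by shifting a series. The helper `make_lag_features` and its dictionary output are hypothetical and do not mirror the signature of create_lag_features().

```python
def make_lag_features(series, lags):
    """Pair each target value y_t with its lagged values y_{t-k}."""
    max_lag = max(lags)
    rows = []
    # Start at max_lag so every requested lag exists for each row.
    for t in range(max_lag, len(series)):
        row = {"y": series[t]}
        for k in lags:
            row[f"lag_{k}"] = series[t - k]
        rows.append(row)
    return rows


rows = make_lag_features([1, 2, 3, 4, 5], lags=[1, 2])
```

The first `max(lags)` observations are dropped because they have no complete set of lagged values.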
Latent Space / Representation¶: A typically lower-dimensional space capturing salient features, learned by encoding high-dimensional data. Used in Autoencoder.
Layer Normalization¶: Normalization applied across features for a single sample, often used in Transformers and GRNs. Contrast with Batch Normalization.
Learned Normalization¶: Normalization using learned scale and shift parameters instead of pre-calculated statistics. See LearnedNormalization.
LOESS (Locally Estimated Scatterplot Smoothing)¶: A non-parametric regression method fitting smooth curves locally. Used internally by STL.
Lookback Period / Window¶: The number of past time steps (\(T\) or time_steps) used as input to predict the future.
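The relationship between the lookback window (\(T\)) and the forecast horizon (\(H\)) can be sketched as a sliding-window split. The helper `make_windows` is an illustrative assumption, not a fusionlab-learn function.

```python
def make_windows(series, time_steps, forecast_horizon):
    """Slice a series into (past window, future targets) pairs."""
    windows = []
    last_start = len(series) - time_steps - forecast_horizon
    for start in range(last_start + 1):
        split = start + time_steps
        past = series[start:split]                       # T input steps
        future = series[split:split + forecast_horizon]  # H target steps
        windows.append((past, future))
    return windows


pairs = make_windows([1, 2, 3, 4, 5, 6], time_steps=3, forecast_horizon=2)
```

A series of length \(N\) yields \(N - T - H + 1\) pairs, so a series shorter than \(T + H\) yields none.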
Loss Function¶: A function measuring the discrepancy between model predictions and true values, guiding model training via optimization. See Loss Functions.
LSTM (Long Short-Term Memory)¶: A type of Recurrent Neural Network (RNN) adept at learning long-range dependencies in sequences. See MultiScaleLSTM.
MAE (Mean Absolute Error)¶: An evaluation metric: mean of absolute differences between predictions and actuals.
Memory-Augmented Attention¶: Attention mechanism incorporating an external, trainable memory matrix, allowing the model to potentially access longer-term or learned contextual information. See MemoryAugmentedAttention.
MinMaxScaler¶: A scikit-learn scaler that transforms features to a specific range, typically [0, 1].
MSE (Mean Squared Error)¶: A common loss function/metric: mean of squared differences between predictions and actuals.
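MSE and MAE (defined earlier in this glossary) differ only in how each error is penalized, as this short sketch with illustrative data shows. The function names are assumptions.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |prediction - actual|."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of (prediction - actual)^2; large errors dominate."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
mae = mean_absolute_error(y_true, y_pred)  # errors 1, 0, 2
mse = mean_squared_error(y_true, y_pred)   # squared errors 1, 0, 4
```

Squaring makes MSE more sensitive to the single error of 2 than MAE is, which is the usual reason for preferring MAE when outliers are present.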
Multi-Head Attention¶: Attention performed multiple times in parallel using different projections (heads), allowing focus on different representation subspaces. See Attention Mechanisms.
Multi-Horizon Forecasting¶: Predicting multiple future time steps simultaneously. Requires forecast_horizon > 1.
Multi-Modal Embedding¶: A layer that projects multiple input sequences (modalities) into a common embedding space before combining them. See MultiModalEmbedding.
Multi-Resolution Attention Fusion¶: A self-attention layer applied to features combined from various sources (e.g., multi-scale LSTMs, different attention outputs) to create a unified representation. See MultiResolutionAttentionFusion.
Multi-Scale Processing¶: Analyzing data at different temporal resolutions simultaneously. See MultiScaleLSTM.
Multi-Target Forecasting¶: Predicting multiple related target variables simultaneously. See make_multivariate_target_data().
Multiplicative Decomposition¶: Time series decomposition where components are multiplied: \(Y_t = T_t \times S_t \times R_t\). See decompose_ts().
Multivariate Time Series¶: A time series consisting of observations on multiple variables over time.
NTemporalFusionTransformer¶: A fusionlab-learn variant of TFT requiring static and dynamic inputs, currently focused on point forecasts. See NTemporalFusionTransformer.
NumPy Style Docstrings¶: A convention for formatting Python docstrings using specific sections (Parameters, Returns, etc.). Used by fusionlab and parsed by sphinx.ext.napoleon.
One-Hot Encoding¶: Converting categorical integer features into binary vectors where only the element corresponding to the category is 1.
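A minimal sketch of one-hot encoding for integer category codes. The helper name `one_hot` is illustrative.

```python
def one_hot(indices, num_classes):
    """Turn integer category codes into binary indicator vectors."""
    encoded = []
    for idx in indices:
        if not 0 <= idx < num_classes:
            raise ValueError(f"category index {idx} out of range")
        row = [0] * num_classes
        row[idx] = 1  # only the position of the category is set
        encoded.append(row)
    return encoded


vectors = one_hot([0, 2, 1], num_classes=3)
```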
Outlier¶: A data point significantly different from other observations. See ts_outlier_detector().
PACF (Partial Autocorrelation Function)¶: Measures the correlation between a series and a given lag of itself, after removing the effects of the intermediate lags. Helps identify the order of an AR model. See ts_corr_analysis().
Pinball Loss¶: See Quantile Loss.
Point Forecast¶: A single value prediction for each future time step. Contrast with Quantile Forecast.
Positional Encoding¶: Technique to inject sequence order information into models like Transformers that don’t inherently process order. See PositionalEncoding.
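One widely used scheme is the sinusoidal encoding from the original Transformer architecture, sketched below. This is an assumption for illustration only: the fusionlab-learn PositionalEncoding layer may inject order information in a different way.

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model):
    """Table of shape (num_positions, d_model) with sin/cos pairs."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even in this sketch")
    table = []
    for pos in range(num_positions):
        row = []
        # `i` walks over the even feature indices 0, 2, 4, ...
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


encoding = sinusoidal_positional_encoding(num_positions=3, d_model=4)
```

Each position receives a distinct vector, which is typically added to the input embeddings so that otherwise order-agnostic attention layers can distinguish time steps.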
Probabilistic Forecasting¶: Forecasting that provides uncertainty estimates, typically via quantiles or a full predictive distribution. See Quantile Forecast.
Quantile¶: A value below which a specified fraction of the data falls; that fraction is the quantile level (e.g., the 0.5 quantile is the median).
Quantile Distribution Modeling¶: The final output component in XTFT that maps decoder features to specific quantile predictions (or a point forecast). See QuantileDistributionModeling.
Quantile Forecast¶: Predicting specific quantiles (e.g., 0.1, 0.5, 0.9) of the target variable’s future distribution to represent uncertainty.
Quantile Loss (Pinball Loss)¶: Loss function for training quantile forecasting models, penalizing errors asymmetrically based on the quantile level. See combined_quantile_loss().
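The asymmetric penalty can be sketched for a single quantile level \(q\). The helper `pinball_loss` is hypothetical; combined_quantile_loss() may aggregate over several quantiles and horizons differently.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        error = y - y_hat
        # Under-prediction costs q * error, over-prediction (1 - q) * |error|.
        total += max(q * error, (q - 1.0) * error)
    return total / len(y_true)


under = pinball_loss([10.0], [8.0], q=0.9)   # prediction too low by 2
over = pinball_loss([10.0], [12.0], q=0.9)   # prediction too high by 2
```

For \(q = 0.9\), predicting too low is penalized nine times more than predicting too high by the same amount, which pushes the prediction toward the upper part of the distribution.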
Recurrent Neural Network (RNN)¶: A class of neural networks designed for sequential data, containing feedback loops (e.g., LSTM, GRU).
Residual¶: The component of a time series remaining after Trend and Seasonality have been removed, or the error between predictions and actuals.
Rolling Statistics / Window¶: Statistics (mean, std dev) calculated over a sliding window. See ts_engineering().
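A minimal sketch of rolling mean and standard deviation over a sliding window, using only the standard library. The helper `rolling_mean_std` is an assumption and does not reflect the options of ts_engineering().

```python
import statistics


def rolling_mean_std(series, window):
    """(mean, sample std) for each full window of length `window`."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a std dev")
    results = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        results.append((statistics.fmean(chunk), statistics.stdev(chunk)))
    return results


stats = rolling_mean_std([1.0, 2.0, 3.0, 4.0], window=2)
```

Only complete windows are returned here, so the result has `len(series) - window + 1` entries.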
R² Score (Coefficient of Determination)¶: Statistical measure (\(R^2\)) of the proportion of variance in the dependent variable that is predictable from the independent variables.
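The score is commonly computed as \(R^2 = 1 - SS_{res} / SS_{tot}\), which the following sketch implements. The function name and data are illustrative.

```python
def r2_score(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


r2 = r2_score([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
baseline = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predict the mean
```

Always predicting the mean of the actual values gives a score of 0, and a model that does worse than that baseline gets a negative score.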
Scaler¶: Tool (e.g., StandardScaler, MinMaxScaler) for feature Scaling.
Scaling¶: Transforming numerical features to a common scale (e.g., [0, 1] or mean 0, std 1) for better model training.
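The two target scales mentioned in this entry correspond to min-max scaling and standardization, sketched here with hypothetical helper names. The standardization uses the population standard deviation, which matches the behavior of scikit-learn's StandardScaler.

```python
import statistics


def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to mean 0 and scale to unit (population) std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


raw = [2.0, 4.0, 6.0]
scaled = min_max_scale(raw)
z_scores = standardize(raw)
```

In practice the minimum, maximum, mean, and standard deviation are computed on the training data only and then reused for validation and test data, so that no information leaks from the future.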
Scikit-learn¶: A popular Python library for machine learning, providing tools for preprocessing, model selection, evaluation, and various algorithms. Used by some fusionlab-learn utilities.
SDT (Seasonal Decomposition of Time series)¶: Classical time series decomposition method (additive/multiplicative). See decompose_ts().
Seasonality¶: Patterns repeating over a fixed period (daily, weekly, yearly).
Self-Attention¶: Attention mechanism where a sequence attends to itself to model internal relationships.
Sequence Length¶: See Lookback Period / Window.
Sequence-to-Sequence (Seq2Seq) Model¶: Architecture mapping an input sequence to an output sequence.
Spatiotemporal Data¶: Data that has both spatial (location) and temporal (time) dimensions.
StandardScaler¶: Scikit-learn scaler standardizing features to zero mean and unit variance (Z-score).
Static Features¶: Time-invariant features associated with a series (e.g., sensor ID, location category). Used as context by TFT/XTFT.
Stationarity¶: Property where a time series’ statistical properties (mean, variance, autocorrelation) are constant over time.
Statsmodels¶: A Python library providing classes and functions for estimating many different statistical models, as well as statistical tests and data exploration. Used by some fusionlab-learn utilities.
STL (Seasonal-Trend decomposition using LOESS)¶: Robust time series decomposition method. See decompose_ts().
Supervised Learning¶: Machine learning where a model learns a mapping from inputs to outputs using labeled examples. Forecasting is often framed this way.
SuperXTFT¶: An experimental, enhanced version of XTFT with input VSNs and extra GRNs. See SuperXTFT.
Taylor Diagram¶: A diagram that graphically summarizes how closely patterns (e.g., predictions and observations) match each other in terms of correlation, standard deviation, and centered RMSE. Mentioned in the documentation in relation to k-diagram.
TensorFlow¶: Open-source machine learning framework used as the backend for fusionlab-learn neural network models.
TFT (Temporal Fusion Transformer)¶: Baseline interpretable deep learning architecture for multi-horizon time series forecasting. See TemporalFusionTransformer and TFT.
Time Series¶: A sequence of data points indexed in time order.
Time Series Cross-Validation¶: Cross-validation respecting temporal order, typically using expanding or rolling forecast origins. See ts_split() (split_type='cv').
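An expanding-window scheme can be sketched as index generation, where every test block lies strictly after its training block. The helper `expanding_window_splits` and its parameters are assumptions and do not describe the signature of ts_split().

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) with a growing training set."""
    splits = []
    for i in range(n_splits):
        # The last fold ends at the final sample; earlier folds end sooner.
        test_end = n_samples - (n_splits - 1 - i) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train = list(range(test_start))
        test = list(range(test_start, test_end))
        splits.append((train, test))
    return splits


folds = expanding_window_splits(n_samples=10, n_splits=3, test_size=2)
```

Unlike shuffled k-fold cross-validation, no fold ever trains on observations that come after the ones it is tested on.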
Trend¶: The long-term increase or decrease in a time series.
Uncertainty Quantification (UQ)¶: The process of estimating and characterizing the uncertainty associated with model predictions, often via prediction intervals or full distributions. Quantile forecasting is a method for UQ.
Univariate Time Series¶: A time series with observations on only a single variable.
VSN (Variable Selection Network)¶: Component that learns importance weights for input features. See VariableSelectionNetwork.
XTFT (Extreme Temporal Fusion Transformer)¶: Enhanced TFT variant with multi-scale processing, advanced attention, and anomaly detection. See XTFT.
Z-Score¶: Statistical measure of a value’s deviation from the mean in units of standard deviations. See ts_outlier_detector().