
Functional Data Analysis Meets LLMs: Treating Time Series as Mathematical Functions

Standard time series methods treat observations as discrete points. Functional data analysis treats them as samples from continuous curves, unlocking mathematical tools from functional analysis (Hilbert spaces, basis expansions, functional PCA) that capture temporal structure more faithfully.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Time series data is traditionally analyzed as sequences of discrete observations: values at times t₁, t₂, ..., tₙ. This discrete representation is convenient for computation but discards an important structure: the underlying process that generated the observations is continuous. A sensor reading at 10:00:01 is not independent of the reading at 10:00:02; both are samples from a smooth underlying function that varies continuously in time.

Functional data analysis (FDA) embraces this continuity. Rather than treating observations as vectors of numbers, FDA treats each time series as a sample from a space of functions, that is, an element of an infinite-dimensional function space (typically a Hilbert space). This perspective unlocks mathematical tools (functional principal component analysis, functional regression, reproducing kernel Hilbert spaces) that capture temporal structure more faithfully than their finite-dimensional counterparts.
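
In symbols, the functional view can be summarized roughly as follows. The notation (y, x_i, the basis functions \phi_k, the coefficients c_{ik}, the domain \mathcal{T}) is chosen here for illustration and is not taken from the cited papers.

```latex
% Each observed series is treated as noisy samples of a function x_i in L^2(T):
y_{ij} = x_i(t_j) + \varepsilon_{ij}, \qquad j = 1, \dots, n,

% L^2(T) is a Hilbert space with inner product
\langle f, g \rangle = \int_{\mathcal{T}} f(t)\, g(t)\, dt .

% A basis expansion approximates each function with K smooth basis functions:
x_i(t) \approx \sum_{k=1}^{K} c_{ik}\, \phi_k(t).
```

The finite coefficient vector (c_{i1}, ..., c_{iK}) is what downstream methods actually compute with; the function-space view determines how those coefficients are chosen and interpreted.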

Sun et al.'s FDALLM+ demonstrates a novel integration: using FDA to preprocess time series data before feeding it to large language models for prediction. Ju & Lee evaluate FDA-based feature extraction for manufacturing quality prediction. Together, they illustrate how FDA bridges the gap between the rich mathematical theory of function spaces and the practical needs of modern prediction systems.

Why Functions, Not Vectors?

The functional perspective offers three advantages over the vector perspective:

Smoothness constraints: Real physical processes are typically smooth. Temperature doesn't jump discontinuously, and network traffic doesn't teleport between values. FDA enforces smoothness through basis expansion (representing each curve as a weighted combination of smooth basis functions like B-splines or Fourier series), suppressing much of the measurement noise while preserving genuine signal structure.
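
As a minimal sketch of basis-expansion smoothing, the snippet below fits a small Fourier basis to a noisy curve by least squares. The signal, noise level, number of harmonics, and the helper name `fourier_basis` are all assumptions made for illustration; a B-spline basis would follow the same pattern with a different design matrix.

```python
import numpy as np

def fourier_basis(t, n_harmonics, period=1.0):
    """Design matrix with a constant column plus sin/cos pairs for each harmonic."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols.append(np.sin(w * t))
        cols.append(np.cos(w * t))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200, endpoint=False)

# Hypothetical smooth "true" curve and its noisy observations.
signal = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)
noisy = signal + rng.normal(scale=0.3, size=t.size)

# Basis expansion: least-squares coefficients of the noisy samples on the basis.
B = fourier_basis(t, n_harmonics=3)            # shape (200, 7)
coef, *_ = np.linalg.lstsq(B, noisy, rcond=None)
smooth = B @ coef                               # fitted curve on the same grid

rmse_noisy = np.sqrt(np.mean((noisy - signal) ** 2))
rmse_smooth = np.sqrt(np.mean((smooth - signal) ** 2))
print(f"RMSE before smoothing: {rmse_noisy:.3f}, after: {rmse_smooth:.3f}")
```

The number of basis functions acts as the smoothing knob here: too few and genuine structure is lost, too many and the fit starts following the noise.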

Phase variation: Two time series may represent the same underlying pattern shifted in time, such as a daily traffic pattern that peaks at 9am in one city and 10am in another. Standard vector methods treat these as different signals; FDA's registration techniques align them, revealing the common pattern.
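
The simplest form of registration is a rigid time shift. The sketch below aligns two synthetic daily profiles by searching over circular shifts; the Gaussian-shaped profiles, the 15-minute grid, and the function names are assumptions for illustration. Registration in the FDA literature generally uses smooth warping functions rather than a single shift, so this covers only the easiest case.

```python
import numpy as np

hours = np.arange(0.0, 24.0, 0.25)  # 15-minute grid over one day (96 points)

def daily_profile(peak_hour, width=1.5):
    """Hypothetical traffic profile: a single bump centred on peak_hour."""
    return np.exp(-0.5 * ((hours - peak_hour) / width) ** 2)

city_a = daily_profile(9.0)    # peaks at 9am
city_b = daily_profile(10.0)   # same shape, peaks at 10am

def best_circular_shift(reference, curve):
    """Grid-step shift of `curve` that minimises squared distance to `reference`."""
    half = len(curve) // 2
    shifts = np.arange(-half, half)
    errors = [np.sum((np.roll(curve, s) - reference) ** 2) for s in shifts]
    return int(shifts[int(np.argmin(errors))])

shift = best_circular_shift(city_a, city_b)
aligned_b = np.roll(city_b, shift)

print("shift in hours:", shift * 0.25)
print("max gap before alignment:", np.max(np.abs(city_a - city_b)))
print("max gap after alignment: ", np.max(np.abs(city_a - aligned_b)))
```

A pointwise comparison of `city_a` and `city_b` sees two quite different vectors; after the one-hour shift is removed, the two curves are essentially identical.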

Dimension reduction: Functional PCA extracts the principal modes of variation in a collection of curves, that is, the dominant shapes that explain most of the variability. For network traffic, these modes might represent daily patterns, weekly cycles, and anomalous events. Each curve is then represented by a small number of functional PC scores rather than hundreds of discrete time points.
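
For curves sampled on a common, evenly spaced grid, functional PCA can be approximated by ordinary PCA on the discretized curves. The sketch below does this with an SVD on synthetic data; the two generating shapes, the noise level, and all variable names are assumptions for illustration, and quadrature weights are ignored because the grid is uniform.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
n_curves = 50

# Synthetic curves: a constant level plus two modes of variation and small noise.
modes = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])   # (2, 100)
true_scores = rng.normal(size=(n_curves, 2)) * np.array([3.0, 1.0])
curves = 5.0 + true_scores @ modes + rng.normal(scale=0.05, size=(n_curves, t.size))

# Centre the curves, then take the SVD of the centred data matrix.
mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
_, s, Vt = np.linalg.svd(centered, full_matrices=False)

explained = s**2 / np.sum(s**2)        # fraction of variance per component
n_components = 2
components = Vt[:n_components]         # discretised principal modes, (2, 100)
scores = centered @ components.T       # functional PC scores, (50, 2)

# Each 100-point curve is now summarised by 2 numbers and can be rebuilt from them.
reconstructed = mean_curve + scores @ components
print("variance explained by 2 components:", explained[:n_components].sum())
print("max reconstruction error:", np.max(np.abs(reconstructed - curves)))
```

In practice the curves would usually be smoothed with a basis expansion first, and the number of components chosen from the explained-variance profile rather than fixed in advance.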

FDALLM+: Functions as LLM Input

Sun et al.'s FDALLM+ uses FDA preprocessing to transform raw network traffic time series into functional representations before feeding them to a large language model for prediction. The pipeline:

  • Basis expansion: Raw traffic data is represented as a weighted combination of B-spline basis functions, smoothing out measurement noise
  • Functional PCA: The dominant modes of traffic variation are extracted, compressing each time series into a compact set of scores
  • LLM prediction: The functional PC scores are tokenized and fed to an LLM that has been fine-tuned on traffic prediction tasks
  • Reconstruction: The LLM's predicted scores are mapped back to functional space and evaluated at desired time points

The FDA preprocessing addresses a limitation of LLMs for time series: LLMs operate on discrete tokens and do not naturally encode the continuous temporal structure of time series data. By first projecting into functional space, FDALLM+ provides the LLM with representations that already encode temporal smoothness, periodicity, and dominant variation patterns.
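
The four stages above can be sketched end to end on synthetic data. This is not the authors' implementation: a Fourier basis stands in for B-splines to keep the example NumPy-only, a linear least-squares map from one day's scores to the next stands in for the fine-tuned LLM, the tokenization step is not modeled at all, and the "traffic" data and every name in the code are hypothetical.

```python
import numpy as np

def fourier_basis(t, n_harmonics):
    """Constant column plus sin/cos pairs with period 1."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    return np.column_stack(cols)

def with_intercept(s):
    """Append a column of ones so the linear map can learn an offset."""
    return np.column_stack([s, np.ones(len(s))])

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 96, endpoint=False)    # one day, 96 samples
n_days, n_train, n_components = 120, 100, 2

# Synthetic traffic: a daily shape whose amplitude drifts slowly from day to day.
amplitude = np.empty(n_days)
amplitude[0] = 1.0
for d in range(1, n_days):
    amplitude[d] = 0.9 * amplitude[d - 1] + 0.1 + rng.normal(scale=0.05)
clean = 10.0 + amplitude[:, None] * np.sin(2 * np.pi * t)[None, :]
raw = clean + rng.normal(scale=0.2, size=clean.shape)

# 1. Basis expansion: least-squares fit of every day onto the smooth basis.
B = fourier_basis(t, n_harmonics=3)                 # (96, 7)
coef, *_ = np.linalg.lstsq(B, raw.T, rcond=None)    # (7, n_days)
smooth = (B @ coef).T                               # (n_days, 96)

# 2. Functional PCA, fitted on the training days only.
mean_curve = smooth[:n_train].mean(axis=0)
_, _, Vt = np.linalg.svd(smooth[:n_train] - mean_curve, full_matrices=False)
components = Vt[:n_components]                      # (2, 96)
scores = (smooth - mean_curve) @ components.T       # (n_days, 2)

# 3. Prediction (placeholder for the LLM): today's scores -> tomorrow's scores.
W, *_ = np.linalg.lstsq(
    with_intercept(scores[: n_train - 1]), scores[1:n_train], rcond=None
)
predicted_scores = with_intercept(scores[n_train - 1 : -1]) @ W   # test days

# 4. Reconstruction: map predicted scores back to curves on the time grid.
forecast = mean_curve + predicted_scores @ components             # (20, 96)
rmse = np.sqrt(np.mean((forecast - clean[n_train:]) ** 2))
print(f"forecast RMSE against the noise-free curves: {rmse:.3f}")
```

The point of the sketch is the data flow: the predictor only ever sees a handful of scores per day, and the smooth basis and principal modes carry the temporal structure in and out of that compact representation.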

Manufacturing Quality: FDA for Sensor Data

Ju & Lee apply FDA to a different domain: semiconductor manufacturing, where sensor data from production equipment is used to predict wafer quality. Manufacturing sensor data is high-frequency, high-dimensional, and temporally structured, characteristics that make it an ideal candidate for FDA.

Their comparison of FDA-based features versus summary statistics (mean, variance, skewness) for supervised learning shows that FDA features (functional PC scores) capture temporal patterns (waveform shapes, transient dynamics) that summary statistics miss. For quality prediction tasks where the temporal shape of sensor readings matters more than their average value, FDA features provide measurable improvement.
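
A toy illustration of why shape-aware features can succeed where summary statistics fail: the two synthetic classes below have the same mean, variance, and skewness over the window but different waveforms. The data, class names, and the `class_gap` separation measure are assumptions for illustration only and do not reproduce Ju & Lee's data or results.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 100, endpoint=False)
n_per_class = 40

# Two hypothetical sensor signatures: same level and spread, different shape.
shape_good = np.sin(2 * np.pi * t)
shape_bad = np.sin(4 * np.pi * t)
curves = np.vstack([
    shape_good + rng.normal(scale=0.1, size=(n_per_class, t.size)),
    shape_bad + rng.normal(scale=0.1, size=(n_per_class, t.size)),
])
labels = np.repeat([0, 1], n_per_class)

# Summary-statistic features per curve: mean, variance, skewness.
mu = curves.mean(axis=1)
sd = curves.std(axis=1)
skew = (((curves - mu[:, None]) / sd[:, None]) ** 3).mean(axis=1)
summary = np.column_stack([mu, sd**2, skew])

# FDA-style feature: score on the first principal mode of the pooled curves.
centered = curves - curves.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
pc1_score = centered @ Vt[0]

def class_gap(feature):
    """Distance between class means in units of the pooled within-class std."""
    a, b = feature[labels == 0], feature[labels == 1]
    pooled = np.sqrt(0.5 * (a.var() + b.var()))
    return abs(a.mean() - b.mean()) / pooled

summary_gaps = [class_gap(summary[:, j]) for j in range(summary.shape[1])]
pc1_gap = class_gap(pc1_score)
print("gaps for mean, variance, skewness:", np.round(summary_gaps, 2))
print("gap for first FPC score:", round(pc1_gap, 2))
```

Here the summary statistics barely distinguish the classes, while a single functional PC score separates them cleanly, because the difference between the classes lives entirely in the shape of the curve.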

Claims and Evidence

Claim | Evidence | Verdict
FDA captures temporal structure that discrete methods miss | Theoretical argument + empirical evidence from manufacturing | ✅ Well-established
FDA preprocessing improves LLM time series prediction | FDALLM+ demonstrates on network traffic | ✅ Supported
Functional PCA provides effective dimension reduction for curves | Core FDA result; widely validated | ✅ Well-established
FDA features outperform summary statistics for quality prediction | Ju & Lee comparative evaluation on semiconductor data | ✅ Supported
FDA is computationally tractable for large-scale time series | Basis expansion and FPCA are efficient; scaling to millions of curves requires care | ⚠️ Scalable with appropriate implementation

Open Questions

  • Basis selection: The choice of basis functions (B-splines, Fourier, wavelets) affects the functional representation. How do we select the optimal basis for a given domain? Can the basis be learned from data?
  • Irregular sampling: The workflow described here assumes that curves are observed densely enough, at regular intervals or otherwise, to be smoothed or interpolated reliably. For sparse or irregularly sampled data (clinical measurements, event-driven sensors), how do we construct functional representations?
  • Functional deep learning: Can neural network architectures be designed to operate directly on function-valued inputs, without first discretizing or projecting onto a finite basis? Functional neural networks are an emerging research direction.
  • Multivariate functional data: Real systems produce multiple correlated time series (multivariate functions). How do we extend FDA to capture cross-function dependencies, for instance the relationship between temperature and pressure curves in a manufacturing process?

What This Means for Your Research

For statisticians, FDA provides a mathematically rigorous framework for time series analysis that respects the continuous nature of temporal data. The theory is mature; the applications are expanding rapidly as data collection becomes higher-frequency and closer to continuous.

For ML practitioners, FDA preprocessing (basis expansion + functional PCA) provides a principled alternative to ad hoc feature engineering for time series data. The FDALLM+ integration demonstrates that this preprocessing is compatible with modern deep learning architectures.

For domain scientists in manufacturing, telecommunications, and environmental monitoring, FDA offers interpretable features (principal modes of variation) that domain experts can inspect and relate to physical processes, a transparency advantage over black-box feature extraction.

References

[1] Sun, Y., Wang, X., & Cao, G. (2025). FDALLM+: A Functional Data Analysis-Driven Large-Language Model Framework for Network Traffic Prediction. IEEE Journal of IoT.
[2] Ju, Y. & Lee, Y. (2025). Performance Evaluation of Supervised Learning Model Based on Functional Data Analysis and Summary Statistics. IEEE TSM.
