FOLDS seminar (new!)
Foundations of Optimization, Learning, and Data Science


Quick logistics


About




Schedule for Spring 2026

Date | Speaker | Affiliation | Title
Feb 5 | Elad Hazan | Princeton / Google | Provably Efficient Learning in Nonlinear Dynamical Systems via Spectral Transformers
Feb 12 | Jiaoyang Huang | Penn | Fast Convergence of High-Order ODE Solvers for Diffusion Models
Feb 19 | Yuxin Chen | Penn | Transformers Meet In-Context Learning: A Universal Approximation Theory
Feb 26 | Daniel Hsu | Columbia
Mar 5 | Paris Perdikaris | Penn
Mar 12 | [Spring Break]
Mar 19 | Mehryar Mohri | NYU / Google
Mar 26 | Rachel Cummings | Columbia
Apr 2 | Maryam Fazel | UW
Apr 9 | Weijie Su | Penn
Apr 15 (joint with ASSET seminar) | Misha Belkin | UCSD
Apr 23 | Shivani Agarwal | Penn





Abstracts

Elad Hazan: Provably Efficient Learning in Nonlinear Dynamical Systems via Spectral Transformers

Learning in dynamical systems is a fundamental challenge underlying modern sequence modeling. Despite extensive study, efficient algorithms with formal guarantees for general nonlinear systems have remained elusive. This talk presents a provably efficient framework for learning in any bounded and Lipschitz nonlinear dynamical system, establishing the first sublinear regret guarantees in a dimension-free setting. Our approach combines Koopman lifting, Luenberger observers, and, crucially, spectral filtering to show that nonlinear dynamics are learnable. These insights motivate a new neural architecture, the Spectral Transform Unit (STU), which achieves state-of-the-art performance on language modeling, dynamical systems, and differential equation benchmarks.
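For intuition, here is a minimal NumPy sketch of the spectral-filtering step at the heart of this line of work: the filters are top eigenvectors of a fixed Hankel matrix, and the input sequence is causally convolved with them to produce features for a learned readout. The Hankel-matrix form follows the earlier spectral-filtering literature; the filter count, dimensions, and readout are illustrative assumptions, not the exact STU construction from the talk.

```python
import numpy as np

def spectral_filters(T, k):
    """Top-k eigenvectors of the Hankel matrix used in spectral filtering.

    Z[i, j] = 2 / ((i + j)^3 - (i + j)) with 1-based indices; entries depend
    only on i + j, so Z is Hankel, and its spectrum decays quickly, so a few
    eigenvectors suffice as filters.
    """
    idx = np.arange(1, T + 1)
    s = idx[:, None] + idx[None, :]           # i + j, values in 2..2T
    Z = 2.0 / (s**3 - s)
    eigvals, eigvecs = np.linalg.eigh(Z)      # ascending eigenvalues
    return eigvals[-k:][::-1], eigvecs[:, -k:][:, ::-1]

def spectral_features(u, filters):
    """Causally convolve an input sequence u (shape (T, d)) with each filter.

    Returns shape (T, k, d): feature j at time t is sum_{s<=t} phi_j[s] * u[t-s].
    """
    T, d = u.shape
    k = filters.shape[1]
    feats = np.zeros((T, k, d))
    for j in range(k):
        for dim in range(d):
            feats[:, j, dim] = np.convolve(u[:, dim], filters[:, j])[:T]
    return feats

# Toy usage: filter a random input sequence with k = 8 spectral filters.
T, d, k = 128, 2, 8
sigma, phi = spectral_filters(T, k)           # eigenvalues and filters
u = np.random.randn(T, d)
X = spectral_features(u, phi)                 # (T, k, d) features
# A learned output map would then predict y_t from X[t]; in the STU these
# features feed into a larger neural architecture.
```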


Jiaoyang Huang: Fast Convergence of High-Order ODE Solvers for Diffusion Models

Score-based diffusion models can be sampled efficiently by reformulating the reverse dynamics as a deterministic probability flow ODE and integrating it with high-order solvers. Since the score function is typically approximated by a neural network, the overall sampling accuracy depends on the interplay between score regularity, approximation error, and numerical integration error. In this talk, we study the convergence of deterministic probability-flow-ODE samplers, focusing on high-order (exponential) Runge–Kutta schemes. Under mild regularity assumptions—specifically, bounded first and second derivatives of the approximate score—we bound the total variation distance between the target distribution and the generated distribution by the sum of a score-approximation term and a p-th order step-size term, explaining why accurate sampling is achievable with only a few solver steps. We also empirically validate the regularity assumptions on benchmark datasets. Our guarantees apply to general forward diffusion processes with arbitrary variance schedules.
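As a concrete illustration of the kind of sampler being analyzed, the sketch below integrates a probability flow ODE backward in time with Heun's second-order Runge-Kutta method on a toy one-dimensional Gaussian target, for which the score is available in closed form. The VP-style noise schedule, the closed-form score, and the use of a plain (rather than exponential) Runge-Kutta scheme are illustrative assumptions, not the exact setting of the talk.

```python
import numpy as np

# Toy setup: the data distribution is N(mu, s2), so the score of the
# VP-diffused marginal is known exactly and no neural network is needed.
mu, s2 = 2.0, 0.25
beta_min, beta_max = 0.1, 20.0

def beta(t):
    return beta_min + t * (beta_max - beta_min)

def alpha_bar(t):
    # alpha_bar(t) = exp(-int_0^t beta(s) ds) for the VP forward process
    return np.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t**2))

def score(x, t):
    # Exact score of the marginal N(sqrt(a)*mu, a*s2 + 1 - a), a = alpha_bar(t)
    a = alpha_bar(t)
    var = a * s2 + (1.0 - a)
    return -(x - np.sqrt(a) * mu) / var

def pf_ode_drift(x, t):
    # Probability-flow ODE: dx/dt = -0.5 * beta(t) * (x + score(x, t))
    return -0.5 * beta(t) * (x + score(x, t))

def heun_sampler(x_T, n_steps=10, t_end=1e-3):
    """Integrate the probability-flow ODE backward with Heun's 2nd-order RK method."""
    ts = np.linspace(1.0, t_end, n_steps + 1)
    x = x_T
    for i in range(n_steps):
        t, t_next = ts[i], ts[i + 1]
        h = t_next - t                        # negative: reverse-time integration
        k1 = pf_ode_drift(x, t)
        x_pred = x + h * k1                   # Euler predictor
        k2 = pf_ode_drift(x_pred, t_next)
        x = x + 0.5 * h * (k1 + k2)           # trapezoidal corrector
    return x

# Start from the prior N(0, 1) and sample with only 10 solver steps.
rng = np.random.default_rng(0)
samples = heun_sampler(rng.standard_normal(10000), n_steps=10)
print(samples.mean(), samples.var())          # should land near mu = 2.0, s2 = 0.25
```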


Yuxin Chen: Transformers Meet In-Context Learning: A Universal Approximation Theory

Large language models are capable of in-context learning, the ability to perform new tasks at test time using a handful of input-output examples, without parameter updates. We develop a universal approximation theory to elucidate how transformers enable in-context learning. For a general class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can predict based on a few noisy in-context examples with vanishingly small risk. Unlike prior work that frames transformers as approximators of optimization algorithms (e.g., gradient descent) for statistical learning tasks, we integrate Barron's universal function approximation theory with the algorithm approximator viewpoint. Our approach yields approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being mimicked, extending far beyond convex problems like linear regression. The key is to show that (i) any target function can be nearly linearly represented, with small ℓ1-norm, over a set of universal features, and (ii) a transformer can be constructed to find the linear representation -- akin to solving Lasso -- at test time.  This is joint work with Gen Li, Yuchen Jiao, Yu Huang, and Yuting Wei (arXiv:2506.05200). 
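To make the ℓ1 / Lasso intuition concrete, the sketch below mimics the two-step argument with off-the-shelf ingredients: random ReLU features stand in for the universal features, and iterative soft-thresholding (ISTA) stands in for the Lasso-like computation that the constructed transformer carries out at test time. The feature map, regularization level, and task are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "universal" feature map: random ReLU features (the paper uses
# Barron-type universal features, which we do not replicate here).
D, d = 512, 5                                 # number of features, input dimension
W = rng.standard_normal((D, d))
b = rng.uniform(-1.0, 1.0, D)

def features(X):
    return np.maximum(X @ W.T + b, 0.0) / np.sqrt(D)

def ista_lasso(Phi, y, lam=0.01, n_iters=500):
    """Minimize 0.5*||Phi w - y||^2 + lam*||w||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(Phi, 2) ** 2           # Lipschitz constant of the gradient
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        grad = Phi.T @ (Phi @ w - y)
        z = w - grad / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return w

# In-context prompt: a handful of noisy examples from an unseen task f.
f = lambda X: np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
n_context = 40
X_ctx = rng.standard_normal((n_context, d))
y_ctx = f(X_ctx) + 0.1 * rng.standard_normal(n_context)

# "Test time": recover a small-l1 linear representation over the universal
# features from the context alone, then predict at a query point.
w_hat = ista_lasso(features(X_ctx), y_ctx, lam=0.005)
X_query = rng.standard_normal((1, d))
print(features(X_query) @ w_hat, f(X_query))  # prediction vs. noiseless target (illustrative only)
```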
