FOLDS seminar (new!)
Foundations of Optimization, Learning, and Data Science
Quick logistics
- When: Thursdays 12-1pm
- Where: Amy Gutmann Hall 414
- Mailing list: subscribe here to "folds-seminar"
- Add to calendar: subscribe here
- Organizers: Jason Altschuler and Hamed Hassani
- Administrative coordinator: Sonia Castro (email: soniacr@seas.upenn.edu)
About
- What: This seminar features leading experts in optimization, learning, and data science. Topics span algorithms, complexity, modeling, applications, and mathematical underpinnings.
- Why: Foundational advances in these fields are increasingly intertwined. This seminar serves as a university-wide hub to bring together the many communities across UPenn interested in optimization, learning, and data science, including Statistics and Data Science, Electrical Engineering, Computer Science, Applied Mathematics, Economics, Wharton OID, and more. To help promote this internal interaction, several speakers will be from UPenn.
- Funding: We are grateful to the IDEAS Center for Innovation in Data Engineering and Science, Penn AI, and the Wharton Department of Statistics and Data Science for their support.
- Archived schedules: Fall 2025. (Previously, this seminar series evolved from the UPenn Optimization seminar: Fall 2023, Spring 2024, Fall 2024, Spring 2025.)
Schedule for Spring 2026
Abstracts
Elad Hazan: Provably Efficient Learning in Nonlinear Dynamical Systems via Spectral Transformers
Learning in dynamical systems is a fundamental challenge underlying modern sequence modeling. Despite extensive study, efficient algorithms with formal guarantees for general nonlinear systems have remained elusive. This talk presents a provably efficient framework for learning in any bounded and Lipschitz nonlinear dynamical system, establishing the first sublinear regret guarantees in a dimension-free setting. Our approach combines Koopman lifting, Luenberger observers, and, crucially, spectral filtering to show that nonlinear dynamics are learnable. These insights motivate a new neural architecture, the Spectral Transform Unit (STU), which achieves state-of-the-art performance on language modeling, dynamical systems, and differential equation benchmarks.
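For intuition on the spectral filtering component, here is a minimal sketch (an illustration, not the speaker's implementation): the input history is projected onto fixed filters given by the top eigenvectors of a Hankel matrix, and only a linear readout over these features is learned. The specific Hankel entries Z_ij = 2/((i+j)^3 - (i+j)), the synthetic responses, and all hyperparameters are assumptions chosen for illustration.

# Sketch: spectral-filtering features with a learned linear readout (illustrative only).
import numpy as np

def spectral_filters(seq_len, num_filters):
    """Top eigenvectors of the Hankel matrix Z_ij = 2 / ((i+j)^3 - (i+j)),
    1-based indices (assumed form of the spectral-filtering Hankel matrix)."""
    i = np.arange(1, seq_len + 1)
    s = i[:, None] + i[None, :]
    Z = 2.0 / (s**3 - s)
    eigvals, eigvecs = np.linalg.eigh(Z)      # eigenvalues in ascending order
    return eigvecs[:, -num_filters:]          # filters for the largest eigenvalues

rng = np.random.default_rng(0)
T, k = 128, 16
filters = spectral_filters(T, k)              # fixed, never trained
U = rng.standard_normal((500, T))             # 500 random input histories
Y = np.tanh(0.1 * (U @ rng.standard_normal(T)))   # stand-in nonlinear responses
Phi = U @ filters                             # spectral features, shape (500, k)
w, *_ = np.linalg.lstsq(Phi, Y, rcond=None)   # only a linear readout is learned
print("train MSE:", np.mean((Phi @ w - Y) ** 2))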
Jiaoyang Huang: Fast Convergence of High-Order ODE Solvers for Diffusion Models
Score-based diffusion models can be sampled efficiently by reformulating the reverse dynamics as a deterministic probability flow ODE and integrating it with high-order solvers. Since the score function is typically approximated by a neural network, the overall sampling accuracy depends on the interplay between score regularity, approximation error, and numerical integration error. In this talk, we study the convergence of deterministic probability-flow-ODE samplers, focusing on high-order (exponential) Runge–Kutta schemes. Under mild regularity assumptions—specifically, bounded first and second derivatives of the approximate score—we bound the total variation distance between the target distribution and the generated distribution by the sum of a score-approximation term and a p-th order step-size term, explaining why accurate sampling is achievable with only a few solver steps. We also empirically validate the regularity assumptions on benchmark datasets. Our guarantees apply to general forward diffusion processes with arbitrary variance schedules.
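For concreteness, here is a toy sketch of the probability-flow-ODE pipeline under illustrative assumptions: a 1-D variance-preserving diffusion with a linear beta schedule and a Gaussian target, so the exact score is available in closed form, integrated with SciPy's DOP853, an explicit high-order Runge–Kutta method. This is a stand-in for the setting, not the talk's exponential Runge–Kutta scheme or experiments.

# Sketch: sampling by integrating the probability flow ODE with a high-order RK solver.
import numpy as np
from scipy.integrate import solve_ivp

beta_min, beta_max, T = 0.1, 20.0, 1.0
mu0, sigma0 = 2.0, 0.5                        # target distribution N(mu0, sigma0^2)

def beta(t):                                  # linear beta schedule (assumed)
    return beta_min + (beta_max - beta_min) * t / T

def alpha(t):                                 # alpha(t) = exp(-0.5 * int_0^t beta(s) ds)
    return np.exp(-0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t**2 / T))

def score(x, t):                              # exact score of the Gaussian marginal p_t
    a = alpha(t)
    m, v = a * mu0, a**2 * sigma0**2 + 1.0 - a**2
    return -(x - m) / v

def pf_ode(t, x):                             # probability flow ODE: dx/dt = -0.5*beta(t)*(x + score)
    return -0.5 * beta(t) * (x + score(x, t))

rng = np.random.default_rng(0)
x_T = rng.standard_normal(2000)               # p_T is approximately N(0, 1)
sol = solve_ivp(pf_ode, (T, 1e-3), x_T, method="DOP853", rtol=1e-6, atol=1e-8)
x_0 = sol.y[:, -1]                            # samples should approximate N(mu0, sigma0^2)
print("sample mean/std:", x_0.mean(), x_0.std(), "target:", mu0, sigma0)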
Yuxin Chen: Transformers Meet In-Context Learning: A Universal Approximation Theory
Large language models are capable of in-context learning, the ability to perform new tasks at test time using a handful of input-output examples, without parameter updates. We develop a universal approximation theory to elucidate how transformers enable in-context learning. For a general class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can predict based on a few noisy in-context examples with vanishingly small risk. Unlike prior work that frames transformers as approximators of optimization algorithms (e.g., gradient descent) for statistical learning tasks, we integrate Barron's universal function approximation theory with the algorithm approximator viewpoint. Our approach yields approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being mimicked, extending far beyond convex problems like linear regression. The key is to show that (i) any target function can be nearly linearly represented, with small ℓ1-norm, over a set of universal features, and (ii) a transformer can be constructed to find the linear representation -- akin to solving Lasso -- at test time.
This is joint work with Gen Li, Yuchen Jiao, Yu Huang, and Yuting Wei (arXiv:2506.05200).
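As a rough numerical analogue of point (ii) above (not the authors' transformer construction), the sketch below predicts a query from a few noisy in-context examples by regressing onto a fixed set of random "universal" features with a Lasso, so the learned coefficients have small ℓ1-norm. The random-cosine feature map, the toy task, and all hyperparameters are assumptions chosen for illustration.

# Sketch: in-context prediction as sparse linear representation over fixed features + Lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, m = 5, 400                                 # input dimension, number of universal features
W = rng.standard_normal((m, d))               # fixed random feature directions
b = rng.uniform(0, 2 * np.pi, size=m)

def features(X):
    """Fixed random-cosine 'universal' features (an assumed choice), shape (n, m)."""
    return np.cos(X @ W.T + b)

def target(x):                                # an unknown nonlinear task
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2

# A handful of noisy in-context examples for this task.
X_ctx = rng.standard_normal((30, d))
y_ctx = target(X_ctx) + 0.05 * rng.standard_normal(30)

# "At test time": find a small-l1 linear representation over the fixed features.
lasso = Lasso(alpha=0.01).fit(features(X_ctx), y_ctx)

X_query = rng.standard_normal((5, d))
print("predictions: ", lasso.predict(features(X_query)))
print("ground truth:", target(X_query))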