FOLDS seminar (new!)
Foundations of Optimization, Learning, and Data Science


Quick logistics


About




Schedule for Fall 2025

Date | Speaker | Affiliation | Title
Sep 4 (room AGH 306) | Joel Tropp | Caltech | Positive random walks and positive-semidefinite random matrices
Sep 11 | Rina Barber | UChicago | Algorithmic stability for regression and classification
Sep 18 | Jong-Shi Pang | USC | Heaviside Composite Optimization: A new paradigm of optimization
Sep 25 | Joan Bruna | NYU | Propagation-of-Chaos in Shallow Neural Networks beyond Logarithmic Time
Oct 2 (room AGH 306) | Yury Polyanskiy | MIT | Theory and practice of LLM quantization
Oct 9 | [Fall Break] | -- | --
Oct 16 | Adam Klivans | UT Austin | A New Paradigm for Learning with Distribution Shift
Oct 23 | Pratik Chaudhari | UPenn | TBA
Oct 30 | Nati Srebro | TTIC | TBA
Nov 6 | Zhimei Ren | UPenn | TBA
Nov 13 | Eva Tardos | Cornell | TBA
Nov 20 | Rob Nowak | UW Madison | TBA





Abstracts

Joel Tropp: Positive random walks and positive-semidefinite random matrices

On the real line, a random walk that can only move in the positive direction is very unlikely to remain close to its starting point. After a fixed number of steps, the left tail has a Gaussian profile, under minimal assumptions. Remarkably, the same phenomenon occurs when we consider a positive random walk on the cone of positive-semidefinite matrices. After a fixed number of steps, the minimum eigenvalue is also described by a Gaussian model.

This talk introduces a new way to make this intuition rigorous. The methodology provides the solution to an open problem in computational mathematics about sparse random embeddings. The presentation is targeted at a general mathematical audience. Preprint: https://arxiv.org/abs/2501.16578
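The phenomenon is easy to see numerically. Below is an illustrative simulation (not taken from the talk or preprint): a "positive random walk" on the PSD cone built by summing i.i.d. random rank-one PSD steps, with the minimum eigenvalue tracked after a fixed number of steps. All dimensions and step counts are arbitrary choices for illustration.

```python
import numpy as np

# Positive random walk on the PSD cone: S = sum_i g_i g_i^T, where each
# step g_i g_i^T is a random rank-one PSD matrix. We record the minimum
# eigenvalue of S after n steps, over many independent trials.
rng = np.random.default_rng(0)
d, n, trials = 5, 200, 1000  # dimension, number of steps, repetitions

min_eigs = np.empty(trials)
for t in range(trials):
    G = rng.standard_normal((n, d))
    S = G.T @ G  # equals the sum of the n rank-one steps g_i g_i^T
    min_eigs[t] = np.linalg.eigvalsh(S)[0]  # eigvalsh returns ascending order

# The walk stays far from the origin: the minimum eigenvalue concentrates
# well above zero, with roughly Gaussian fluctuations around its mean.
print(f"mean min eigenvalue: {min_eigs.mean():.1f}")
print(f"std of min eigenvalue: {min_eigs.std():.1f}")
```

Here the minimum eigenvalue stays bounded well away from zero, mirroring the one-dimensional fact that a positive random walk rarely remains near its starting point.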


Rina Barber: Algorithmic stability for regression and classification

In a supervised learning setting, a model fitting algorithm is unstable if small perturbations to the input (the training data) can often lead to large perturbations in the output (say, predictions returned by the fitted model). Algorithmic stability is a desirable property with many important implications such as generalization and robustness, but testing the stability property empirically is known to be impossible in the setting of complex black-box models. In this work, we establish that bagging any black-box regression algorithm automatically ensures that stability holds, with no assumptions on the algorithm or the data. Furthermore, we construct a new framework for defining stability in the context of classification, and show that using bagging to estimate our uncertainty about the output label will again allow stability guarantees for any black-box model. This work is joint with Jake Soloff and Rebecca Willett.
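As a minimal sketch of the bagging construction (generic subsample-and-average, not the paper's exact formulation), the wrapper below refits an arbitrary black-box regression algorithm on bootstrap resamples and averages the predictions; the toy 1-nearest-neighbor "black box" and the data are invented for illustration.

```python
import numpy as np

def bagged_predict(fit, X, y, x_new, B=50, seed=0):
    """Bagging wrapper for an arbitrary black-box regression algorithm:
    refit on B bootstrap resamples of the training data and average the
    resulting predictions. `fit(X, y)` must return a prediction function."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of the data
        model = fit(X[idx], y[idx])
        preds.append(model(x_new))
    return np.mean(preds, axis=0)

# Toy black box: 1-nearest-neighbor regression, which is unstable on its own.
def one_nn_fit(X, y):
    def predict(x_new):
        return np.array([y[np.argmin(np.abs(X - x))] for x in np.atleast_1d(x_new)])
    return predict

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(200)
preds = bagged_predict(one_nn_fit, X, y, np.array([0.25, 0.75]))
print(preds)  # averaged predictions near sin(pi/2)=1 and sin(3*pi/2)=-1
```

The key point of the result is that this averaging confers stability no matter what `fit` does internally, with no assumptions on the algorithm or the data.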


Jong-Shi Pang: Heaviside Composite Optimization: A new paradigm of optimization

A Heaviside function is the indicator function of a semi-infinite interval. A Heaviside composite function is a Heaviside function composed with a multivariate function that may be nonsmooth. This talk offers a glimpse of this novel class of discontinuous optimization problems, which sits at the border between continuous and discrete optimization. Among the rich applications of such problems, we highlight one pertaining to classification and treatment with constraints, solvable by an elementary yet novel progressive integer programming (PIP) method that can be applied broadly to many other problems, including indefinite quadratic programs and linear programs with linear complementarity constraints. Computational results demonstrate the excellent performance of the PIP idea and its superiority over a full integer programming approach.
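To make the definition concrete, here is a tiny illustrative example (the inner functions are invented, not from the talk): a sum of Heaviside functions composed with nonsmooth multivariate functions, which counts how many conditions hold and is therefore discontinuous in x.

```python
import numpy as np

def heaviside(t):
    # Indicator function of the semi-infinite interval [0, infinity)
    return np.where(t >= 0, 1.0, 0.0)

def composite_count(x):
    # A Heaviside composite function: Heaviside applied to (possibly
    # nonsmooth) inner functions of x, here counting how many of three
    # conditions c_i(x) >= 0 hold. The result jumps as x crosses the
    # zero sets of the c_i, so the function is discontinuous.
    c = np.array([
        np.abs(x[0]) - 1.0,        # nonsmooth inner function
        x[0] + x[1] - 0.5,         # smooth (affine) inner function
        np.minimum(x[0], x[1]),    # another nonsmooth inner function
    ])
    return heaviside(c).sum()

print(composite_count(np.array([2.0, -0.5])))  # -> 2.0 (two conditions hold)
```

Objectives and constraints built from such counts are neither continuous nor purely combinatorial, which is what places this problem class between continuous and discrete optimization.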


Joan Bruna: Propagation-of-Chaos in Shallow Neural Networks beyond Logarithmic Time

The analysis of gradient-based learning of Neural Networks remains an outstanding challenge, even for the simplest shallow architectures. A powerful mathematical framework that has emerged over recent years lifts the optimization in the space of probability measures, and captures important empirical phenomena such as the 'blessing of overparametrization'. However, the resulting learning guarantees remain mostly qualitative. In this talk, we study the fluctuations between idealized mean-field dynamics and polynomially-sized networks over suitable time horizons --- the so-called Propagation of Chaos. We provide a novel analysis that goes beyond traditional Gronwall-based PoC by exploiting certain geometric properties of the optimization landscape, and apply these results to representative models such as single-index models, establishing polynomial learning guarantees. Joint work with Margalit Glasgow and Denny Wu.
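The flavor of finite-width fluctuations can be seen even at initialization, before any of the subtleties of training dynamics. The sketch below (an illustration only, not the talk's analysis) shows that a width-m shallow network's output deviates from its mean-field limit at rate roughly 1/sqrt(m); Propagation of Chaos concerns how long such closeness survives under gradient-based training.

```python
import numpy as np

# Width-m shallow network at random init: f_m(x) = (1/m) sum_j a_j relu(w_j . x).
# Its mean-field (infinite-width) limit here is 0, since E[a_j] = 0, so the
# observed |f_m(x)| measures the finite-width fluctuation, which shrinks
# like 1/sqrt(m) as the width m grows.
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
x /= np.linalg.norm(x)

def width_m_output(m):
    a = rng.standard_normal(m)
    W = rng.standard_normal((m, 10))
    return np.mean(a * np.maximum(W @ x, 0.0))

mean_dev = {}
for m in (100, 10_000):
    mean_dev[m] = np.mean([abs(width_m_output(m)) for _ in range(200)])
    print(m, mean_dev[m])
```

Increasing the width by a factor of 100 shrinks the typical deviation by about a factor of 10, consistent with the 1/sqrt(m) scaling; the hard part, addressed in the talk, is controlling such deviations along the training trajectory over long time horizons.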


Yury Polyanskiy: Theory and practice of LLM quantization

Modern LLMs process information by repeatedly applying a basic primitive of matrix multiplication. Estimates show that about 60-84% of the energy consumed by LLMs goes into memory load/store operations. How can we reduce this power consumption? Tokens start as (roughly 16-bit) integers but are mapped to float vectors with dimension in the thousands, suggesting very low information density per dimension. Thus, unsurprisingly, there has been much success in reducing the precision of both weights and activations without much loss in LLM performance. In this talk we will present an information-theoretic analysis of quantized representations and show how it led us to create NestQuant, a new state-of-the-art algorithm for quantizing weights, KV-cache, and activations (ICML 2025).
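For context, the simplest baseline in this space is per-row round-to-nearest uniform quantization of a weight matrix, sketched below. This is a generic baseline for illustration, not the NestQuant lattice-based scheme described in the talk.

```python
import numpy as np

def quantize_rtn(W, bits=4):
    """Per-row round-to-nearest uniform quantization: scale each row by its
    max magnitude, round to signed integers, and keep the per-row scales.
    A simple baseline, not the NestQuant algorithm from the talk."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    Q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return Q, scale

def dequantize(Q, scale):
    # Reconstruct approximate float weights from integers and scales
    return Q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)
Q, s = quantize_rtn(W)
err = np.abs(W - dequantize(Q, s)).max()
print(f"max abs reconstruction error at 4 bits: {err:.3f}")
```

Storing `Q` at 4 bits plus one scale per row cuts memory traffic by roughly 4x relative to 16-bit weights; the information-theoretic question in the talk is how much better one can do than this naive rounding.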


Adam Klivans: A New Paradigm for Learning with Distribution Shift

We revisit the fundamental problem of learning with distribution shift, where a learner is given labeled samples from training distribution D, unlabeled samples from test distribution D′ and is asked to output a classifier with low test error. The standard approach in this setting is to prove a generalization bound in terms of some notion of distance between D and D′. These distances, however, are difficult to compute, and this has been the main stumbling block for efficient algorithm design over the last two decades.

We sidestep this issue and define a new model called TDS learning, where a learner runs a test on the training set and is allowed to reject if this test detects distribution shift relative to a fixed output classifier. This approach leads to the first set of efficient algorithms for learning with distribution shift that do not make any assumptions on the test distribution. Finally, we discuss how our techniques have recently been used to solve longstanding problems in supervised learning with contamination.
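The accept/reject interface of this model can be sketched schematically. The code below is an invented illustration of the workflow only: it uses a crude mean-shift test and a toy classifier, not the actual testers or learners from the work.

```python
import numpy as np

def tds_predict_or_reject(train_X, train_y, test_X, fit, tol=0.5):
    """Schematic TDS-style interface: run a test comparing training and
    unlabeled test features; reject (return None) if shift is detected,
    otherwise return a classifier trained on the labeled data. The crude
    mean-shift test here is a stand-in for the paper's actual testers."""
    shift = np.linalg.norm(train_X.mean(axis=0) - test_X.mean(axis=0))
    if shift > tol:
        return None  # reject: distribution shift detected
    return fit(train_X, train_y)

# Toy black-box classifier: always predict the majority training label
def majority_fit(X, y):
    label = int(np.mean(y) >= 0.5)
    return lambda X_new: np.full(len(X_new), label)

rng = np.random.default_rng(0)
train_X = rng.standard_normal((500, 3))
train_y = (train_X[:, 0] > 0).astype(int)
same = rng.standard_normal((500, 3))           # same distribution: accept
shifted = rng.standard_normal((500, 3)) + 2.0  # large mean shift: reject

print(tds_predict_or_reject(train_X, train_y, shifted, majority_fit) is None)
print(tds_predict_or_reject(train_X, train_y, same, majority_fit) is not None)
```

The point of the model is that rejection is permitted only when shift is actually detected, which lets efficient algorithms avoid computing intractable distances between D and D′.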
