FOLDS seminar (new!)
Foundations of Optimization, Learning, and Data Science
Quick logistics
- When: Thursdays 12-1pm
- Where: Amy Gutmann Hall 414
- Mailing list: subscribe here to "folds-seminar"
- Add to calendar: subscribe here
- Organizers: Jason Altschuler and Hamed Hassani
- Administrative coordinator: Sonia Castro (email: soniacr@seas.upenn.edu)
About
- What: This seminar features leading experts in optimization, learning, and data science. Topics span algorithms, complexity, modeling, applications, and mathematical underpinnings.
- Why: Foundational advances in these fields are increasingly intertwined. This seminar serves as a university-wide hub to bring together the many communities across UPenn interested in optimization, learning, and data science — in the Department of Statistics and Data Science, Electrical Engineering, Computer Science, Applied Mathematics, Economics, Wharton OID, etc. To help promote this internal interaction, several speakers will be from UPenn.
- Funding: We are grateful to the IDEAS Institute for Data Science, Penn AI, and the Wharton Department of Statistics and Data Science for their support.
- Archived schedules: This seminar series evolved from the UPenn Optimization seminar (Fall 2023, Spring 2024, Fall 2024, Spring 2025).
Schedule for Fall 2025
Abstracts
Joel Tropp: Positive random walks and positive-semidefinite random matrices
On the real line, a random walk that can only move in the positive direction is very unlikely to remain close to its starting point. After a fixed number of steps, the left tail has a Gaussian profile, under minimal assumptions. Remarkably, the same phenomenon occurs when we consider a positive random walk on the cone of positive-semidefinite matrices. After a fixed number of steps, the minimum eigenvalue is also described by a Gaussian model.
This talk introduces a new way to make this intuition rigorous. The methodology provides the solution to an open problem in computational mathematics about sparse random embeddings. The presentation is targeted at a general mathematical audience.
Preprint: https://arxiv.org/abs/2501.16578
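For readers who want to see the phenomenon numerically, here is a minimal simulation sketch (our own illustration, not the paper's methodology): sum i.i.d. rank-one PSD steps, a "positive random walk" on the PSD cone, and inspect the minimum eigenvalue across trials; its empirical distribution is approximately Gaussian. The dimensions and step counts are arbitrary choices.

```python
# Toy numerical illustration (not the paper's method): a positive random
# walk on the PSD cone, realized as a sum of n i.i.d. rank-one PSD steps;
# the minimum eigenvalue of the endpoint has a roughly Gaussian profile.
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 10, 200, 2000

min_eigs = np.empty(trials)
for t in range(trials):
    G = rng.standard_normal((n, d))        # n independent Gaussian directions
    S = G.T @ G                            # sum of n rank-one PSD matrices
    min_eigs[t] = np.linalg.eigvalsh(S)[0] # eigvalsh sorts ascending: [0] = min

# Histogramming min_eigs shows an approximately Gaussian profile.
print("min eigenvalue: mean %.1f, std %.1f" % (min_eigs.mean(), min_eigs.std()))
```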
Rina Barber: Algorithmic stability for regression and classification
In a supervised learning setting, a model fitting algorithm is unstable if small perturbations to the input (the training data) can often lead to large perturbations in the output (say, predictions returned by the fitted model). Algorithmic stability is a desirable property with many important implications such as generalization and robustness, but testing the stability property empirically is known to be impossible in the setting of complex black-box models. In this work, we establish that bagging any black-box regression algorithm automatically ensures that stability holds, with no assumptions on the algorithm or the data. Furthermore, we construct a new framework for defining stability in the context of classification, and show that using bagging to estimate our uncertainty about the output label will again allow stability guarantees for any black-box model. This work is joint with Jake Soloff and Rebecca Willett.
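As a concrete picture of the construction, here is a minimal sketch of subbagging an arbitrary black-box regression algorithm; the `fit` interface, subsample size, and bag count are illustrative choices, not specifics from the paper.

```python
# Minimal sketch of bagging a black-box regression algorithm, the
# construction the abstract shows yields stability with no assumptions
# on the algorithm or the data. Hyperparameters are illustrative.
import numpy as np

def bagged_predict(fit, X_train, y_train, x_new, n_bags=100, m=None, seed=0):
    """Average predictions of `fit` over many random subsamples.

    `fit(X, y)` must return a function mapping a point to a prediction
    (the black box); nothing else is assumed about it.
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    m = m if m is not None else n // 2               # subsample size (illustrative)
    preds = []
    for _ in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)   # subbagging draw
        model = fit(X_train[idx], y_train[idx])
        preds.append(model(x_new))
    return np.mean(preds)                            # averaging is what stabilizes

# Example black box: least squares on the subsample.
def least_squares(X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda x: x @ w

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X @ np.ones(5) + 0.1 * rng.standard_normal(200)
print(bagged_predict(least_squares, X, y, np.ones(5)))
```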
Jong-Shi Pang: Heaviside Composite Optimization: A new paradigm of optimization
A Heaviside function is an indicator function of a semi-infinite interval. A Heaviside composite function is a Heaviside function composed with a multivariate function that may be nonsmooth. This talk presents a taste of this novel class of discontinuous optimization problems, which borders on continuous and discrete optimization. Among the rich applications of such problems, we mention one pertaining to classification and treatment with constraints, solvable by an elementary yet novel progressive integer programming (PIP) method that can be applied broadly to many other problems, including indefinite quadratic programs and linear programs with linear complementarity constraints. Computational results demonstrate the excellent performance of the PIP idea and its superiority over a full integer programming approach. A sketch of the definitions appears below.
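To fix notation, here is a sketch of the objects named in the abstract; the composite structure studied in the talk may be more general than this illustrative form.

```latex
% Heaviside function: the indicator of a semi-infinite interval
% (the closed/open convention is an illustrative choice).
H(t) \;=\; \begin{cases} 1, & t \ge 0,\\ 0, & t < 0. \end{cases}
% An illustrative Heaviside composite objective: H composed with possibly
% nonsmooth multivariate functions g_i,
\min_{x \in X}\;\; c(x) \;+\; \sum_{i=1}^{m} w_i\, H\big(g_i(x)\big),
% which is discontinuous in general and thus sits between continuous
% and discrete optimization.
```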
Joan Bruna: Propagation-of-Chaos in Shallow Neural Networks beyond Logarithmic Time
The analysis of gradient-based learning of Neural Networks remains an outstanding challenge, even for the simplest shallow architectures.
A powerful mathematical framework that has emerged over recent years lifts the optimization to the space of probability measures and captures important empirical phenomena such as the 'blessing of overparametrization'. However, the resulting learning guarantees remain mostly qualitative.
In this talk, we study the fluctuations between idealized mean-field dynamics and polynomially-sized networks over suitable time horizons (the so-called Propagation of Chaos). We provide a novel analysis that goes beyond traditional Gronwall-based PoC arguments by exploiting certain geometric properties of the optimization landscape, and we apply these results to representative models such as single-index models, establishing polynomial learning guarantees.
Joint work with Margalit Glasgow and Denny Wu.
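To make the particle picture concrete, here is a toy simulation of mean-field training of a shallow ReLU network on a single-index target (the widths, step sizes, and target are illustrative choices, and this only depicts the setup, not the talk's analysis): as the width N grows, the finite-N runs should increasingly track one another, which is the fluctuation question Propagation of Chaos quantifies.

```python
# Toy particle view of mean-field training: a width-N two-layer net is a
# system of N interacting "particles" (neurons); Propagation of Chaos asks
# how long the finite-N dynamics tracks the N -> infinity limit.
import numpy as np

rng = np.random.default_rng(0)
d, n_data = 5, 512
w_star = np.ones(d) / np.sqrt(d)          # hidden single-index direction
X = rng.standard_normal((n_data, d))      # shared data across widths
y = np.maximum(X @ w_star, 0.0)           # target: relu(<w*, x>)

def train(N, steps=500, lr=0.5):
    W = rng.standard_normal((N, d)) / np.sqrt(d)  # N neurons = N particles
    a = np.full(N, 1.0 / N)                       # mean-field 1/N output scaling
    for _ in range(steps):
        pre = X @ W.T                             # (n_data, N) pre-activations
        resid = np.maximum(pre, 0.0) @ a - y      # network output minus target
        # dLoss/dW_j = E[ resid * a_j * 1{pre_j > 0} * x ]
        grad_W = ((resid[:, None] * (pre > 0)) * a).T @ X / n_data
        W -= lr * N * grad_W                      # N rescaling = mean-field time
    out = np.maximum(X @ W.T, 0.0) @ a
    return np.mean((out - y) ** 2)

for N in (10, 100, 1000):
    print(f"width {N:5d}: train loss {train(N):.4f}")  # runs agree as N grows
```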
Yury Polyanskiy: Theory and practice of LLM quantization
Modern LLMs process information by repeatedly applying a basic primitive: matrix multiplication. Estimates show that about 60-84% of the energy consumed by LLMs goes into memory load/store operations. How can we reduce this power consumption? Tokens start as roughly 16-bit integers but get mapped to float vectors with lengths in the thousands, suggesting very low information density per dimension. Unsurprisingly, then, there has been much success in reducing the precision of both weights and activations without much loss in LLM performance. In this talk we will present an information-theoretic analysis of quantized representations and show how it led us to create NestQuant, a new SOTA algorithm for quantizing weights, KV-cache, and activations (ICML 2025).
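For a baseline sense of why low-bit weights can work, here is a minimal round-to-nearest uniform quantization sketch; this is a standard reference point, not the NestQuant algorithm from the talk, and the sizes and bit width are illustrative.

```python
# Baseline round-to-nearest uniform quantization of a weight matrix
# (a simple reference point, *not* NestQuant): per-row scales, signed
# b-bit integers, and the resulting matmul error.
import numpy as np

def quantize_rows(W, bits=4):
    """Symmetric per-row uniform quantization to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax   # one scale per row
    Q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return Q, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)) / 16     # stand-in "weights"
x = rng.standard_normal(256)                 # stand-in "activation"

Q, s = quantize_rows(W, bits=4)
W_hat = Q * s                                # dequantized weights
rel_err = np.linalg.norm(W @ x - W_hat @ x) / np.linalg.norm(W @ x)
print(f"4-bit relative matmul error: {rel_err:.3f}")
```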
Adam Klivans: A New Paradigm for Learning with Distribution Shift
We revisit the fundamental problem of learning with distribution shift, where a learner is given labeled samples from a training distribution D, unlabeled samples from a test distribution D′, and is asked to output a classifier with low test error. The standard approach in this setting is to prove a generalization bound in terms of some notion of distance between D and D′. These distances, however, are difficult to compute, and this has been the main stumbling block for efficient algorithm design over the last two decades.
We sidestep this issue and define a new model called TDS learning, where a learner runs a test on the training set and is allowed to reject if this test detects distribution shift relative to a fixed output classifier. This approach leads to the first set of efficient algorithms for learning with distribution shift that make no assumptions on the test distribution. Finally, we discuss how our techniques have recently been used to solve longstanding problems in supervised learning with contamination.
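Here is a toy rendering of the test-then-learn template described in the abstract; the first-moment test below is only an illustrative stand-in (the actual testers in this line of work are more refined, e.g. matching moments), and the helper names and threshold are hypothetical.

```python
# Toy TDS-style template: test the unlabeled test set against the training
# set, then either REJECT (shift detected) or output the trained classifier.
import numpy as np

def tds_learn(X_train, y_train, X_test_unlabeled, fit, threshold):
    """Return a classifier, or None to signal REJECT under detected shift."""
    stat = np.linalg.norm(X_train.mean(axis=0) - X_test_unlabeled.mean(axis=0))
    if stat > threshold:
        return None                      # REJECT: test detected shift
    return fit(X_train, y_train)         # ACCEPT: output the classifier

# Example: a least-squares halfspace black box and a mean-shifted test set.
rng = np.random.default_rng(0)
d = 10
X_tr = rng.standard_normal((500, d))
y_tr = np.sign(X_tr @ np.ones(d))
fit = lambda X, y: (lambda Z: np.sign(Z @ np.linalg.lstsq(X, y, rcond=None)[0]))

X_te_ok    = rng.standard_normal((500, d))        # same distribution
X_te_shift = rng.standard_normal((500, d)) + 1.0  # mean-shifted distribution

print(tds_learn(X_tr, y_tr, X_te_ok,    fit, threshold=0.5) is not None)  # True
print(tds_learn(X_tr, y_tr, X_te_shift, fit, threshold=0.5) is not None)  # False
```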