UPenn Optimization Seminar


Schedule for Spring 2025

Date | Speaker | Affiliation | Title
Jan 23 | Aaron Roth | UPenn | Conditional Calibration for Task Specific Uncertainty Quantification
Jan 30 | Mengdi Wang | Princeton | Controlled Generation for Large Foundation Models
Feb 6 | Shirin Saeedi Bidokhti | UPenn | Learning-Based Data Compression: Fundamental Limits and Algorithms
Feb 13 | Alex Damian | Princeton | Foundations of Deep Learning: Optimization and Representation Learning
Feb 20 | Noah Golowich | MIT | TBA
Mar 6 | Bartolomeo Stellato | Princeton | TBA
Mar 20 | Pratik Chaudhari | UPenn | TBA
Mar 27 | Dan Roy | Toronto | TBA
Apr 3 | Frank E. Curtis | Lehigh | TBA
Apr 10 | Ryan Tibshirani | Berkeley | Gradient Equilibrium in Online Learning
Apr 17 | Angelia Nedich | ASU | TBA
Apr 24 | Jason Altschuler | UPenn | TBA





Abstracts

Aaron Roth: Conditional Calibration for Task Specific Uncertainty Quantification

For many tasks, optimal solutions would be simple exercises if only we had direct access to some "true distribution on outcomes", conditional on all of our observed covariates. Unfortunately "true probabilities" are fundamentally impossible to come by, without making very strong modelling assumptions about the environment. On the other hand, there are various non-parametric methods for uncertainty quantification --- such as calibration and conformal prediction --- that can be used without making essentially any assumptions at all --- but their guarantees are marginal, and it is unclear what kinds of tasks they are good for. A recent line of work has given algorithms that can produce calibration guarantees that hold conditionally on any bounded set of conditioning events. This turns out to form a useful design framework for making predictions that can be used as if they were probabilities for many downstream tasks, so long as one selects the conditioning events appropriately with the downstream task in mind. We'll see three applications --- giving a method to make predictions that can be usefully consumed by many downstream decision makers, a method to make predictions that can be used to form prediction sets that are conditionally valid subject to any collection of conditioning events, and a method of making forecasts that can be used to interact with another forecaster and quickly reach agreement, recovering tractable versions of Aumann's agreement theorem.
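To make the gap between marginal and conditional guarantees concrete, here is a minimal numerical sketch (the synthetic data, the event {x > 0.5}, and the helper `calibration_error` are illustrative assumptions, not the algorithms from the talk): a constant forecaster can be essentially perfectly calibrated on average while being badly miscalibrated conditional on a simple event.

```python
import numpy as np

def calibration_error(preds, outcomes, mask, n_bins=10):
    """Average |mean prediction - mean outcome| over prediction bins,
    restricted to the examples selected by `mask`."""
    err, total = 0.0, mask.sum()
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = mask & (preds >= lo) & (preds < hi)
        if in_bin.sum() == 0:
            continue
        gap = abs(preds[in_bin].mean() - outcomes[in_bin].mean())
        err += (in_bin.sum() / total) * gap
    return err

rng = np.random.default_rng(0)
x = rng.uniform(size=10_000)                          # a single observed covariate
y = (rng.uniform(size=x.size) < x).astype(float)      # true P(y = 1 | x) = x
preds = np.full_like(x, 0.5)                          # constant forecaster: calibrated on average

everyone = np.ones_like(x, dtype=bool)
group = x > 0.5                                       # one conditioning event
print("marginal calibration error:", calibration_error(preds, y, everyone))
print("error conditional on {x > 0.5}:", calibration_error(preds, y, group))
```

Conditional calibration asks for small error simultaneously on every event in a chosen collection, which rules out degenerate forecasters like this one.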


Mengdi Wang: Controlled Generation for Large Foundation Models

Recent advances in large foundation models, such as large language models (LLMs) and diffusion models, have demonstrated impressive capabilities. However, to truly align these models with user feedback or maximize real-world objectives, it is crucial to exert control over the decoding process in order to steer the distribution of generated outputs. In this talk, we will explore methods and theory for controlled generation within LLMs and diffusion models. We will discuss various modalities for achieving this control, focusing on applications such as LLM alignment, accelerated inference, transfer learning, and diffusion-based optimizers.
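As a toy picture of decode-time control (a generic reward-tilting recipe under my own assumptions, not necessarily a method from the talk), the sketch below reweights a base model's next-token distribution by the exponential of a per-token reward from a hypothetical value model:

```python
import numpy as np

def guided_step(base_logits, reward, beta=2.0):
    """Tilt the base model's next-token distribution toward tokens with
    higher estimated downstream reward: p(t) is proportional to
    p_base(t) * exp(beta * r(t)). The reward scores are hypothetical."""
    logits = base_logits + beta * reward      # tilt in log space
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

vocab = ["safe", "neutral", "risky"]
base_logits = np.array([0.0, 1.0, 2.0])       # base LM prefers "risky"
reward = np.array([1.0, 0.2, -1.5])           # value model penalizes "risky"
print(dict(zip(vocab, np.round(guided_step(base_logits, reward), 3))))
```

The knob `beta` trades off fidelity to the base model against the control objective; at beta = 0 the base distribution is recovered unchanged.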


Shirin Saeedi Bidokhti: Learning-Based Data Compression: Fundamental Limits and Algorithms

Data-driven methods have been the driving force of many scientific disciplines in the past decade, relying on huge amounts of empirical, experimental, and scientific data. Working with big data is impossible without data compression techniques that reduce the dimension and size of the data for storage and communication purposes and effectively denoise it for efficient and accurate processing. In the past decade, learning-based compressors such as nonlinear transform coding (NTC) have shown great success in the task of compression by learning to map a high-dimensional source onto a representative latent space of lower dimension using neural networks and compressing in that latent space. Despite this success, it is unknown how the rate-distortion performance of such compressors compares with the optimal limits of compression (known as the rate-distortion function) that information theory characterizes. It is also unknown how advances in information theory translate to practice in the paradigm of deep learning.

In the first part of the talk, we develop neural estimation methods to compute the rate-distortion function of high-dimensional real-world datasets. Using our estimate, and through experiments, we show that the rate-distortion achieved by NTC compressors is within several bits of the rate-distortion function for real-world datasets such as MNIST. We then ask if this gap can be closed using ideas from information theory. In particular, incorporating lattice coding in the latent domain, we propose lattice transform coding (LTC) as a novel framework for neural compression. LTC provides significant improvements over the state of the art on synthetic and real-world sources.
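For intuition about the limit being estimated, here is the classical Blahut-Arimoto iteration, which computes points on the rate-distortion curve exactly for small discrete sources. The talk's neural estimators target high-dimensional real data; this sketch only shows the quantity they approximate.

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iters=200):
    """One point on the rate-distortion curve of a discrete source,
    via the classical Blahut-Arimoto iteration (rate in bits)."""
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])   # reproduction marginal
    for _ in range(n_iters):
        w = q * np.exp(-beta * dist)                  # unnormalized Q(x_hat | x)
        Q = w / w.sum(axis=1, keepdims=True)          # optimal test channel
        q = p_x @ Q                                   # re-estimate the marginal
    D = np.sum(p_x[:, None] * Q * dist)               # expected distortion
    R = np.sum(p_x[:, None] * Q * np.log2(Q / q))     # mutual information I(X; X_hat)
    return R, D

# Sanity check: uniform binary source with Hamming distortion,
# where the closed form is R(D) = 1 - H(D).
p_x = np.array([0.5, 0.5])
dist = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
for beta in (1.0, 2.0, 4.0):
    R, D = blahut_arimoto(p_x, dist, beta)
    print(f"beta={beta}: rate={R:.3f} bits, distortion={D:.3f}")
```

Sweeping the Lagrange parameter `beta` traces out the whole curve, which is the baseline the neural estimates are compared against.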


Alex Damian: Foundations of Deep Learning: Optimization and Representation Learning

Deep learning's success stems from the ability of neural networks to automatically discover meaningful representations from raw data. In this talk, I will describe some recent insights into how optimization enables this learning process. First, I will show how optimization algorithms exhibit surprisingly rich dynamics when training neural networks, and how these complex dynamics are actually crucial to their success – enabling them to find solutions that generalize well, navigate challenging loss landscapes, and efficiently adapt to local curvature. I will then explore how optimization enables neural networks to adapt to low-dimensional structure in the data, how the geometry of the loss landscape shapes the difficulty of feature learning, and how these ideas extend to in-context learning in transformers.
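As one concrete, textbook instance of the interplay between step size and curvature (my illustration, not a result from the talk): gradient descent on the quadratic f(x) = 0.5 * lam * x^2 converges exactly when the step size eta is below 2 / lam, so the sharpest directions of a loss landscape dictate how large a stable step can be.

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * lam * x^2 follows x <- (1 - eta * lam) * x,
# so it converges iff eta < 2 / lam; sharper curvature forces smaller steps.
lam = 10.0
for eta in (0.05, 0.19, 0.21):                # 2 / lam = 0.2 is the threshold
    x = 1.0
    for _ in range(100):
        x -= eta * lam * x                    # gradient step
    print(f"eta={eta}: |x| after 100 steps = {abs(x):.3e}")
```

The first two step sizes contract (the second only barely, oscillating in sign), while the third diverges geometrically.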


Ryan Tibshirani: Gradient Equilibrium in Online Learning

We present a new perspective on online learning that we refer to as gradient equilibrium: a sequence of iterates achieves gradient equilibrium if the average of gradients of losses along the sequence converges to zero. In general, this condition is not implied by, nor implies, sublinear regret. It turns out that gradient equilibrium is achievable by standard online learning methods such as gradient descent and mirror descent with constant step sizes (rather than decaying step sizes, as is usually required for no regret). Further, as we show through examples, gradient equilibrium translates into an interpretable and meaningful property in online prediction problems spanning regression, classification, quantile estimation, and others. Notably, we show that the gradient equilibrium framework can be used to develop a debiasing scheme for black-box predictions under arbitrary distribution shift, based on simple post hoc online descent updates. We also show that post hoc gradient updates can be used to calibrate predicted quantiles under distribution shift, and that the framework leads to unbiased Elo scores for pairwise preference prediction.
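A minimal numeric sketch of the definition (assumptions mine: squared loss and a drifting scalar target, not the paper's implementation): constant-step online gradient descent keeps the running average of gradients near zero, because the average gradient telescopes to (theta_0 - theta_T) / (eta * T), which vanishes whenever the iterates stay bounded. For squared loss, a vanishing average gradient means the predictions are unbiased on average, the debiasing property mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)
T, eta, theta = 5000, 0.1, 0.0
grad_sum = 0.0
for t in range(1, T + 1):
    y = 3.0 + 2.0 * t / T + rng.normal()      # target drifts upward over time
    g = 2 * (theta - y)                       # gradient of the loss (theta - y)^2
    theta -= eta * g                          # constant step size, no decay
    grad_sum += g
    if t % 1000 == 0:
        # grad_sum / t = (theta_init - theta) / (eta * t) -> 0 for bounded iterates
        print(f"t={t}: average gradient = {grad_sum / t:+.4f}")
```

Note that the average gradient shrinks even though the target keeps moving, so no sublinear-regret argument is invoked anywhere.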