UPenn Optimization Seminar
Quick facts
- When: Thursdays, 12-1pm
- Where: Room 414, Amy Gutmann Hall
- Mailing list: subscribe to "opt-seminar" here
- Organizers: Jason Altschuler and Hamed Hassani
- Administrative coordinator: Sonia Castro (email: soniacr@seas.upenn.edu)
About
- What: This seminar series features leading experts in optimization and adjacent fields. Topics range broadly from the design and analysis of optimization algorithms, to the complexity of fundamental optimization tasks, to the modeling and formulation of optimization applications arising in machine learning, engineering, operations, economics, applied mathematics, etc.
- Why: This seminar serves as a university-wide hub to bring together the many optimization communities across UPenn --- in the Departments of Statistics and Data Science, Electrical Engineering, Computer Science, Applied Mathematics, Economics, Wharton OID, etc. To help promote this internal interaction, several speakers will be from UPenn.
- Funding: We gratefully acknowledge funding from the Wharton Department of Statistics and Data Science, the IDEAS Institute for Data Science, and the NSF-Tripods-I program.
- Archived schedules: Fall 2023, Spring 2024, Fall 2024
Schedule for Spring 2025
Abstracts
Aaron Roth: Conditional Calibration for Task Specific Uncertainty Quantification
For many tasks, optimal solutions would be simple exercises if only we had direct access to some "true distribution on outcomes", conditional on all of our observed covariates. Unfortunately "true probabilities" are fundamentally impossible to come by, without making very strong modelling assumptions about the environment. On the other hand, there are various non-parametric methods for uncertainty quantification --- such as calibration and conformal prediction --- that can be used without making essentially any assumptions at all --- but their guarantees are marginal, and it is unclear what kinds of tasks they are good for. A recent line of work has given algorithms that can produce calibration guarantees that hold conditionally on any bounded set of conditioning events. This turns out to form a useful design framework for making predictions that can be used as if they were probabilities for many downstream tasks, so long as one selects the conditioning events appropriately with the downstream task in mind. We'll see three applications --- giving a method to make predictions that can be usefully consumed by many downstream decision makers, a method to make predictions that can be used to form prediction sets that are conditionally valid subject to any collection of conditioning events, and a method of making forecasts that can be used to interact with another forecaster and quickly reach agreement, recovering tractable versions of Aumann's agreement theorem.
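To make the conditional calibration notion concrete, here is a minimal sketch (illustrative only, not code from the talk; the binning scheme, event set, and synthetic data are assumptions) of measuring calibration error conditionally on a collection of events:

```python
import numpy as np

def conditional_calibration_error(p, y, events, n_bins=10):
    """Average |E[y] - E[p]| within each (event, prediction-bin) cell.

    p:      array of predicted probabilities in [0, 1]
    y:      array of binary outcomes
    events: dict mapping an event name to a boolean mask over the data
    """
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    errors = {}
    for name, mask in events.items():
        cell_errors = []
        for b in range(n_bins):
            cell = mask & (bins == b)
            if cell.sum() == 0:
                continue
            # Calibration asks that predictions match outcome frequencies
            # on average within every conditioning cell.
            cell_errors.append(abs(y[cell].mean() - p[cell].mean()))
        errors[name] = float(np.mean(cell_errors)) if cell_errors else 0.0
    return errors

# Illustrative usage with synthetic data and two overlapping conditioning events.
rng = np.random.default_rng(0)
x = rng.uniform(size=2000)
p = np.clip(x + 0.05 * rng.normal(size=x.size), 0, 1)
y = (rng.uniform(size=x.size) < x).astype(float)
events = {"all": np.ones_like(y, dtype=bool), "x > 0.5": x > 0.5}
print(conditional_calibration_error(p, y, events))
```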
Mengdi Wang: Controlled Generation for Large Foundation Models
Recent advances in large foundation models, such as large language models (LLMs) and diffusion models, have demonstrated impressive capabilities. However, to truly align these models with user feedback or maximize real-world objectives, it is crucial to exert control over the decoding process in order to steer the distribution of generated output. In this talk, we will explore methods and theory for controlled generation within LLMs and diffusion models. We will discuss various modalities for achieving this control, focusing on applications such as LLM alignment, accelerated inference, transfer learning, and diffusion-based optimizers.
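As a toy illustration of decode-time control (not the speaker's method; the reward values and exponential-tilting rule are assumptions), one can steer a model's next-token distribution toward an external reward signal:

```python
import numpy as np

def controlled_next_token(logits, reward, beta=1.0, seed=0):
    """Tilt the base model's next-token distribution toward high-reward tokens.

    logits: base model scores for each candidate token
    reward: external value/reward estimate for each candidate token
    beta:   control strength (beta = 0 recovers the base model)
    """
    scores = logits + beta * reward            # exponential tilting in log-space
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return np.random.default_rng(seed).choice(len(probs), p=probs)

# Toy usage: 5 candidate tokens, with the reward favoring token 3.
logits = np.array([2.0, 1.5, 0.3, 0.1, -1.0])
reward = np.array([0.0, 0.0, 0.0, 3.0, 0.0])
print(controlled_next_token(logits, reward, beta=2.0))
```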
Shirin Saeedi Bidokhti: Learning-Based Data Compression: Fundamental Limits and Algorithms
Data-driven methods have been the driving force of many scientific disciplines in the past decade, relying on huge amounts of empirical, experimental, and scientific data. Working with big data is impossible without data compression techniques that reduce the dimension and size of the data for storage and communication purposes and effectively denoise for efficient and accurate processing. In the past decade, learning-based compressors such as nonlinear transform coding (NTC) have shown great success in the task of compression by learning to map a high-dimensional source onto a representative latent space of lower dimension using neural networks and compressing in that latent space. Despite this success, it is unknown how the rate-distortion performance of such compressors compares with the optimal limits of compression (known as the rate-distortion function) that information theory characterizes. It is also unknown how advances in the field of information theory translate to practice in the paradigm of deep learning.
In the first part of the talk, we develop neural estimation methods to compute the rate-distortion function of high-dimensional real-world datasets. Using our estimate, and through experiments, we show that the rate-distortion performance achieved by NTC compressors is within several bits of the rate-distortion function for real-world datasets such as MNIST. We then ask whether this gap can be closed using ideas from information theory. In particular, by incorporating lattice coding in the latent domain, we propose lattice transform coding (LTC) as a novel framework for neural compression. LTC provides significant improvements compared to the state of the art on synthetic and real-world sources.
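For readers unfamiliar with the NTC setup, a minimal skeleton (illustrative only; the architecture, quantizer, and rate proxy are assumptions, not the models from the talk) looks like the following; LTC would replace the scalar rounding step with a lattice quantizer in the latent space:

```python
import torch
import torch.nn as nn

class NonlinearTransformCoder(nn.Module):
    """Minimal nonlinear-transform-coding skeleton: encode, quantize, decode."""

    def __init__(self, dim=784, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        z = self.encoder(x)
        # Scalar (integer-lattice) quantization in the latent space; the
        # straight-through trick keeps the rounding step differentiable.
        z_hat = z + (torch.round(z) - z).detach()
        x_hat = self.decoder(z_hat)
        distortion = torch.mean((x - x_hat) ** 2)
        # Crude rate proxy: latent magnitude stands in for an entropy model.
        rate = torch.mean(torch.abs(z))
        return distortion + 0.1 * rate

# One evaluation of the rate-distortion objective on a random batch.
model = NonlinearTransformCoder()
loss = model(torch.rand(8, 784))
loss.backward()
```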
Alex Damian: Foundations of Deep Learning: Optimization and Representation Learning
Deep learning's success stems from the ability of neural networks to automatically discover meaningful representations from raw data. In this talk, I will describe some recent insights into how optimization enables this learning process. First, I will show how optimization algorithms exhibit surprisingly rich dynamics when training neural networks, and how these complex dynamics are actually crucial to their success – enabling them to find solutions that generalize well, navigate challenging loss landscapes, and efficiently adapt to local curvature. I will then explore how optimization enables neural networks to adapt to low-dimensional structure in the data, how the geometry of the loss landscape shapes the difficulty of feature learning, and how these ideas extend to in-context learning in transformers.
Ryan Tibshirani: Gradient Equilibrium in Online Learning
We present a new perspective on online learning that we refer to as gradient equilibrium: a sequence of iterates achieves gradient equilibrium if the average of gradients of losses along the sequence converges to zero. In general, this condition is not implied by, nor implies, sublinear regret. It turns out that gradient equilibrium is achievable by standard online learning methods such as gradient descent and mirror descent with constant step sizes (rather than decaying step sizes, as is usually required for no regret). Further, as we show through examples, gradient equilibrium translates into an interpretable and meaningful property in online prediction problems spanning regression, classification, quantile estimation, and others. Notably, we show that the gradient equilibrium framework can be used to develop a debiasing scheme for black-box predictions under arbitrary distribution shift, based on simple post hoc online descent updates. We also show that post hoc gradient updates can be used to calibrate predicted quantiles under distribution shift, and that the framework leads to unbiased Elo scores for pairwise preference prediction.
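A small simulation (illustrative only; the squared loss, step size, and drifting target are assumptions, not examples from the talk) shows constant-step-size online gradient descent reaching gradient equilibrium, i.e. the running average of loss gradients tending to zero:

```python
import numpy as np

# Online gradient descent with a constant step size on a drifting
# 1-d prediction problem; track the running average of loss gradients.
rng = np.random.default_rng(0)
theta, eta, T = 0.0, 0.1, 5000
grad_sum = 0.0
for t in range(1, T + 1):
    y_t = 3.0 + 0.5 * np.sin(t / 200) + rng.normal()   # non-stationary target
    grad = 2 * (theta - y_t)                            # gradient of (theta - y_t)^2
    grad_sum += grad
    theta -= eta * grad
    if t % 1000 == 0:
        print(f"t={t:5d}  avg gradient={grad_sum / t:+.4f}  theta={theta:.3f}")
# Gradient equilibrium: the average gradient approaches zero, so the
# predictions are asymptotically unbiased even under the drift above.
```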