UPenn Optimization Seminar


Schedule for Spring 2025

Date   | Speaker                | Affiliation | Title
Jan 23 | Aaron Roth             | UPenn       | Conditional Calibration for Task Specific Uncertainty Quantification
Jan 30 | Mengdi Wang            | Princeton   | Controlled Generation for Large Foundation Models
Feb 6  | Shirin Saeedi Bidokhti | UPenn       | Learning-Based Data Compression: Fundamental limits and Algorithms
Feb 13 | Alex Damian            | Princeton   | Foundations of Deep Learning: Optimization and Representation Learning
Feb 20 | Noah Golowich          | MIT         | Theoretical foundations for multi-agent learning
Feb 27 | Ayush Sekhari          | MIT         | ML for an Interactive World: From Learning to Unlearning
Mar 6  | Bartolomeo Stellato    | Princeton   | Data-Driven Algorithm Design and Verification for Parametric Convex Optimization
Mar 20 | Yaniv Romano           | Technion    | Statistics-Powered ML: Building Trust and Robustness in Black-Box Predictions
Mar 27 | Dan Roy                | Toronto     |
Apr 3  | Frank E. Curtis        | Lehigh      |
Apr 10 | Ryan Tibshirani        | Berkeley    | Gradient Equilibrium in Online Learning
Apr 17 | Angelia Nedich         | ASU         |
Apr 24 | Jason Altschuler       | UPenn       |





Abstracts

Aaron Roth: Conditional Calibration for Task Specific Uncertainty Quantification

For many tasks, optimal solutions would be simple exercises if only we had direct access to some "true distribution on outcomes", conditional on all of our observed covariates. Unfortunately "true probabilities" are fundamentally impossible to come by, without making very strong modelling assumptions about the environment. On the other hand, there are various non-parametric methods for uncertainty quantification --- such as calibration and conformal prediction --- that can be used without making essentially any assumptions at all --- but their guarantees are marginal, and it is unclear what kinds of tasks they are good for. A recent line of work has given algorithms that can produce calibration guarantees that hold conditionally on any bounded set of conditioning events. This turns out to form a useful design framework for making predictions that can be used as if they were probabilities for many downstream tasks, so long as one selects the conditioning events appropriately with the downstream task in mind. We'll see three applications --- giving a method to make predictions that can be usefully consumed by many downstream decision makers, a method to make predictions that can be used to form prediction sets that are conditionally valid subject to any collection of conditioning events, and a method of making forecasts that can be used to interact with another forecaster and quickly reach agreement, recovering tractable versions of Aumann's agreement theorem.
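
To make the notion concrete, here is a minimal Python sketch (not the speaker's algorithm) contrasting marginal calibration with calibration conditional on an observed event: a constant forecast of 0.5 is calibrated on average, yet badly miscalibrated on each of two subpopulations.

```python
import numpy as np

def calibration_error(preds, outcomes, mask, n_bins=10):
    """Mass-weighted average of |E[outcome - pred]| within probability bins,
    restricted to the conditioning event given by `mask`."""
    p, y = preds[mask], outcomes[mask]
    err = 0.0
    for b in range(n_bins):
        in_bin = (p >= b / n_bins) & (p < (b + 1) / n_bins)
        if in_bin.any():
            err += in_bin.mean() * abs((y[in_bin] - p[in_bin]).mean())
    return err

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=20_000)       # observed covariate defining two events
true_p = np.where(group == 0, 0.3, 0.7)       # outcome rates differ across events
y = (rng.random(20_000) < true_p).astype(float)
preds = np.full(20_000, 0.5)                  # marginally calibrated: E[y] = 0.5

print("marginal:", calibration_error(preds, y, np.ones(20_000, dtype=bool)))  # ~0
print("group 0:", calibration_error(preds, y, group == 0))                    # ~0.2
print("group 1:", calibration_error(preds, y, group == 1))                    # ~0.2
```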


Mengdi Wang: Controlled Generation for Large Foundation Models

Recent advances in large foundation models, such as large language models (LLMs) and diffusion models, have demonstrated impressive capabilities. However, to truly align these models with user feedback or maximize real-world objectives, it is crucial to exert control over the decoding process in order to steer the distribution of generated output. In this talk, we will explore methods and theory for controlled generation within LLMs and diffusion models. We will discuss various modalities for achieving this control, focusing on applications such as LLM alignment, accelerated inference, transfer learning, and diffusion-based optimizers.
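
As one concrete (and heavily simplified) illustration of controlled decoding, the sketch below tilts a base model's next-token distribution by a learned value estimate of each continuation; `value_fn`, `candidate_ids`, and `beta` are hypothetical stand-ins, not the methods from the talk.

```python
import numpy as np

def guided_next_token(logits, value_fn, candidate_ids, context, beta=1.0):
    """Re-rank next-token logits with a learned value estimate of each continuation.

    `value_fn(context, token_id)` scores how well a continuation serves the target
    objective (e.g., an alignment reward); `beta` trades off fluency vs. control.
    """
    logits = logits.copy()
    for tok in candidate_ids:
        logits[tok] += beta * value_fn(context, tok)   # tilt the base distribution
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return np.random.choice(len(logits), p=probs)

# Toy usage with a hypothetical value function that favors token 2:
logits = np.array([1.0, 1.0, 1.0])
tok = guided_next_token(logits, lambda ctx, t: 1.0 if t == 2 else 0.0,
                        candidate_ids=[0, 1, 2], context="...", beta=2.0)
```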


Shirin Saeedi Bidokhti: Learning-Based Data Compression: Fundamental limits and Algorithms

Data-driven methods have been the driving force of many scientific disciplines in the past decade, relying on huge amounts of empirical, experimental, and scientific data. Working with big data is impossible without data compression techniques that reduce the dimension and size of the data for storage and communication purposes and effectively denoise it for efficient and accurate processing. In the past decade, learning-based compressors such as nonlinear transform coding (NTC) have shown great success in the task of compression by learning to map a high-dimensional source onto a representative latent space of lower dimension using neural networks and compressing in that latent space. Despite this success, it is unknown how the rate-distortion performance of such compressors compares with the optimal limits of compression (known as the rate-distortion function) that information theory characterizes. It is also unknown how advances in information theory translate to practice in the paradigm of deep learning.

In the first part of the talk, we develop neural estimation methods to compute the rate-distortion function of high-dimensional real-world datasets. Using our estimate, and through experiments, we show that the rate-distortion performance achieved by NTC compressors is within several bits of the rate-distortion function for real-world datasets such as MNIST. We then ask if this gap can be closed using ideas from information theory. In particular, incorporating lattice coding in the latent domain, we propose lattice transform coding (LTC) as a novel framework for neural compression. LTC provides significant improvements over the state of the art on synthetic and real-world sources.
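
For intuition about the rate-distortion function itself: for a small discrete source it can be computed exactly with the classical Blahut-Arimoto algorithm, sketched below; the talk's neural estimators target the same quantity for high-dimensional sources (this sketch is textbook material, not the speaker's method).

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, iters=500):
    """Trace one point on the rate-distortion curve of a discrete source.

    p_x: source distribution; dist[i, j]: distortion of reproducing x_i as x_hat_j;
    beta: Lagrange multiplier selecting the trade-off point. Returns (rate, distortion).
    """
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])       # output marginal q(x_hat)
    for _ in range(iters):
        # Optimal conditional for a fixed marginal: q(x_hat|x) ~ q(x_hat) * exp(-beta * d)
        cond = q * np.exp(-beta * dist)
        cond /= cond.sum(axis=1, keepdims=True)
        q = p_x @ cond                                    # update output marginal
    distortion = np.sum(p_x[:, None] * cond * dist)
    rate = np.sum(p_x[:, None] * cond * np.log2(cond / q))  # I(X; X_hat) in bits
    return rate, distortion

# Fair binary source with Hamming distortion, where R(D) = 1 - H_b(D):
p_x = np.array([0.5, 0.5])
dist = 1.0 - np.eye(2)
print(blahut_arimoto(p_x, dist, beta=3.0))   # ~ (0.72 bits, D ~ 0.047)
```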


Alex Damian: Foundations of Deep Learning: Optimization and Representation Learning

Deep learning's success stems from the ability of neural networks to automatically discover meaningful representations from raw data. In this talk, I will describe some recent insights into how optimization enables this learning process. First, I will show how optimization algorithms exhibit surprisingly rich dynamics when training neural networks, and how these complex dynamics are actually crucial to their success – enabling them to find solutions that generalize well, navigate challenging loss landscapes, and efficiently adapt to local curvature. I will then explore how optimization enables neural networks to adapt to low-dimensional structure in the data, how the geometry of the loss landscape shapes the difficulty of feature learning, and how these ideas extend to in-context learning in transformers.
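
One textbook instance of the interplay between step size and curvature that this line of work builds on: gradient descent on a quadratic is stable only while the sharpness (the largest Hessian eigenvalue) stays at or below 2/eta. The toy sketch below illustrates this standard fact; it is background, not the talk's results.

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * x' H x diverges along any Hessian
# eigendirection whose eigenvalue exceeds 2/eta, so trajectories are only
# stable when the local sharpness stays at or below 2/eta.
H = np.diag([1.0, 100.0])          # sharpness lambda_max = 100
x0 = np.array([1.0, 1.0])
for eta in [0.019, 0.021]:         # threshold: 2 / lambda_max = 0.02
    z = x0.copy()
    for _ in range(200):
        z = z - eta * (H @ z)
    print(f"eta={eta}: |x| after 200 steps = {np.linalg.norm(z):.3g}")
```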


Noah Golowich: Theoretical foundations for multi-agent learning

As learning algorithms become increasingly capable of acting autonomously, it is important to better understand the behavior that results from their interactions. For example, a pervasive challenge in multi-agent learning settings, which spans both theory and practice and dates back decades, has been the failure of convergence for iterative algorithms such as gradient descent. Accordingly, a longstanding central question with broad relevance is: how quickly can we compute solution concepts, i.e., equilibria, in multi-agent settings?

I will discuss results which address this question at a variety of levels, starting from foundational settings involving normal-form games and building up to complex problems such as multi-agent reinforcement learning which more aptly model realistic situations. First, I will present a result establishing a near-optimal convergence rate for a simple online learning algorithm in normal-form games, resolving a decade-long line of work which gave suboptimal bounds. I will then discuss a new algorithm for minimizing swap regret exponentially faster than previous approaches. Our algorithm allows us to answer several open questions; for example, it yields the first PTAS for correlated equilibria in extensive-form games.
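
For context, the sketch below runs the basic (non-optimistic) Hedge dynamics in self-play on rock-paper-scissors: vanishing average regret drives the empirical play toward equilibrium, and the talk's results concern how fast such convergence can be made.

```python
import numpy as np

# Self-play with multiplicative weights (Hedge) in a zero-sum normal-form game.
# Each player's average regret vanishes, so empirical play approaches a (coarse
# correlated) equilibrium; in zero-sum games, average strategies approach Nash.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], float)  # payoff to the row player

eta, T = 0.1, 5000
wx, wy = np.ones(3), np.array([1.0, 2.0, 3.0])   # asymmetric start to break symmetry
avg_x = np.zeros(3)
for t in range(T):
    x, y = wx / wx.sum(), wy / wy.sum()
    avg_x += x / T
    wx *= np.exp(eta * (A @ y)); wx /= wx.sum()      # row player ascends its payoff
    wy *= np.exp(-eta * (A.T @ x)); wy /= wy.sum()   # column player descends it
print("average row strategy:", np.round(avg_x, 3))   # ~ uniform (1/3, 1/3, 1/3)
```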

Beyond contending with agents' differing incentives, the increasing use of machine learning algorithms presents other challenges, such as the proliferation of AI-generated content. In the latter part of the talk, I will discuss an approach to detect such content via watermarking. Our watermarking scheme is the first to embed a watermark in a language model's output in a way which only leads to negligible changes in the distribution of the output but which is robust to adversarial edits.
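
As a point of reference, here is a classical distortion-free construction in the same spirit, the keyed Gumbel-max watermark; it leaves each single-step output distribution exactly unchanged, but it is not the talk's scheme and lacks its robustness guarantees against adversarial edits.

```python
import hashlib, math, random

def watermarked_sample(probs, context, key):
    """Sample the next token with a keyed Gumbel-max trick.

    Because argmax_i u_i ** (1 / p_i) with u_i ~ Uniform(0, 1) is distributed
    exactly according to `probs`, a single sampling step is unchanged in
    distribution; a detector who knows `key` can recompute u and test for the
    watermark via the returned score, which is larger on watermarked text.
    """
    seed = hashlib.sha256((key + context).encode()).digest()
    rng = random.Random(int.from_bytes(seed[:8], "big"))
    u = [rng.random() for _ in probs]
    tok = max(range(len(probs)), key=lambda i: u[i] ** (1.0 / max(probs[i], 1e-12)))
    return tok, -math.log(1.0 - u[tok])   # detection statistic for this step

tok, score = watermarked_sample([0.5, 0.3, 0.2], context="the cat sat", key="secret")
```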


Ayush Sekhari: ML for an Interactive World: From Learning to Unlearning

The remarkable recent success of Machine Learning (ML) is driven by our ability to develop and deploy interactive models that can solve complicated tasks by understanding and adapting to the ever-changing state of the world. However, the development of such models demands significant data and computing resources. Moreover, as these models increasingly interact with humans, new post-deployment challenges emerge, including privacy concerns, data integrity, and the potential for model misuse. Addressing these issues necessitates innovative algorithmic solutions.

Reinforcement Learning (RL) is the preferred method for training interactive models. In the first part of my talk, I will discuss my work on Hybrid RL, which has led to the development of the first general-purpose, computationally efficient, and theoretically rigorous algorithms for RL. Our method learns effective policies by integrating the trial-and-error processes of RL with pre-collected interaction data logs, demonstrating strong performance in practical applications.
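
A minimal sketch of the hybrid idea in tabular form, assuming a toy environment exposing `reset()`, `step(a)`, and `actions` (the interface and the mixing rule are illustrative simplifications, not the speaker's algorithm): each online step is paired with a replayed offline transition.

```python
import random

def hybrid_q_learning(env, offline_log, episodes=100, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning that mixes a fixed offline log with fresh online data.

    `offline_log` is a list of (s, a, r, s_next) transitions collected beforehand;
    each online step also replays a random offline transition, so the learner
    exploits logged coverage while still exploring.
    """
    Q = {}
    get = lambda s, a: Q.get((s, a), 0.0)

    def update(s, a, r, s2):
        target = r + gamma * max(get(s2, b) for b in env.actions)
        Q[(s, a)] = get(s, a) + alpha * (target - get(s, a))

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.choice(env.actions) if random.random() < eps else \
                max(env.actions, key=lambda b: get(s, b))
            s2, r, done = env.step(a)
            update(s, a, r, s2)                    # online transition
            update(*random.choice(offline_log))    # replayed offline transition
            s = s2
    return Q
```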

In the second half of my talk, I will discuss my work on the foundations of machine unlearning, a newly emerging field with significant practical applications. Machine unlearning involves updating a trained ML model to remove the influence of specific data samples upon a deletion request, without retraining from scratch. I will delve into how machine unlearning presents a more viable alternative to traditional methods like differential privacy for data deletion, thus providing a more practical solution for ensuring data privacy post-deployment.


Bartolomeo Stellato: Data-Driven Algorithm Design and Verification for Parametric Convex Optimization

We present computational tools for analyzing and designing first-order methods in parametric convex optimization. These methods are popular for their low per-iteration cost and warm-starting capabilities. However, precisely quantifying the number of iterations required to compute high-quality solutions remains a key challenge, especially in real-time applications.

First, we introduce a numerical framework for verifying the worst-case performance of first-order methods in parametric quadratic optimization. We formulate this as a mixed-integer linear program that maximizes the infinity norm of the fixed-point residual after a given number of iterations. Our approach captures a broad class of gradient, projection, and proximal iterations through affine or piecewise-affine constraints, with strong polyhedral formulations. To improve scalability, we incorporate bound-tightening techniques that exploit operator-theoretic bounds. Numerical results show that our method closely matches true worst-case performance, achieving significant reductions in worst-case fixed-point residuals compared to standard convergence analyses.

Second, we present a data-driven approach for analyzing the performance of first-order methods using statistical learning theory. We establish generalization guarantees for classical optimizers using sample convergence bounds and for learned optimizers using the Probably Approximately Correct (PAC)-Bayes framework. We then apply this framework to learn accelerated first-order methods by directly minimizing the PAC-Bayes bound over key algorithmic parameters (e.g., gradient steps and warm-starts). Numerical experiments demonstrate that our approach provides strong generalization guarantees for both classical and learned optimizers, with statistical bounds that closely match true out-of-sample performance.
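
To illustrate the quantity being verified, the sketch below tracks the worst-case fixed-point residual of projected gradient descent on a box-constrained parametric QP over sampled parameters; unlike the mixed-integer formulation in the talk, sampling only yields a lower bound on the true worst case.

```python
import numpy as np

def worst_case_residual(P, box, thetas, alpha, k_max=50):
    """Largest infinity-norm fixed-point residual at each iteration of projected
    gradient descent on  min 0.5 x'Px + theta'x  s.t. |x_i| <= box,  maximized
    over a sample of parameters theta."""
    worst = np.zeros(k_max)
    for theta in thetas:
        x = np.zeros(len(theta))
        for k in range(k_max):
            x_next = np.clip(x - alpha * (P @ x + theta), -box, box)
            worst[k] = max(worst[k], np.linalg.norm(x_next - x, np.inf))
            x = x_next
    return worst

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
P = M @ M.T + np.eye(5)                      # positive definite quadratic term
alpha = 1.0 / np.linalg.eigvalsh(P).max()    # safe step size 1/L
thetas = rng.uniform(-1, 1, size=(200, 5))   # sampled problem parameters
print(worst_case_residual(P, box=1.0, thetas=thetas, alpha=alpha)[:5])
```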


Yaniv Romano: Statistics-Powered ML: Building Trust and Robustness in Black-Box Predictions

Modern ML models produce valuable predictions across various applications, influencing people’s lives, opportunities, and scientific advancements. However, these systems can fail in unexpected ways, generating unreliable inferences and perpetuating biases present in the data. These issues are particularly troubling in high-stakes applications, where models are trained on increasingly diverse, incomplete, and noisy data and then deployed in dynamic environments—conditions that often exacerbate test-time failures. In response to these challenges, this talk explores a key question: How can fundamental statistical principles be harnessed to produce trustworthy predictive inference?

In the first part, I will present a new advancement in conformal prediction—a statistical wrapper for any black-box model that provides precise error bounds on ML predictions. I will focus on scenarios where training data is corrupted or biased, such as through missing features and labels, and introduce a framework for constructing predictive uncertainty estimates that remain valid despite distribution shifts between the available corrupted data and unknown clean data.
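
For readers unfamiliar with the wrapper being extended, here is the standard split conformal procedure for regression under the clean, exchangeable-data assumption (the talk's contribution is precisely to relax that assumption); `model` is any fitted regressor with a scikit-learn-style `predict`.

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, x_new, alpha=0.1):
    """Wrap a black-box regressor so the returned interval covers the true
    label with probability >= 1 - alpha, assuming the calibration and test
    points are exchangeable."""
    scores = np.abs(y_cal - model.predict(X_cal))            # residual scores
    n = len(scores)
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    pred = model.predict(x_new.reshape(1, -1))[0]
    return pred - q, pred + q
```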

In the second part, I will show how sequential statistical testing can enable a novel test-time training scheme, allowing a pre-trained model to adapt online to unfamiliar environments. For instance, consider an image classification task where test images are captured under varying illumination conditions that differ from the training setup. Building on conformal betting martingales, I will first introduce a monitoring tool to detect data drifts. Using this tool, I will derive a rigorous ‘anti-drift correction’ mechanism grounded in (online) optimal transport principles. This mechanism forms the foundation of a self-training scheme that produces robust predictions invariant to dynamically changing environments.
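
A minimal monitor in this spirit, assuming a stream of conformal p-values as input and a fixed betting function (the talk's betting martingales are more sophisticated): under exchangeability the wealth process stays bounded in expectation, while drift makes it grow.

```python
import numpy as np

def betting_martingale(p_values, lam=0.2):
    """Wealth process M_t = prod_s (1 + lam * (1 - 2 * p_s)).

    Under exchangeability the p-values are uniform on [0, 1], so each factor
    has expectation 1 and M_t is a nonnegative martingale; after a drift the
    p-values skew small, the factors exceed 1 on average, and the wealth
    grows, flagging the change."""
    wealth, path = 1.0, []
    for p in p_values:
        wealth *= 1.0 + lam * (1.0 - 2.0 * p)
        path.append(wealth)
    return np.array(path)

rng = np.random.default_rng(1)
calm = rng.uniform(size=500)              # exchangeable phase: uniform p-values
drift = rng.beta(0.5, 2.0, size=500)      # after drift: p-values skew small
m = betting_martingale(np.concatenate([calm, drift]))
print(f"wealth at t=500: {m[499]:.3g}, at t=1000: {m[999]:.3g}")
```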


Ryan Tibshirani: Gradient Equilibrium in Online Learning

We present a new perspective on online learning that we refer to as gradient equilibrium: a sequence of iterates achieves gradient equilibrium if the average of gradients of losses along the sequence converges to zero. In general, this condition is not implied by, nor implies, sublinear regret. It turns out that gradient equilibrium is achievable by standard online learning methods such as gradient descent and mirror descent with constant step sizes (rather than decaying step sizes, as is usually required for no regret). Further, as we show through examples, gradient equilibrium translates into an interpretable and meaningful property in online prediction problems spanning regression, classification, quantile estimation, and others. Notably, we show that the gradient equilibrium framework can be used to develop a debiasing scheme for black-box predictions under arbitrary distribution shift, based on simple post hoc online descent updates. We also show that post hoc gradient updates can be used to calibrate predicted quantiles under distribution shift, and that the framework leads to unbiased Elo scores for pairwise preference prediction.
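
A minimal sketch of the post hoc debiasing idea (a simplified reading of the abstract, not the paper's exact scheme): run constant-step online gradient descent on an additive correction to the black-box predictions. Since the correction b satisfies b_T = b_0 - eta * (sum of gradients), a bounded iterate forces the average gradient, here the average residual, to zero, so the corrected predictions are unbiased on average even under drift.

```python
import numpy as np

def debias_online(preds, ys, eta=0.1):
    """Post hoc debiasing via online gradient descent with a constant step size.

    A scalar correction b is added to each black-box prediction and updated with
    the gradient of the squared loss; gradient equilibrium means the running
    average of these gradients (the average residual) tends to zero."""
    b, grads, corrected = 0.0, [], []
    for f, y in zip(preds, ys):
        corrected.append(f + b)
        g = (f + b) - y              # gradient of 0.5 * (f + b - y)^2 in b
        grads.append(g)
        b -= eta * g                 # constant step size, no decay
    return np.array(corrected), np.mean(grads)

rng = np.random.default_rng(0)
y = rng.normal(size=5000)
f = y + 1.0 + 0.5 * np.sin(np.arange(5000) / 300)   # biased, drifting predictor
corrected, avg_grad = debias_online(f, y)
print("raw bias:", (f - y).mean(), " average gradient:", avg_grad)
```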