UPenn Optimization Seminar (new!)
Quick facts
- When: Thursdays, 12-1pm
- Where: Room 1201, Steinberg-Dietrich Hall
- Mailing list: Subscribe to "opt-seminar" here
- Organizer: Jason Altschuler
About
- What: This seminar series features leading experts in optimization and adjacent fields. Topics range broadly from the design and analysis of optimization algorithms, to the complexity of fundamental optimization tasks, to the modeling and formulation of optimization applications arising in machine learning, engineering, operations, economics, applied mathematics, etc.
- Why: This seminar serves as a university-wide hub to bring together the many optimization communities across UPenn --- in Statistics and Data Science (the host department), Electrical Engineering, Computer Science, Applied Mathematics, Economics, Wharton OID, etc. To promote this internal interaction, we will alternate external/internal speakers in this inaugural year.
Schedule for Fall 2023
Abstracts
Amir Ali Ahmadi: Complexity of Finding Local Minima in Polynomial Optimization
We consider the notions of (i) critical points, (ii) second-order points, (iii) local minima, and (iv) strict local minima for multivariate polynomials. For each type of point, and as a function of the degree of the polynomial, we study the complexity of deciding (1) if a given point is of that type, and (2) if a polynomial has a point of that type. Our results characterize the complexity of these two questions for all degrees left open by prior literature. Our main contributions reveal that many of these questions turn out to be tractable for cubic polynomials. By contrast, we show that unless P=NP, there cannot be a polynomial-time algorithm that finds a point within Euclidean distance $c^n$ (for any constant $c\geq 0$) of a local minimizer of an $n$-variate quadratic polynomial over a polytope. This result (with $c=0$) answers a question of Pardalos and Vavasis that appeared on a list of seven open problems in complexity theory for numerical optimization in 1992. Based on joint work with Jeffrey Zhang (Yale).
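For concreteness, and as a reminder rather than part of the abstract, the four notions above are the standard ones: for a polynomial $p:\mathbb{R}^n\to\mathbb{R}$ and a candidate point $\bar{x}$,
$$\text{(i) } \nabla p(\bar{x})=0; \qquad \text{(ii) } \nabla p(\bar{x})=0 \text{ and } \nabla^2 p(\bar{x})\succeq 0; \qquad \text{(iii) } p(\bar{x})\le p(x) \text{ for all } x \text{ near } \bar{x}; \qquad \text{(iv) } p(\bar{x})< p(x) \text{ for all } x\neq\bar{x} \text{ near } \bar{x}.$$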
Rakesh Vohra: (Near) Substitute Preferences and Equilibria with Indivisibilities
An obstacle to using market mechanisms to allocate indivisible goods (such as courses to students) is the non-existence of competitive equilibria (CE). To surmount this, Arrow and Hahn proposed the notion of social approximate equilibria: a price vector and corresponding excess demands that are `small'. We identify a class of preferences called $\Delta$-substitutes and show that social approximate equilibria exist in which the excess demand for each good is at most $2(\Delta-1)$, independent of the size of the economy. When $\Delta=1$, existence of a CE is guaranteed even in the presence of income effects. This sufficient condition strictly generalizes prior conditions.
These results rely on a new type of Shapley-Folkman-Starr Lemma that should be of independent interest. That lemma states that the Minkowski sum of a large number of sets is approximately convex. The `game', of course, is `how large' and `how approximate'.
This is joint work with Thanh Nguyen.
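For background, the classical Shapley-Folkman lemma (the lemma in this talk is a new variant, which may differ in form) states that for sets $S_1,\dots,S_m\subseteq\mathbb{R}^d$,
$$x \in \mathrm{conv}\Big(\sum_{i=1}^m S_i\Big) \;\Longrightarrow\; x=\sum_{i=1}^m x_i \ \text{ with } x_i\in\mathrm{conv}(S_i) \text{ for all } i \text{ and } \big|\{i: x_i\notin S_i\}\big|\le d,$$
so when the number of sets $m$ is much larger than the dimension $d$, the Minkowski sum is approximately convex.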
Sinho Chewi: Proximal algorithms for sampling and variational inference
I will discuss two recent algorithms over the space of probability measures, inspired by proximal methods in optimization: the proximal sampler and forward-backward JKO for Gaussians. The goal is to explore how principles from optimization enable the design of new algorithms for probabilistic problems.
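As a rough illustration (not taken from the talk), below is a minimal sketch of the proximal sampler for a standard Gaussian target, the one case where the required "restricted Gaussian oracle" is available in closed form; the step size eta and the target are illustrative assumptions.

```python
import numpy as np

def proximal_sampler_gaussian(n_iters=2000, eta=0.5, dim=2, rng=None):
    """Minimal sketch of the proximal sampler for pi(x) ∝ exp(-||x||^2 / 2).

    Each iteration alternates two exact Gaussian draws:
      1. y ~ N(x, eta * I)                                   (noising step)
      2. x ~ pi(x | y) ∝ exp(-||x||^2/2 - ||x - y||^2/(2*eta)),
         which, for this Gaussian target, is N(y/(1+eta), eta/(1+eta) * I).
    For general targets, step 2 requires a restricted Gaussian oracle.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.zeros(dim)
    samples = []
    for _ in range(n_iters):
        y = x + np.sqrt(eta) * rng.standard_normal(dim)        # step 1
        mean = y / (1.0 + eta)                                  # step 2, closed form
        x = mean + np.sqrt(eta / (1.0 + eta)) * rng.standard_normal(dim)
        samples.append(x)
    return np.array(samples)

if __name__ == "__main__":
    s = proximal_sampler_gaussian()
    print("empirical covariance (should be close to identity):")
    print(np.cov(s[500:].T))
```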
Sanjeev Khanna: Sublinear Algorithms for Hierarchical Clustering
Hierarchical clustering is a popular technique for organizing data as a rooted tree structure that simultaneously clusters data at multiple levels of granularity. A well-studied recent objective function views the input as a weighted graph with edges indicating similarity between the data points, and focuses on finding a tree that minimizes the cost of hierarchical partitioning. The resulting optimization problem is NP-hard, and previous algorithms for approximating this objective require at least linear time/space. In this talk, we will consider algorithms for hierarchical clustering that use sublinear resources (space, time, and communication).
Specifically, we will present sublinear algorithms for hierarchical clustering in the streaming model (space), in the query model (time), and in the MPC model (communication). At the core of our algorithmic results is a connection between hierarchical clustering and a suitably relaxed notion of cut sparsifiers of graphs that lends itself to efficient sublinear algorithms. We complement our algorithmic results by establishing nearly matching lower bounds that rule out algorithms with better performance guarantees in each of these models.
This is joint work with Arpit Agarwal, Huan Li, and Prathamesh Patil.
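For reference, the objective alluded to above is presumably Dasgupta's cost function (stated here for orientation, not as part of the abstract): given a similarity graph $G=(V,E,w)$ and a rooted tree $T$ whose leaves are the points of $V$,
$$\mathrm{cost}_G(T) \;=\; \sum_{(i,j)\in E} w_{ij}\,\big|\mathrm{leaves}\big(T[i\vee j]\big)\big|,$$
where $T[i\vee j]$ denotes the subtree rooted at the least common ancestor of $i$ and $j$; the goal is a tree of (approximately) minimum cost.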
Courtney Paquette: Hitting the High-D Notes: An ODE for SGD learning dynamics in high-dimensions
In this talk, I will present a framework, inspired by random matrix theory, for analyzing the dynamics of stochastic optimization algorithms (e.g., stochastic gradient descent (SGD) and momentum (SGD+M)) when both the number of samples and the dimension are large. Using this new framework, we show that the dynamics of optimization algorithms on generalized linear models and multi-index problems with random data become deterministic in the large-sample, large-dimension limit. In particular, the limiting dynamics of the stochastic algorithms are governed by an ODE. From this model, we identify a stability measure, the implicit conditioning ratio (ICR), which regulates the ability of SGD+M to accelerate. When the batch size exceeds the ICR, SGD+M converges linearly at a rate of $O(1/\sqrt{\kappa})$, matching optimal full-batch momentum (in particular, performing as well as full-batch momentum with only a fraction of the batch size). For batch sizes smaller than the ICR, in contrast, SGD+M has rates that scale like a multiple of the single-batch SGD rate. We give explicit choices for the learning rate and momentum parameter, in terms of the Hessian spectrum, that achieve this performance. Finally, we show that this model matches performance on real data sets.
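For orientation, the SGD+M iteration referred to above is the usual heavy-ball update on mini-batches (notation illustrative):
$$x_{k+1} \;=\; x_k \;-\; \gamma\,\nabla f_{B_k}(x_k) \;+\; \Delta\,(x_k - x_{k-1}),$$
with learning rate $\gamma$, momentum parameter $\Delta$, and mini-batch gradient $\nabla f_{B_k}$.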
Yuting Wei: Approximate message passing: A non-asymptotic framework and beyond
Approximate message passing (AMP) has emerged as an effective iterative algorithm for solving high-dimensional statistical problems. However, prior AMP theory, which focused mostly on high-dimensional asymptotics, fell short of predicting the AMP dynamics when the number of iterations surpasses $o(\log n / \log\log n)$ (with $n$ the problem dimension). To address this inadequacy, this talk introduces a non-asymptotic framework for understanding AMP. Built upon a new decomposition of the AMP updates in conjunction with well-controlled residual terms, we lay out an analysis recipe to characterize the finite-sample convergence of AMP up to $O(n/\mathrm{polylog}(n))$ iterations. We will discuss concrete consequences of the proposed analysis recipe for the $\mathbb{Z}_2$ synchronization problem; more specifically, we predict the behavior of randomly initialized AMP for up to $O(n/\mathrm{polylog}(n))$ iterations, showing that the algorithm succeeds without the need for careful spectral initialization or a subsequent refinement stage (as conjectured recently by Celentano et al.).
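As background (not part of the abstract), a generic AMP iteration for a symmetric data matrix $A\in\mathbb{R}^{n\times n}$ with denoising functions $\eta_t$ takes the form
$$x^{t+1} \;=\; A\,\eta_t(x^t)\;-\; b_t\,\eta_{t-1}(x^{t-1}), \qquad b_t=\frac{1}{n}\sum_{i=1}^n \eta_t'(x_i^t),$$
where the second ("Onsager") term distinguishes AMP from a naive power-type iteration; for $\mathbb{Z}_2$ synchronization, the denoisers are chosen to exploit the $\pm 1$ structure of the signal.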
Fatma Kılınç-Karzan: Using exactness guarantees to design faster algorithms for a class of semidefinite programs
Semidefinite programs (SDPs) have been used as a tractable relaxation for many NP-hard problems that naturally arise in operations research, engineering, and computer science. The SDP relaxation is obtained by first reformulating the problem in a lifted space with an additional rank constraint and then dropping the rank constraint. In this talk, we will first study the SDP relaxation for general quadratically constrained quadratic programs, present various exactness concepts related to the SDP relaxation, and discuss conditions guaranteeing such SDP exactness. In particular, this will allow us to identify structural properties of these problems that admit equivalent tractable SDP reformulations. Despite the well-established strength of SDP relaxations, solving an SDP is still considered impractical, especially in modern large-data settings, and this precludes their widespread adoption in practice. In the second part of this talk, we will review how we can effectively exploit the exactness properties of SDPs to design storage-optimal accelerated first-order methods (which even achieve linear convergence rates for certain problems). This is joint work with Alex Wang.
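For concreteness, the lifting described above is the standard Shor relaxation (stated here for orientation): a quadratically constrained quadratic program
$$\min_{x\in\mathbb{R}^n}\; x^\top Q_0 x + 2q_0^\top x \quad \text{s.t.}\quad x^\top Q_i x + 2q_i^\top x + c_i \le 0,\ \ i=1,\dots,m,$$
is rewritten exactly with the lifted variable $X = xx^\top$ and then relaxed by dropping the rank-one condition:
$$\min_{x,\,X}\; \langle Q_0, X\rangle + 2q_0^\top x \quad \text{s.t.}\quad \langle Q_i, X\rangle + 2q_i^\top x + c_i \le 0,\ \ \begin{pmatrix} 1 & x^\top\\ x & X\end{pmatrix}\succeq 0.$$
Exactness asks when this relaxation recovers the optimal value (or optimal solutions) of the original problem.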
Victor Preciado: Optimal Resource Allocation to Control Epidemic Outbreaks in Networked Populations
We study the problem of controlling epidemic outbreaks in networked populations by distributing protection resources throughout the nodes of the network. We assume that two types of protection resources are available: (i) preventive resources able to defend individuals in the population against the spreading of the disease (such as vaccines or disease-awareness campaigns), and (ii) corrective resources able to neutralize the spreading (such as antidotes). We assume that both preventive and corrective resources have an associated cost and study the problem of finding the cost-optimal distribution of resources throughout the networked population. We analyze these questions in the context of a viral outbreak and study the following two problems: (i) given a fixed budget, find the optimal allocation of preventive and corrective resources in the network to achieve the highest level of disease containment, and (ii) when a budget is not specified, find the minimum budget required to eradicate the disease. We show that both resource allocation problems can be efficiently solved for a wide class of cost functions. We illustrate our approach by designing optimal protection strategies to contain an epidemic outbreak that propagates through the worldwide air transportation network.
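As a rough sketch of how such problems are often posed (the exact model in the talk may differ), in a networked SIS epidemic with adjacency matrix $A$, node infection rates $\beta_i$ (reducible by preventive resources), and recovery rates $\delta_i$ (increasable by corrective resources), the outbreak dies out exponentially fast when $\lambda_{\max}\big(\mathrm{diag}(\beta)A-\mathrm{diag}(\delta)\big)<0$, so a version of problem (ii) reads
$$\min_{\beta,\,\delta}\;\sum_i \big(f_i(\beta_i) + g_i(\delta_i)\big) \quad\text{s.t.}\quad \lambda_{\max}\big(\mathrm{diag}(\beta)\,A-\mathrm{diag}(\delta)\big)\le -\varepsilon,$$
where $f_i$ and $g_i$ are the costs of preventive and corrective resources at node $i$.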
Arthur Jacot: Bottleneck Structure in Deep Neural Networks: Mechanisms of Symmetry Learning
Deep Neural Networks (DNNs) have proven able to break the curse of dimensionality and learn complex tasks on high-dimensional data, such as images or text. But we still do not fully understand what makes this possible. To answer this question, I will describe the appearance of a Bottleneck structure, in which the network learns low-dimensional features in the middle of the network. This allows the network to identify and learn symmetries of the task it is trained on, without any prior knowledge. This could explain the success of deep learning on image and text tasks, which feature many `hidden' symmetries.
Nikolai Matni: Representation Learning for Dynamics and Control
Representation learning, in which common features are extracted using data from heterogeneous sources or tasks, has underpinned much of the exciting recent progress in machine learning. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given target task, and indeed, recent results support this intuition in the context of classification and regression over i.i.d. data. However, in order to reap the benefits of representation learning in dynamics and control applications, algorithmic and analytical tools need to accommodate sequential data that is emphatically not i.i.d. Towards that goal, we will first overview our recent progress in understanding how and when empirical-risk-minimization-based representation learning over data generated by a dynamical system is statistically beneficial, with a focus on applications to imitation learning. We then turn our attention to optimization challenges (and solutions!) related to learning representations over non-isotropic, non-i.i.d. data, and show how simple modifications to alternating descent methods can significantly improve their convergence properties.
Damek Davis: A local exponential acceleration of gradient methods for "generic" nonsmooth optimization problems
Gradient methods in nonsmooth optimization are often described as "slow": known complexity guarantees are sublinear at best. In this talk, I will present a black-box gradient method that (locally) exponentially improves complexity bounds for nonsmooth functions. The method is parameter-free, the memory and per-step cost are the same as for gradient descent, and the complexity is independent of the problem dimension. Practically, the method is implemented in PyTorch and shows promise in numerical experiments. Theoretically, the improvement holds for "generic" nonsmooth functions (convex or otherwise). The key insight is that nonsmooth functions are often "partially" smooth in useful ways. The talk will be elementary and visual.
Stefanie Jegelka: Some benefits of machine learning with invariances
In many applications, especially in the sciences, data and tasks have known invariances. Encoding such invariances directly into a machine learning model can improve learning outcomes, while also posing challenges for efficient model design. In the first part of the talk, we will focus on the invariances that arise when eigenvectors and eigenspaces are inputs to a neural network. Such inputs are important, for instance, for graph representation learning. We will discuss targeted architectures that can universally express functions with the relevant invariances - sign flips and changes of basis - and their theoretical and empirical benefits. Then we will extend these ideas to equivariance.
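Concretely (a reminder, not part of the abstract), an eigenvector $v$ and its negation $-v$ describe the same eigenspace, and so do any two orthonormal bases of a higher-dimensional eigenspace; a network $f$ taking eigenvectors as input should therefore satisfy
$$f(v)=f(-v) \quad\text{(sign flips)}, \qquad f(V)=f(VQ)\ \text{ for all orthogonal } Q \quad\text{(changes of basis)},$$
where the columns of $V\in\mathbb{R}^{n\times d}$ form an orthonormal basis of a $d$-dimensional eigenspace.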
Second, we will take a broader, theoretical perspective. Empirically, it is known that encoding invariances into the machine learning model can reduce sample complexity. For the simplified setting of kernel ridge regression or random features, we will discuss new bounds that illustrate two ways in which invariances can reduce sample complexity. Our results hold for learning on manifolds and for invariances to (almost) any group action.
Hamsa Bastani: Rethinking Fairness for Human-AI Collaboration
Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality, or even a desirable outcome, in human-AI collaboration. Moreover, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations, which are guaranteed to (weakly) improve fairness in decisions regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints. [Joint work with H. Ge and O. Bastani; extended abstract to appear in ITCS 2024.]
George Lan: Uniform Optimality for Convex and Nonconvex Optimization
The past few years have witnessed growing interest in the development of easily implementable, parameter-free first-order methods to facilitate their applications, e.g., in data science and machine learning. In this talk, I will discuss some recent progress we have made on uniformly optimal methods for convex and nonconvex optimization. By uniform optimality, we mean that these algorithms do not require the input of any problem parameters but can still achieve the best possible iteration complexity bounds for solving different classes of optimization problems. We first consider convex optimization problems under different smoothness levels and show that neither such smoothness information nor line search procedures are needed to achieve uniform optimality. We then consider regularity conditions (e.g., strong convexity and lower curvature) that are imposed over a global scope and are thus notoriously more difficult to estimate. By presenting novel methods that achieve tight complexity bounds for computing solutions with verifiably small (projected) gradients, we show that such regularity information is in fact superfluous for handling strongly convex and nonconvex problems. It is worth noting that our complexity bound for nonconvex problems also appears to be new in the literature.