UPenn Optimization Seminar

We consider the notions of (i) critical points, (ii) second-order points, (iii) local minima, and (iv) strict local minima for multivariate polynomials. For each type of point, and as a function of the degree of the polynomial, we study the complexity of deciding (1) if a given point is of that type, and (2) if a polynomial has a point of that type. Our results characterize the complexity of these two questions for all degrees left open by prior literature. Our main contributions reveal that many of these questions turn out to be tractable for cubic polynomials. By contrast, we show that unless P=NP, there cannot be a polynomial-time algorithm that finds a point within Euclidean distance $c^n$ (for any constant $c\geq 0$) of a local minimizer of an $n$-variate quadratic polynomial over a polytope. This result (with $c=0$) answers a question of Pardalos and Vavasis that appeared on a list of seven open problems in complexity theory for numerical optimization in 1992. Based on joint work with Jeffrey Zhang (Yale).

Date	Speaker	Affiliation	Title
Aug 31	Amir Ali Ahmadi	Princeton ORFE	Complexity of Finding Local Minima in Polynomial Optimization [slides]
Sep 7	Rakesh Vohra	UPenn Econ & ESE	(Near) Substitute Preferences and Equilibria with Indivisibilities [slides]
Sep 14	Sinho Chewi	IAS & Yale Stats	Proximal Algorithms for Sampling and Variational Inference [slides]
Sep 21	Sanjeev Khanna	UPenn CIS	Sublinear Algorithms for Hierarchical Clustering [slides]
Sep 28	Courtney Paquette	McGill Math & Stats	Hitting the High-D Notes: An ODE for SGD learning dynamics in high-dimensions [slides]
Oct 5	Yuting Wei	UPenn Stats	Approximate message passing: A non-asymptotic framework and beyond [slides]
Oct 12 [Fall Break]	------------	------------
~~Oct 19~~ [Postponed to Spring]	~~Fatma Kılınç-Karzan~~	~~CMU OR~~	~~Using exactness guarantees to design faster algorithms for a class of semidefinite programs~~
Oct 26	Victor Preciado	UPenn ESE	Optimal Resource Allocation to Control Epidemic Outbreaks in Networked Populations [slides]
~~Nov 2~~ [Postponed to Spring]	~~Dimitris Bertsimas~~	~~MIT Sloan~~
Nov 2	Arthur Jacot	NYU Courant	Bottleneck Structure in Deep Neural Networks: Mechanisms of Symmetry Learning [slides]
Nov 9	Nikolai Matni	UPenn ESE	Representation Learning for Dynamics and Control [slides]
Nov 16	Damek Davis	Cornell ORIE	A local exponential acceleration of gradient methods for "generic" nonsmooth optimization problems
Nov 21 [Tuesday due to Thanksgiving]	Stefanie Jegelka	MIT EECS	Some benefits of machine learning with invariances [slides]
Nov 30	Hamsa Bastani	Wharton OID	Rethinking Fairness for Human-AI Collaboration [slides]
Dec 7	Guanghui (George) Lan	Georgia Tech ISyE	Uniform Optimality for Convex and Nonconvex Optimization [slides]

Rakesh Vohra: (Near) Substitute Preferences and Equilibria with Indivisibilities

Abstract: An obstacle to using market mechanisms to allocate indivisible goods (such as courses to students) is the non-existence of competitive equilibria (CE). To surmount this, Arrow and Hahn proposed the notion of social-approximate equilibria: a price vector and corresponding excess demands that are `small'. We identify a class of preferences called $\Delta$-substitutes, and show that social approximate equilibria where the bound on excess demand, good-by-good, is $2(\Delta-1)$ independent of the size of the economy. When $\Delta=1$ existence of CE is guaranteed even in the presence of income effects. This sufficient condition strictly generalizes prior conditions.

These results rely on a new type of Shapley-Folkman-Starr Lemma that should be of independent interest. That lemma states that the Minkowski sum of a large number of sets is approximately convex. The `game’ of course is `how large’ and `how approximate’.

This is joint work with Thanh Nguyen.

Sinho Chewi: Proximal algorithms for sampling and variational inference

I will discuss two recent algorithms, inspired by proximal methods in optimization, over the space of probability measures: the proximal sampler, and forward-backward JKO for Gaussians. The goal is to explore how principles from optimization enables the design of new algorithms for probabilistic problems.

Sanjeev Khanna: Sublinear Algorithms for Hierarchical Clustering

Hierarchical clustering is a popular technique for organizing data as a rooted tree structure that simultaneously clusters data at multiple levels of granularity. A well-studied recent objective function views the input as a weighted graph with edges indicating similarity between the data points, and focuses on finding a tree that minimizes the cost of hierarchical partitioning. The resulting optimization problem is NP-hard, and previous algorithms for approximating this objective require at least linear time/space. In this talk, we will consider algorithms for hierarchical clustering that use sublinear resources (space, time, and communication).

Specifically, we will present sublinear algorithms for hierarchical clustering in the streaming model (space), in the query model (time), and in the MPC model (communication). At the core of our algorithmic results is a connection between hierarchical clustering and a suitably relaxed notion of cut sparsifiers of graphs that lends itself to efficient sublinear algorithms. We complement our algorithmic results by establishing nearly matching lower bounds that rule out algorithms with better performance guarantees in each of these models.

This is joint work with Arpit Agarwal, Huan Li, and Prathamesh Patil.

Courtney Paquette: Hitting the High-D Notes: An ODE for SGD learning dynamics in high-dimensions

In this talk, I will present a framework, inspired by random matrix theory, for analyzing the dynamics of stochastic optimization algorithms (e.g., stochastic gradient descent (SGD) and momentum (SGD + M)) when both the number of samples and dimensions are large. Using this new framework, we show that the dynamics of optimization algorithms on generalized linear models and multi-index problems with random data become deterministic in the large sample and dimensional limit. In particular, the limiting dynamics for stochastic algorithms are governed by a ODE. From this model, we identify a stability measurement, the implicit conditioning ratio (ICR), which regulates the ability of SGD+M to accelerate the algorithm. When the batch size exceeds this ICR, SGD+M converges linearly at a rate of O(1/ κ), matching optimal full-batch momentum (in particular performing as well as a full-batch but with a fraction of the size). For batch sizes smaller than the ICR, in contrast, SGD+M has rates that scale like a multiple of the single batch SGD rate. We give explicit choices for the learning rate and momentum parameter in terms of the Hessian spectra that achieve this performance. Finally we show this model matches performances on real data sets.

Yuting Wei: Approximate message passing: A non-asymptotic framework and beyond

Approximate message passing (AMP) emerges as an effective iterative algorithm for solving high-dimensional statistical problems. However, prior AMP theory, which focused mostly on high-dimensional asymptotics, fell short of predicting the AMP dynamics when the number of iterations surpasses o(log n / log log n) (with n the problem dimension). To address this inadequacy, this talk introduces a non-asymptotic framework towards understanding AMP. Built upon a new decomposition of AMP updates in conjunction with well-controlled residual terms, we lay out an analysis recipe to characterize the finite-sample convergence of AMP up to O(n / polylog(n)) iterations. We will discuss concrete consequences of the proposed analysis recipe in the Z2 synchronization problem; more specifically, we predict the behavior of randomly initialized AMP for up to O(n/poly(\log n)) iterations, showing that the algorithm succeeds without the need of a careful spectral initialization and also a subsequent refinement stage (as conjectured recently by Celentano et al.)

Fatma Kılınç-Karzan: Using exactness guarantees to design faster algorithms for a class of semidefinite programs

Semidefinite programs (SDPs) have been used as a tractable relaxation for many NP-hard problems that naturally arise in operations research, engineering, and computer science. The SDP relaxation is obtained by first reformulating the problem in a lifted space with an additional rank constraint and then dropping the rank constraint. In this talk, we will first study the SDP relaxation for general quadratically constrained quadratic programs, present various exactness concepts related to the SDP relaxation, and discuss conditions guaranteeing such SDP exactness. In particular, this will allow us to identify structural properties of these problems that admit equivalent tractable SDP reformulations. Despite the well-established strength of SDP relaxations, the task of solving an SDP is still considered impractical, especially in modern large-data settings, and precludes their widespread adoption in practice. In the second part of this talk, we will review how we can effectively exploit the exactness properties of SDPs to design storage-optimal accelerated first-order methods (which achieve even linear convergence rates for certain problems). This is joint work with Alex Wang.

Victor Preciado: Optimal Resource Allocation to Control Epidemic Outbreaks in Networked Populations

We study the problem of controlling epidemic outbreaks in networked populations by distributing protection resources throughout the nodes of the network. We assume that two types of protection resources are available: (i) Preventive resources able to defend individuals in the population against the spreading of the disease (such as vaccines or disease-awareness campaigns), and (ii) corrective resources able to neutralize the spreading (such as antidotes). We assume that both preventive and corrective resources have an associated cost and study the problem of finding the cost-optimal distribution of resources throughout the networked population. We analyze these questions in the context of a viral outbreak and study the following two problems: (i) Given a fixed budget, find the optimal allocation of preventive and corrective resources in the network to achieve the highest level of disease containment, and (ii) when a budget is not specified, find the minimum budget required to eradicate the disease. We show that both resource allocation problems can be efficiently solved for a wide class of cost functions. We illustrate our approach by designing optimal protection strategies to contain an epidemic outbreak that propagates through the world-wide air transportation network.

Arthur Jacot: Bottleneck Structure in Deep Neural Networks: Mechanisms of Symmetry Learning

Deep Neural Networks (DNNs) have proven to be able to break the curse of dimensionality, and learn complex tasks on high dimensional data, such as images or text. But we still do not fully understand what makes this possible. To answer this question, I will describe the appearance of a Bottleneck structure, where the network learns low-dimensional features in the middle of the network. This allows the network to identify and learn symmetries of the task it is trained on, without any prior knowledge. This could explain the success of deep learning on image and text tasks which feature many `hidden' symmetries.

Nikolai Matni: Representation Learning for Dynamics and Control

Representation learning, in which common features are extracted using data from heterogeneous sources or tasks, has underpinned much of the exciting recent progress in machine learning. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given target task, and indeed, recent results support this intuition in the context of classification and regression over i.i.d. data. However, in order to reap the benefits of representation learning in the context of dynamics and control applications, algorithmic and analytical tools need to accommodate sequential data that is emphatically not i.i.d.. Towards that goal, we will first overview our recent progress in understanding how and when empirical risk minimization-based representation learning over data generated by a dynamical system is statistically beneficial, with a focus on applications to imitation learning. We then turn our attention to optimization challenges (and solutions!) related to learning representations over non-isotropic non-i.i.d. data, and show how simple modifications to alternating-descent-methods can significantly improve their convergence properties.

Damek Davis: A local exponential acceleration of gradient methods for "generic" nonsmooth optimization problems

Gradient methods in nonsmooth optimization are often described as "slow:" known complexity guarantees are sublinear at best. In this talk, I will present a black-box gradient method that (locally) exponentially improves complexity bounds for nonsmooth functions. The method is parameter-free, the memory and per-step cost are the same as for gradient descent, and the complexity is independent of the problem dimension. Practically, the method is implemented in PyTorch and shows promise in numerical experiments. Theoretically, the improvement holds for "generic" nonsmooth functions (convex or otherwise). The key insight is that nonsmooth functions are often "partially" smooth in useful ways. The talk will be elementary and visual.

Stefanie Jegelka: Some benefits of machine learning with invariances

In many applications, especially in the sciences, data and tasks have known invariances. Encoding such invariances directly into a machine learning model can improve learning outcomes, while it also poses challenges on efficient model design. In the first part of the talk, we will focus on the invariances relevant to eigenvectors and eigenspaces being inputs to a neural network. Such inputs are important, for instance, for graph representation learning. We will discuss targeted architectures that can universally express functions with the relevant invariances - sign flips and changes of basis - and their theoretical and empirical benefits. Then we will extend these ideas to equivariance. Second, we will take a broader, theoretical perspective. Empirically, it is known that encoding invariances into the machine learning model can reduce sample complexity. For the simplified setting of kernel ridge regression or random features, we will discuss new bounds that illustrate two ways in which invariances can reduce sample complexity. Our results hold for learning on manifolds and for invariances to (almost) any group action.

Hamsa Bastani: Rethinking Fairness for Human-AI Collaboration

Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints. [Joint work with H. Ge and O. Bastani; extended abstract to appear in ITCS 2024.]

George Lan: Uniform Optimality for Convex and Nonconvex Optimization

The past few years have witnessed growing interest in the development of easily implementable parameter-free first-order methods to facilitate their applications, e.g., in data science and machine learning. In this talk, I will discuss some recent progresses that we made on uniformly optimal methods for convex and nonconvex optimization. By uniform optimality, we mean that these algorithms do not require the input of any problem parameters but can still achieve the best possible iteration complexity bounds for solving different classes of optimization problems. We first consider convex optimization problems under different smoothness levels and show that neither such smoothness information nor line search procedures are needed to achieve uniform optimality. We then consider regularity conditions (e.g., strong convexity and lower curvature) that are imposed over a global scope and thus are notoriously more difficult to estimate. By presenting novel methods that can achieve tight complexity bounds to compute solutions with verifiably small (projected) gradients, we show that such regularity information is in fact superfluous for handling strongly convex and nonconvex problems. It is worth noting that our complexity bound for nonconvex problems also appears to be new in the literature.