UPenn Optimization Seminar (new!)

Quick facts

**When:**Thursdays, 12-1pm**Where:**Room 1201, Steinberg-Dietrich Hall**Mailing list:**Subscribe to "opt-seminar" here**Organizers:**Jason Altschuler and Hamed Hassani**Administrative coordinator:**Sonia Rodriguez (email: soniacr@seas.upenn.edu)

About

**What:**This seminar series features leading experts in optimization and adjacent fields. Topics range broadly from the design and analysis of optimization algorithms, to the complexity of fundamental optimization tasks, to the modeling and formulation of optimization applications arising in machine learning, engineering, operations, economics, applied mathematics, etc.**Why:**This seminar serves as a university-wide hub to bring together the many optimization communities across UPenn --- in the Departments of Statistics and Data Science, Electrical Engineering, Computer Science, Applied Mathematics, Economics, Wharton OID, etc. To help promote this internal interaction, several speakers will be from UPenn.**Funding:**We gratefully acknowledge funding from the Wharton Department of Statistics and Data Science, the IDEAS Institute for Data Science, and the NSF-Tripods-I program.**Archived schedules:**Fall 2023, Spring 2024

Schedule for Fall 2024

Date |
Speaker |
Affiliation |
Title |
---|---|---|---|

Sep 5 | Aaron Sidford | Stanford | Theoretical Advances in Efficiently Solving Markov Decision Processes |

Sep 12 | Madeleine Udell | Stanford | AI and the Future of Optimization Modeling |

Sep 19 | Alejandro Ribeiro | UPenn | Constrained Reinforcement Learning |

Sep 26 | Jacob Gardner | UPenn | Bayesian Optimization for Scientific Discovery |

Oct 3 [Fall Break] | ------------ | ------------ | |

Oct 10 | Jason Lee | Princeton | Learning Representations and Associations with Gradient Descent |

Oct 17 | Robert Freund | MIT | The Role of Level-Set Geometry on the Performance of PDHG for Conic Linear Optimization |

Oct 24 | John Wright | Columbia | |

Oct 31 | Xinyi Chen | Princeton | |

Nov 7 | Aryan Mokhtari | UT Austin | Online Learning Guided Quasi-Newton Methods: Improved Global Non-asymptotic Guarantees |

Nov 14 | Ashia Wilson | MIT | |

Nov 21 | Sewoong Oh | UW | |

Nov 28 [Thanksgiving] | ------------ | ------------ | |

Dec 5 | Sophie Yu | UPenn |

Abstracts

Aaron Sidford: Theoretical Advances in Efficiently Solving Markov Decision Processes

Markov Decision Processes (MDP) are a fundamental mathematical model for reasoning about uncertainty and are foundational to reinforcement learning theory. Over the past decade, there have been substantial advances in the design and analysis of algorithms for provably computing approximately optimal policies for MDPs in a variety of settings. In this talk, I will survey these advances touching upon optimization tools of potential broader utility. In particular, this talk will highlight recent joint work with Ishani Karmarkar, Jiayi Wang, and Yujia Jin on this topic (arXiv:2405.12952).

Madeleine Udell: AI and the Future of Optimization Modeling

Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers, as the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. As a glimpse of the future, this talk will introduce OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve MILP problems from natural language descriptions. OptiMUS can develop mathematical models, write and debug solver code, develop tests, and check the validity of generated solutions. Experimentally, OptiMUS correctly solves more than 80% of benchmark problems, more than twice as many as a basic LLM prompting strategy. More broadly, we discuss the potential for LLMs in domains where accuracy and fidelity to real-world data is critical and strategies to augment and safeguard their performance.

Alejandro Ribeiro: Constrained Reinforcement Learning

Constrained reinforcement learning (CRL) involves multiple rewards that must individually accumulate to given thresholds. CRL arises naturally in cyberphysical systems which are most often specified by a set of requirements. We explain in this talk that CRL problems have null duality gaps even though they are not convex. These facts imply that they can be solved in the dual domain but that standard dual gradient descent algorithms may fail to find optimal policies. We circumvent this limitation with the introduction of a state augmented algorithm in which Lagrange multipliers are incorporated in the state space. We show that state augmented algorithms sample from stochastic policies that achieve target rewards. We further introduce resilient CRL as a mechanism to relax constraints when requirements are overspecified. We illustrate results and implications with safety and wireless communication applications.

Jacob Gardner: Bayesian Optimization for Scientific Discovery

Many scientific experimentation and discovery problems can be cast as black-box optimization problems, where the goal is to use a data-driven approach to jointly design a material, protein, or molecule that maximizes some desired property while knowing relatively little about the underlying objective. For example, a laboratory developing novel antibody therapeutics might wish to maximize binding affinity to some target antigen but will typically also care about yield, thermostability, humanization, and other constraints. In this talk, I will discuss some of our recent work advancing both the practical application and theory of Bayesian optimization--a widely studied surrogate model based approach that has emerged as one of the most successful frameworks for black-box optimization. Practically, our recent work has enabled optimization of objectives with constraints defined over complex, structured objects like molecules and peptides, leading to the discovery of diverse, novel antibiotic molecules that are as or more potent than clinically available ones on high priority drug resistant pathogens in mice. Theoretically, I will argue that--surprisingly--local approaches to global optimization may partially address to some degree pessimistic lower bounds for optimizing high dimensional black-box objectives.

Jason Lee: Learning Representations and Associations with Gradient Descent

Machine Learning has undergone a paradigm shift with the success of pretrained models. Pretraining models via gradient descent learns transferable representations that adapt to a wide swath of downstream tasks. However, significant prior theoretical work has demonstrated that in many regimes, overparametrized neural networks trained by gradient descent behave like kernel methods, and do not learn transferable representations. In this talk, we close this gap by demonstrating that there is a large class of functions which cannot be efficiently learned by kernel methods but can be easily learned with gradient descent on a neural network by learning representations that are relevant to the target task. We also demonstrate that these representations allow for efficient transfer learning, which is impossible in the kernel regime.
Finally, I will demonstrate how pretraining learns associations for in-context learning with transformers. This leads to a systematic and mechanistic understanding of learning causal structures including the celebrated induction head identified by Anthropic.

Robert Freund: The Role of Level-Set Geometry on the Performance of PDHG for Conic Linear Optimization

In joint work with Zikai Xiong (MIT OR Center), we consider solving huge-scale instances of (convex) conic linear optimization problems, at the scale where matrix-factorization-free methods are attractive or necessary. The restarted primal-dual hybrid gradient method (rPDHG) -- with heuristic enhancements and GPU implementation -- has been very successful in solving huge-scale linear programming (LP) problems. Here we extend the study of rPDHG to encompass Conic Linear Optimization as well. We present a new theoretical analysis of rPDHG for general (convex) conic linear optimization and for LP as a special case thereof. We show a relationship between geometric measures of the primal-dual (sub-)level sets and the convergence rate of rPDHG. We also show how central-path-based linear transformations -- including conic rescaling -- can markedly enhance the convergence rate of rPDHG. Last of all, we present computational results that demonstrate how rescalings can accelerate convergence to high-accuracy solutions, and lead to more efficient methods for huge-scale LP problems in particular. Many of our ideas are aligned with and connected to work by Yinyu Ye, and these connections will be discussed in the talk.

Aryan Mokhtari: Online Learning Guided Quasi-Newton Methods: Improved Global Non-asymptotic Guarantees

Quasi-Newton (QN) methods are popular iterative algorithms known for their superior practical performance compared to Gradient Descent (GD)-type methods. However, the existing theoretical results for this class of algorithms do not sufficiently justify their advantage over GD-type methods. In this talk, we discuss our recent efforts to address this issue. Specifically, in the strongly convex setting, we propose the first “globally” convergent QN method that achieves an explicit “non-asymptotic superlinear” rate. We show that the rate presented for our method is provably faster than GD after at most $O(d)$ iterations, where $d$ is the problem dimension. Additionally, in the convex setting, we present an accelerated variant of our proposed method that provably outperforms the accelerated gradient method and converges at a rate of $O(\min\{1/k^2, \sqrt{d \log k }/ k^{2.5}\})$, where $k$ is the number of iterations. To attain these results, we diverge from conventional approaches and construct our QN methods based on the Hybrid Proximal Extragradient (HPE) framework and its accelerated variants. Furthermore, a pivotal algorithmic concept underpinning our methodologies is an online learning framework for updating the Hessian approximation matrices. Specifically, we relate our method's convergence rate to the regret of a specific online convex optimization problem in the matrix space and choose the sequence of Hessian approximation matrices to minimize its overall regret.