Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dongdong Ge

Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling

May 17, 2025

Yitian Chen, Jingfan Xia, Siyu Shao, Dongdong Ge, Yinyu Ye

Abstract:Optimization modeling is fundamental to decision-making across diverse domains.Despite progress in automating optimization formulation from natural language descriptions, Large Language Models (LLMs) often struggle to generate formally correct and usable models due to hallucinations, posing a challenge for reliable automation. Inspired by the success of Reinforcement Learning (RL) in enhancing Large Reasoning Models, we present Solver-Informed Reinforcement Learning (SIRL).This novel framework leverages external optimization solvers as verifiable reward mechanisms to significantly improve the authenticity of LLMs for optimization modeling.Acting as precise verifiers, these solvers automatically assess the executable code and the instance-level mathematical model represented by the associated LP file, yielding precise and comprehensive feedback signals -- including syntax, feasibility, and solution quality that directly inform the RL process. This automated verification process, powered by classic optimization solvers, also underpins our instance-enhanced self-consistency method to synthesize high-quality training data. Extensive experiments on diverse public benchmarks demonstrate that SIRL achieves state-of-the-art performance, substantially outperforming existing methods in generating accurate and executable optimization models.

Via

Access Paper or Ask Questions

Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming

Jan 06, 2025

Wenzhi Gao, Dongdong Ge, Chenyu Xue, Chunlin Sun, Yinyu Ye

$Figure 1 for Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming$

$Figure 2 for Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming$

$Figure 3 for Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming$

$Figure 4 for Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming$

Abstract:Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O} ( \sqrt{T} )$, which is suboptimal compared to the $\mathcal{O} (\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes a general framework that improves upon the $\mathcal{O} ( \sqrt{T} )$ result when the LP dual problem exhibits certain error bound conditions. For the first time, we show that first-order learning algorithms achieve $o( \sqrt{T} )$ regret in the continuous support setting and $\mathcal{O} (\log T)$ regret in the finite support setting beyond the non-degeneracy assumption. Our results significantly improve the state-of-the-art regret results and provide new insights for sequential decision-making.

* Extension of conference submission https://proceedings.mlr.press/v235/gao24n.html

Via

Access Paper or Ask Questions

A Robust Anchor-based Method for Multi-Camera Pedestrian Localization

Oct 25, 2024

Wanyu Zhang, Jiaqi Zhang, Dongdong Ge, Yu Lin, Huiwen Yang, Huikang Liu, Yinyu Ye

Figure 1 for A Robust Anchor-based Method for Multi-Camera Pedestrian Localization

Figure 2 for A Robust Anchor-based Method for Multi-Camera Pedestrian Localization

Figure 3 for A Robust Anchor-based Method for Multi-Camera Pedestrian Localization

Figure 4 for A Robust Anchor-based Method for Multi-Camera Pedestrian Localization

Abstract:This paper addresses the problem of vision-based pedestrian localization, which estimates a pedestrian's location using images and camera parameters. In practice, however, calibrated camera parameters often deviate from the ground truth, leading to inaccuracies in localization. To address this issue, we propose an anchor-based method that leverages fixed-position anchors to reduce the impact of camera parameter errors. We provide a theoretical analysis that demonstrates the robustness of our approach. Experiments conducted on simulated, real-world, and public datasets show that our method significantly improves localization accuracy and remains resilient to noise in camera parameters, compared to methods without anchors.

Via

Access Paper or Ask Questions

Reward Learning From Preference With Ties

Oct 05, 2024

Jinsong Liu, Dongdong Ge, Ruihao Zhu

Abstract:Reward learning plays a pivotal role in Reinforcement Learning from Human Feedback (RLHF), ensuring the alignment of language models. The Bradley-Terry (BT) model stands as the prevalent choice for capturing human preferences from datasets containing pairs of chosen and rejected responses. In preference modeling, the focus is not on absolute values but rather on the reward difference between chosen and rejected responses, referred to as preference strength. Thus, precise evaluation of preference strength holds paramount importance in preference modeling. However, an easily overlooked factor significantly affecting preference strength measurement is that human attitudes towards two responses may not solely indicate a preference for one over the other and ties are also a common occurrence. To address this, we propose the adoption of the generalized Bradley-Terry model -- the Bradley-Terry model with ties (BTT) -- to accommodate tied preferences, thus leveraging additional information. We prove that even with the access to the true distributions of prompt and response, disregarding ties can lead to a notable bias in preference strength measurement. Comprehensive experiments further validate the advantages of incorporating ties in preference modeling. Notably, fine-tuning with BTT significantly outperforms fine-tuning with BT on synthetic preference datasets with ties, labeled by state-of-the-art open-source LLMs.

Via

Access Paper or Ask Questions

Solving Integrated Process Planning and Scheduling Problem via Graph Neural Network Based Deep Reinforcement Learning

Sep 02, 2024

Hongpei Li, Han Zhang, Ziyan He, Yunkai Jia, Bo Jiang, Xiang Huang, Dongdong Ge

Abstract:The Integrated Process Planning and Scheduling (IPPS) problem combines process route planning and shop scheduling to achieve high efficiency in manufacturing and maximize resource utilization, which is crucial for modern manufacturing systems. Traditional methods using Mixed Integer Linear Programming (MILP) and heuristic algorithms can not well balance solution quality and speed when solving IPPS. In this paper, we propose a novel end-to-end Deep Reinforcement Learning (DRL) method. We model the IPPS problem as a Markov Decision Process (MDP) and employ a Heterogeneous Graph Neural Network (GNN) to capture the complex relationships among operations, machines, and jobs. To optimize the scheduling strategy, we use Proximal Policy Optimization (PPO). Experimental results show that, compared to traditional methods, our approach significantly improves solution efficiency and quality in large-scale IPPS instances, providing superior scheduling strategies for modern intelligent manufacturing systems.

* 24 pages, 13 figures

Via

Access Paper or Ask Questions

ORLM: Training Large Language Models for Optimization Modeling

May 30, 2024

Zhengyang Tang, Chenyu Huang, Xin Zheng, Shixi Hu, Zizhuo Wang, Dongdong Ge, Benyou Wang

Figure 1 for ORLM: Training Large Language Models for Optimization Modeling

Figure 2 for ORLM: Training Large Language Models for Optimization Modeling

Figure 3 for ORLM: Training Large Language Models for Optimization Modeling

Figure 4 for ORLM: Training Large Language Models for Optimization Modeling

Abstract:Large Language Models (LLMs) have emerged as powerful tools for tackling complex Operations Research (OR) problem by providing the capacity in automating optimization modeling. However, current methodologies heavily rely on prompt engineering (e.g., multi-agent cooperation) with proprietary LLMs, raising data privacy concerns that could be prohibitive in industry applications. To tackle this issue, we propose training open-source LLMs for optimization modeling. We identify four critical requirements for the training dataset of OR LLMs, design and implement OR-Instruct, a semi-automated process for creating synthetic data tailored to specific requirements. We also introduce the IndustryOR benchmark, the first industrial benchmark for testing LLMs on solving real-world OR problems. We apply the data from OR-Instruct to various open-source LLMs of 7b size (termed as ORLMs), resulting in a significantly improved capability for optimization modeling. Our best-performing ORLM achieves state-of-the-art performance on the NL4OPT, MAMO, and IndustryOR benchmarks. Our code and data are available at \url{https://github.com/Cardinal-Operations/ORLM}.

* Work in progress

Via

Access Paper or Ask Questions

Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods

Feb 11, 2024

Wenzhi Gao, Chunlin Sun, Chenyu Xue, Dongdong Ge, Yinyu Ye

Abstract:Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O}(\sqrt{T})$, which is suboptimal compared to the $\mathcal{O}(\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes several important facts about online linear programming, which unveils the challenge for first-order-method-based online algorithms to achieve beyond $\mathcal{O}(\sqrt{T})$ regret. To address the challenge, we introduce a new algorithmic framework that decouples learning from decision-making. More importantly, for the first time, we show that first-order methods can attain regret $\mathcal{O}(T^{1/3})$ with this new framework. Lastly, we conduct numerical experiments to validate our theoretical findings.

Via

Access Paper or Ask Questions

A Homogenization Approach for Gradient-Dominated Stochastic Optimization

Aug 21, 2023

Jiyuan Tan, Chenyu Xue, Chuwen Zhang, Qi Deng, Dongdong Ge, Yinyu Ye

Abstract:Gradient dominance property is a condition weaker than strong convexity, yet it sufficiently ensures global convergence for first-order methods even in non-convex optimization. This property finds application in various machine learning domains, including matrix decomposition, linear neural networks, and policy-based reinforcement learning (RL). In this paper, we study the stochastic homogeneous second-order descent method (SHSODM) for gradient-dominated optimization with $\alpha \in [1, 2]$ based on a recently proposed homogenization approach. Theoretically, we show that SHSODM achieves a sample complexity of $O(\epsilon^{-7/(2 \alpha) +1})$ for $\alpha \in [1, 3/2)$ and $\tilde{O}(\epsilon^{-2/\alpha})$ for $\alpha \in [3/2, 2]$. We further provide a SHSODM with a variance reduction technique enjoying an improved sample complexity of $O( \epsilon ^{-( 7-3\alpha ) /( 2\alpha )})$ for $\alpha \in [1,3/2)$. Our results match the state-of-the-art sample complexity bounds for stochastic gradient-dominated optimization without \emph{cubic regularization}. Since the homogenization approach only relies on solving extremal eigenvector problems instead of Newton-type systems, our methods gain the advantage of cheaper iterations and robustness in ill-conditioned problems. Numerical experiments on several RL tasks demonstrate the efficiency of SHSODM compared to other off-the-shelf methods.

Via

Access Paper or Ask Questions

Pre-trained Mixed Integer Optimization through Multi-variable Cardinality Branching

May 21, 2023

Yanguang Chen, Wenzhi Gao, Dongdong Ge, Yinyu Ye

Abstract:We propose a new method to accelerate online Mixed Integer Optimization with Pre-trained machine learning models (PreMIO). The key component of PreMIO is a multi-variable cardinality branching procedure that splits the feasible region with data-driven hyperplanes, which can be easily integrated into any MIP solver with two lines of code. Moreover, we incorporate learning theory and concentration inequalities to develop a straightforward and interpretable hyper-parameter selection strategy for our method. We test the performance of PreMIO by applying it to state-of-the-art MIP solvers and running numerical experiments on both classical OR benchmark datasets and real-life instances. The results validate the effectiveness of our proposed method.

Via

Access Paper or Ask Questions

Stochastic Dimension-reduced Second-order Methods for Policy Optimization

Jan 28, 2023

Jinsong Liu, Chenghan Xie, Qi Deng, Dongdong Ge, Yinyu Ye

Abstract:In this paper, we propose several new stochastic second-order algorithms for policy optimization that only require gradient and Hessian-vector product in each iteration, making them computationally efficient and comparable to policy gradient methods. Specifically, we propose a dimension-reduced second-order method (DR-SOPO) which repeatedly solves a projected two-dimensional trust region subproblem. We show that DR-SOPO obtains an $\mathcal{O}(\epsilon^{-3.5})$ complexity for reaching approximate first-order stationary condition and certain subspace second-order stationary condition. In addition, we present an enhanced algorithm (DVR-SOPO) which further improves the complexity to $\mathcal{O}(\epsilon^{-3})$ based on the variance reduction technique. Preliminary experiments show that our proposed algorithms perform favorably compared with stochastic and variance-reduced policy gradient methods.

Via

Access Paper or Ask Questions