Muchen Li

On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization

May 24, 2025

Wenlong Deng, Yi Ren, Muchen Li, Danica J. Sutherland, Xiaoxiao Li, Christos Thrampoulidis

Abstract: Reinforcement learning (RL) has become popular for enhancing the reasoning capabilities of large language models (LLMs), with Group Relative Policy Optimization (GRPO) emerging as a widely used algorithm in recent systems. Despite GRPO's widespread adoption, we identify a previously unrecognized phenomenon we term Lazy Likelihood Displacement (LLD), wherein the likelihood of correct responses marginally increases or even decreases during training. This behavior mirrors a recently discovered misalignment issue in Direct Preference Optimization (DPO), attributed to the influence of negative gradients. We provide a theoretical analysis of GRPO's learning dynamics, identifying the source of LLD as the naive penalization of all tokens in incorrect responses with the same strength. To address this, we develop a method called NTHR, which downweights penalties on tokens contributing to the LLD. Unlike prior DPO-based approaches, NTHR takes advantage of GRPO's group-based structure, using correct responses as anchors to identify influential tokens. Experiments on math reasoning benchmarks demonstrate that NTHR effectively mitigates LLD, yielding consistent performance gains across models ranging from 0.5B to 3B parameters.
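
The "same strength" penalization mentioned in the abstract can be illustrated with a minimal sketch of the standard group-relative advantage. This is not the paper's code and does not implement NTHR. The function names, the toy rewards, and the choice of population standard deviation are assumptions made for illustration.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (mean 0, unit scale).

    Assumption: population standard deviation; implementations differ.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def per_token_advantages(rewards, response_lengths):
    """Broadcast each response's advantage to every one of its tokens.

    This uniform broadcast is what gives all tokens of an incorrect
    response the same negative weight.
    """
    advantages = group_relative_advantages(rewards)
    return [[a] * n for a, n in zip(advantages, response_lengths)]

# Hypothetical group of four sampled responses: 1.0 = correct, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
lengths = [3, 2, 4, 3]  # token counts per response (made up)
token_advs = per_token_advantages(rewards, lengths)

for reward, advs in zip(rewards, token_advs):
    print(reward, advs)
```

In this sketch every token of an incorrect response carries an identical negative advantage, regardless of whether that token also appears in correct reasoning. According to the abstract, NTHR instead downweights the penalty on tokens that contribute to LLD.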

Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation

Jan 24, 2025

Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Christos Thrampoulidis, Leonid Sigal, Renjie Liao

Abstract: Advances in Large Language Models (LLMs) have sparked interest in their ability to solve Olympiad-level math problems. However, the training and evaluation of these models are constrained by the limited size and quality of available datasets, as creating large-scale data for such advanced problems requires extensive effort from human experts. In addition, current benchmarks are prone to contamination, leading to unreliable evaluations. In this paper, we present an automated pipeline that leverages the rich resources of the Art of Problem Solving (AoPS) forum, which predominantly features Olympiad-level problems and community-driven solutions. Using open-source LLMs, we develop a method to extract question-answer pairs from the forum, resulting in AoPS-Instruct, a dataset of more than 600,000 high-quality QA pairs. Our experiments demonstrate that fine-tuning LLMs on AoPS-Instruct improves their reasoning abilities across various benchmarks. Moreover, we build an automatic pipeline that introduces LiveAoPSBench, an evolving evaluation set with timestamps, derived from the latest forum data, providing a contamination-resistant benchmark for assessing LLM performance. Notably, we observe a significant decline in LLM performance over time, suggesting their success on older examples may stem from pre-training exposure rather than true reasoning ability. Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning, offering valuable insights into the capabilities and limitations of LLMs in this domain. Our benchmark and code are available at https://github.com/DSL-Lab/aops
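
One way to read the timestamped evaluation idea is to group graded answers by the date each problem was posted and compare accuracy across periods. The sketch below is a hypothetical illustration of that bookkeeping only. The record layout, the function name, and the toy records are assumptions; they are not taken from the LiveAoPSBench code and do not represent its results.

```python
from collections import defaultdict
from datetime import date

def accuracy_by_month(records):
    """Return {(year, month): accuracy} from (posted_date, is_correct) pairs."""
    totals = defaultdict(lambda: [0, 0])  # key -> [num_correct, num_total]
    for posted, correct in records:
        key = (posted.year, posted.month)
        totals[key][0] += int(correct)
        totals[key][1] += 1
    return {key: c / n for key, (c, n) in sorted(totals.items())}

# Made-up graded answers, for illustration only.
records = [
    (date(2023, 1, 5), True),
    (date(2023, 1, 20), True),
    (date(2024, 6, 2), True),
    (date(2024, 6, 9), False),
]
monthly = accuracy_by_month(records)
print(monthly)
```

A drop in accuracy on problems posted after a model's training cutoff, relative to older problems, is the kind of signal the abstract attributes to pre-training exposure.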

GraphPNAS: Learning Distribution of Good Neural Architectures via Deep Graph Generative Models

Nov 28, 2022

Muchen Li, Jeffrey Yunfan Liu, Leonid Sigal, Renjie Liao

Abstract: Neural architectures can be naturally viewed as computational graphs. Motivated by this perspective, in this paper we study neural architecture search (NAS) through the lens of learning random graph models. In contrast to existing NAS methods, which largely focus on searching for a single best architecture, i.e., point estimation, we propose GraphPNAS, a deep graph generative model that learns a distribution of well-performing architectures. Relying on graph neural networks (GNNs), our GraphPNAS can better capture topologies of good neural architectures and relations between operators therein. Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods. Finally, we learn our generator via an efficient reinforcement learning formulation for NAS. To assess the effectiveness of our GraphPNAS, we conduct extensive experiments on three search spaces, including the challenging RandWire on TinyImageNet, ENAS on CIFAR10, and NAS-Bench-101/201. The complexity of RandWire is significantly larger than other search spaces in the literature. We show that our proposed graph generator consistently outperforms the RNN-based one and achieves performance better than or comparable to state-of-the-art NAS methods.

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

Jun 06, 2021

Muchen Li, Leonid Sigal

Abstract: As an important step towards visual reasoning, visual grounding (e.g., phrase localization, referring expression comprehension/segmentation) has been widely explored. Previous approaches to referring expression comprehension (REC) or segmentation (RES) either suffer from limited performance, due to a two-stage setup, or require the design of complex task-specific one-stage architectures. In this paper, we propose a simple one-stage multi-task framework for visual grounding tasks. Specifically, we leverage a transformer architecture, where two modalities are fused in a visual-lingual encoder. In the decoder, the model learns to generate contextualized lingual queries which are then decoded and used to directly regress the bounding box and produce a segmentation mask for the corresponding referred regions. With this simple but highly contextualized model, we outperform state-of-the-art methods by a large margin on both REC and RES tasks. We also show that a simple pre-training schedule (on an external dataset) further improves the performance. Extensive experiments and ablations illustrate that our model benefits greatly from contextualized information and multi-task training.

Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising

Jan 26, 2021

Xiangyu Xu, Muchen Li, Wenxiu Sun, Ming-Hsuan Yang

Abstract: Existing denoising methods typically restore clear results by aggregating pixels from the noisy input. Instead of relying on hand-crafted aggregation schemes, we propose to explicitly learn this process with deep neural networks. We present a spatial pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising. The proposed model naturally adapts to image structures and can effectively improve the denoised results. Furthermore, we develop a spatio-temporal pixel aggregation network for video denoising to efficiently sample pixels across the spatio-temporal space. Our method is able to solve the misalignment issues caused by large motion in dynamic scenes. In addition, we introduce a new regularization term for effectively training the proposed video denoising model. We present extensive analysis of the proposed method and demonstrate that our model performs favorably against the state-of-the-art image and video denoising approaches on both synthetic and real-world data.

* IEEE Transactions on Image Processing 29 (2020): 7153-7165
* Project page: https://sites.google.com/view/xiangyuxu/denoise_stpan. arXiv admin note: substantial text overlap with arXiv:1904.06903

TDAF: Top-Down Attention Framework for Vision Tasks

Dec 14, 2020

Bo Pang, Yizhuo Li, Jiefeng Li, Muchen Li, Hanwen Cao, Cewu Lu

Abstract: Human attention mechanisms often work in a top-down manner, yet this is not well explored in vision research. Here, we propose the Top-Down Attention Framework (TDAF) to capture top-down attention, which can be easily adopted in most existing models. Its Recursive Dual-Directional Nested Structure forms two sets of orthogonal paths, recursive and structural ones, where bottom-up spatial features and top-down attention features are extracted respectively. Such spatial and attention features are nested deeply, so the proposed framework works in a mixed top-down and bottom-up manner. Empirical evidence shows that our TDAF can capture effective stratified attention information and boost performance. ResNet with TDAF achieves a 2.0% improvement on ImageNet. For object detection, the performance is improved by 2.7% AP over FCOS. For pose estimation, TDAF improves the baseline by 1.6%. For action recognition, the 3D-ResNet adopting TDAF achieves a 1.7% accuracy improvement.

* Conference paper in AAAI 2021

TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model

Jun 10, 2020

Bo Pang, Yizhuo Li, Yifan Zhang, Muchen Li, Cewu Lu

Abstract: Multi-object tracking is a fundamental vision problem that has been studied for a long time. As deep learning brings excellent performance to object detection algorithms, Tracking by Detection (TBD) has become the mainstream tracking framework. Despite the success of TBD, this two-step method is too complicated to train in an end-to-end manner and induces many challenges as well, such as insufficient exploration of video spatial-temporal information, vulnerability when facing object occlusion, and excessive reliance on detection results. To address these challenges, we propose a concise end-to-end model, TubeTK, which needs only one-step training by introducing the "bounding-tube" to indicate temporal-spatial locations of objects in a short video clip. TubeTK provides a novel direction of multi-object tracking, and we demonstrate its potential to solve the above challenges without bells and whistles. We analyze the performance of TubeTK on several MOT benchmarks and provide empirical evidence to show that TubeTK has the ability to overcome occlusions to some extent without any ancillary technologies like Re-ID. Compared with other methods that adopt private detection results, our one-stage end-to-end model achieves state-of-the-art performance even though it adopts no ready-made detection results. We hope that the proposed TubeTK model can serve as a simple but strong alternative for video-based MOT tasks. The code and models are available at https://github.com/BoPang1996/TubeTK.

* CVPR-2020 oral paper

NTIRE 2020 Challenge on Video Quality Mapping: Methods and Results

May 06, 2020

Dario Fuoli, Zhiwu Huang, Martin Danelljan, Radu Timofte, Hua Wang, Longcun Jin, Dewei Su, Jing Liu, Jaehoon Lee, Michal Kudelski (+11 more)

Abstract: This paper reviews the NTIRE 2020 challenge on video quality mapping (VQM), which addresses the issues of quality mapping from a source video domain to a target video domain. The challenge includes both a supervised track (track 1) and a weakly-supervised track (track 2) for two benchmark datasets. In particular, track 1 offers a new Internet video benchmark, requiring algorithms to learn the map from more compressed videos to less compressed videos in a supervised training manner. In track 2, algorithms are required to learn the quality mapping from one device to another when their quality varies substantially and weakly-aligned video pairs are available. For track 1, a total of 7 teams competed in the final test phase, demonstrating novel and effective solutions to the problem. For track 2, some existing methods are evaluated, showing promising solutions to the weakly-supervised video quality mapping problem.

* The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

Learning Deformable Kernels for Image and Video Denoising

Apr 15, 2019

Xiangyu Xu, Muchen Li, Wenxiu Sun

Abstract: Most of the classical denoising methods restore clear results by selecting and averaging pixels in the noisy input. Instead of relying on hand-crafted selecting and averaging strategies, we propose to explicitly learn this process with deep neural networks. Specifically, we propose deformable 2D kernels for image denoising where the sampling locations and kernel weights are both learned. The proposed kernel naturally adapts to image structures and could effectively reduce the oversmoothing artifacts. Furthermore, we develop 3D deformable kernels for video denoising to more efficiently sample pixels across the spatial-temporal space. Our method is able to solve the misalignment issues caused by large motion in dynamic scenes. To better train our video denoising model, we introduce the trilinear sampler and a new regularization term. We demonstrate that the proposed method performs favorably against the state-of-the-art image and video denoising approaches on both synthetic and real-world data.

* 10 pages
