Zhiyuan He

$\Delta L$ Normalization: Rethink Loss Aggregation in RLVR

Sep 09, 2025

Zhiyuan He, Xufang Luo, Yike Zhang, Yuqing Yang, Lili Qiu

Abstract: We propose $\Delta L$ Normalization, a simple yet effective loss aggregation method tailored to the dynamic generation lengths that characterize Reinforcement Learning with Verifiable Rewards (RLVR). Recently, RLVR has demonstrated strong potential in improving the reasoning capabilities of large language models (LLMs), but a major challenge lies in the large variability of response lengths during training, which leads to high gradient variance and unstable optimization. Although previous methods such as GRPO, DAPO, and Dr. GRPO introduce different loss normalization terms to address this issue, they either produce biased estimates or still suffer from high gradient variance. By analyzing the effect of varying lengths on policy loss both theoretically and empirically, we reformulate the problem as finding a minimum-variance unbiased estimator. Our proposed $\Delta L$ Normalization not only provides an unbiased estimate of the true policy loss but also minimizes gradient variance in theory. Extensive experiments show that it consistently achieves superior results across different model sizes, maximum lengths, and tasks. Our code will be made public at https://github.com/zerolllin/Delta-L-Normalization.
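
To make the comparison concrete, here is a minimal sketch of three ways of aggregating per-token losses over a group of responses of different lengths, following how GRPO (mean per response, then mean over responses), DAPO (mean over all tokens) and Dr. GRPO (divide by a constant such as the maximum length) are commonly described. It also shows a generic inverse-variance combination. This assumes each response's summed loss is an unbiased estimate whose variance grows in proportion to its length L_i, which gives weights proportional to 1/L_i. The helper names and the `alpha` exponent are hypothetical illustrations of the minimum-variance-unbiased idea, not the paper's exact $\Delta L$ formula.

```python
from typing import List, Sequence

Batch = Sequence[Sequence[float]]  # per-token losses, one inner list per response


def grpo_sequence_mean(token_losses: Batch) -> float:
    """GRPO-style: mean over each response's tokens, then mean over responses."""
    per_response = [sum(seq) / len(seq) for seq in token_losses]
    return sum(per_response) / len(per_response)


def dapo_token_mean(token_losses: Batch) -> float:
    """DAPO-style: mean over every token in the group."""
    total = sum(sum(seq) for seq in token_losses)
    n_tokens = sum(len(seq) for seq in token_losses)
    return total / n_tokens


def dr_grpo_constant(token_losses: Batch, max_len: int) -> float:
    """Dr. GRPO-style: each response's token sum divided by a constant max_len, then averaged."""
    return sum(sum(seq) / max_len for seq in token_losses) / len(token_losses)


def inverse_variance_weights(lengths: Sequence[int], alpha: float = 1.0) -> List[float]:
    """Hypothetical weights proportional to L_i ** -alpha, normalized to sum to 1.

    With alpha = 1 this is inverse-variance weighting when Var(response i) is proportional to L_i.
    """
    raw = [length ** -alpha for length in lengths]
    total = sum(raw)
    return [r / total for r in raw]


def inverse_variance_aggregate(token_losses: Batch, alpha: float = 1.0) -> float:
    """Combine per-response token sums using the hypothetical inverse-variance weights."""
    weights = inverse_variance_weights([len(seq) for seq in token_losses], alpha)
    return sum(w * sum(seq) for w, seq in zip(weights, token_losses))


group = [[1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]  # toy per-token losses for two responses
print(grpo_sequence_mean(group))            # 1.5
print(dapo_token_mean(group))               # about 1.667
print(dr_grpo_constant(group, max_len=4))   # 1.25
print(inverse_variance_aggregate(group))    # about 4.0
```

Because the weights sum to 1 and do not depend on the loss values, the weighted combination keeps the expectation of the per-response estimates. Down-weighting long responses then reduces variance under the stated length-proportional-variance assumption. Note that the weighted sum lives on a per-response (not per-token) scale, so in practice it would be rescaled by a constant.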

Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs

May 19, 2025

Zhihe Yang, Xufang Luo, Zilong Wang, Dongqi Han, Zhiyuan He, Dongsheng Li, Yunjian Xu

Abstract: Reinforcement learning (RL) has become a cornerstone for enhancing the reasoning capabilities of large language models (LLMs), with recent innovations such as Group Relative Policy Optimization (GRPO) demonstrating exceptional effectiveness. In this study, we identify a critical yet underexplored issue in RL training: low-probability tokens disproportionately influence model updates due to their large gradient magnitudes. This dominance hinders the effective learning of high-probability tokens, whose gradients are essential for LLMs' performance but are substantially suppressed. To mitigate this interference, we propose two novel methods: Advantage Reweighting and Low-Probability Token Isolation (Lopti), both of which effectively attenuate gradients from low-probability tokens while emphasizing parameter updates driven by high-probability tokens. Our approaches promote balanced updates across tokens with varying probabilities, thereby enhancing the efficiency of RL training. Experimental results demonstrate that they substantially improve the performance of GRPO-trained LLMs, achieving up to a 46.2% improvement in K&K Logic Puzzle reasoning tasks. Our implementation is available at https://github.com/zhyang2226/AR-Lopti.

* 24 pages, 12 figures
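
The claim that low-probability tokens carry large gradient magnitudes can be checked directly at a softmax output layer. The gradient of log p_k with respect to the logits z is e_k - p, whose squared norm is 1 - 2 p_k + sum_j p_j^2, so it grows as p_k shrinks. The minimal sketch below only examines this logit-layer gradient (not gradients through the full network), and the logits are arbitrary example values.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logprob_grad_norm(logits: Sequence[float], k: int) -> float:
    """L2 norm of d log p_k / d z, which equals one_hot(k) - softmax(z)."""
    p = softmax(logits)
    grad = [(1.0 if j == k else 0.0) - pj for j, pj in enumerate(p)]
    return math.sqrt(sum(g * g for g in grad))


logits = [4.0, 1.0, 0.0, -2.0]  # example logits; token 0 is the most probable
probs = softmax(logits)
for k in range(len(logits)):
    print(f"token {k}: p={probs[k]:.4f}  |grad|={logprob_grad_norm(logits, k):.4f}")
```

Sampling a rare token therefore produces a larger logit-gradient norm than sampling a likely one. This is the imbalance that Advantage Reweighting and Lopti are designed to attenuate.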

RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection

May 30, 2024

Zhiyuan He, Pin-Yu Chen, Tsung-Yi Ho

Abstract: The rapid advances in generative AI models have empowered the creation of highly realistic images with arbitrary content, raising concerns about potential misuse and harm, such as Deepfakes. Current research focuses on training detectors using large datasets of generated images. However, these training-based solutions are often computationally expensive and show limited generalization to unseen generated images. In this paper, we propose a training-free method to distinguish between real and AI-generated images. We first observe that real images are more robust to tiny noise perturbations than AI-generated images in the representation space of vision foundation models. Based on this observation, we propose RIGID, a training-free and model-agnostic method for robust AI-generated image detection. RIGID is a simple yet effective approach that identifies whether an image is AI-generated by comparing the representation similarity between the original image and its noise-perturbed counterpart. Our evaluation on a diverse set of AI-generated images and benchmarks shows that RIGID significantly outperforms existing training-based and training-free detectors. In particular, the average performance of RIGID exceeds the current best training-free method by more than 25%. Importantly, RIGID exhibits strong generalization across different image generation methods and robustness to image corruptions.
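
The detection rule described above can be sketched as follows: embed an image and a Gaussian-noise-perturbed copy, take the cosine similarity of the two representations, and flag low similarity as AI-generated. This is a minimal sketch under stated assumptions. `make_toy_encoder` is a hypothetical stand-in (a fixed random projection plus tanh) for a real vision foundation model, images are flattened lists of pixel values, and the noise level `sigma` and the `threshold` are hypothetical parameters that would need calibration.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def make_toy_encoder(in_dim: int, out_dim: int, seed: int = 0) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical stand-in for a vision foundation model: fixed random projection + tanh."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(pixels: Sequence[float]) -> Vector:
        return [math.tanh(sum(w * p for w, p in zip(row, pixels))) for row in weights]

    return encode


def rigid_style_score(pixels: Sequence[float], encode, sigma: float = 0.05, seed: int = 0) -> float:
    """Similarity between the representations of an image and a Gaussian-perturbed copy."""
    rng = random.Random(seed)
    noisy = [p + rng.gauss(0.0, sigma) for p in pixels]
    return cosine_similarity(encode(pixels), encode(noisy))


def flag_as_generated(pixels: Sequence[float], encode, threshold: float, sigma: float = 0.05) -> bool:
    """Lower similarity (less robust to noise) is treated as evidence of AI generation."""
    return rigid_style_score(pixels, encode, sigma) < threshold
```

No detector is trained here. The only tunable pieces are the noise scale and the threshold, which matches the training-free, model-agnostic framing: any encoder with the same call signature could be swapped in.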

Position Engineering: Boosting Large Language Models through Positional Information Manipulation

Apr 17, 2024

Zhiyuan He, Huiqiang Jiang, Zilong Wang, Yuqing Yang, Luna Qiu, Lili Qiu

Abstract: The performance of large language models (LLMs) is significantly influenced by the quality of the prompts provided. In response, researchers have developed numerous prompt engineering strategies aimed at modifying the prompt text to enhance task performance. In this paper, we introduce a novel technique termed position engineering, which offers a more efficient way to guide large language models. Unlike prompt engineering, which requires substantial effort to modify the text provided to LLMs, position engineering merely involves altering the positional information in the prompt without modifying the text itself. We have evaluated position engineering in two widely-used LLM scenarios: retrieval-augmented generation (RAG) and in-context learning (ICL). Our findings show that position engineering substantially improves upon the baseline in both cases. Position engineering thus represents a promising new strategy for exploiting the capabilities of large language models.
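
One simple way to change positional information without touching the text is to leave gaps in the position indices between prompt segments. The minimal sketch below builds such position ids from segment lengths and per-segment gaps. The segment split and the gap sizes are hypothetical illustrations, not values from the paper.

```python
from typing import List, Sequence


def engineered_position_ids(segment_lengths: Sequence[int], gaps: Sequence[int]) -> List[int]:
    """Contiguous position ids within each segment, skipping gaps[i] ids before segment i.

    The token ids (the text) are untouched; only the positions the model sees change.
    """
    if len(gaps) != len(segment_lengths):
        raise ValueError("expected one gap per segment")
    ids: List[int] = []
    pos = 0
    for length, gap in zip(segment_lengths, gaps):
        if gap < 0:
            raise ValueError("gaps must be non-negative")
        pos += gap
        ids.extend(range(pos, pos + length))
        pos += length
    return ids


# e.g. instruction (3 tokens), retrieved documents (4 tokens), question (2 tokens)
print(engineered_position_ids([3, 4, 2], [0, 5, 0]))
# [0, 1, 2, 8, 9, 10, 11, 12, 13]
```

The resulting list could be supplied as `position_ids` to a causal language model that accepts that argument (many Hugging Face Transformers decoder models do). The gap sizes would then be searched per task, much as prompt wording is searched in prompt engineering.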

LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models

Apr 02, 2024

Zhiyuan He, Aashish Gottipati, Lili Qiu, Francis Y. Yan, Xufang Luo, Kenuo Xu, Yuqing Yang

Abstract: We present LLM-ABR, the first system that utilizes the generative capabilities of large language models (LLMs) to autonomously design adaptive bitrate (ABR) algorithms tailored for diverse network characteristics. Operating within a reinforcement learning framework, LLM-ABR empowers LLMs to design key components such as states and neural network architectures. We evaluate LLM-ABR across diverse network settings, including broadband, satellite, 4G, and 5G. LLM-ABR consistently outperforms default ABR algorithms.
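
To show what a designed "state" component can look like, here is a hand-written candidate state function for a reinforcement-learning ABR agent. Such a system might generate and evaluate a function of this kind. Everything in it is a hypothetical illustration: the choice of features (recent throughput, buffer level, last bitrate), the 10-second buffer scale, and the example bitrate ladder.

```python
from typing import List, Sequence


def candidate_state(throughput_mbps: Sequence[float], buffer_s: float,
                    last_bitrate_kbps: float, ladder_kbps: Sequence[float],
                    history: int = 4) -> List[float]:
    """Hypothetical ABR state: recent throughput, buffer level, last chosen bitrate."""
    recent = list(throughput_mbps[-history:])
    recent = [0.0] * (history - len(recent)) + recent  # left-pad short histories with zeros
    top_mbps = max(ladder_kbps) / 1000.0
    throughput_feats = [t / top_mbps for t in recent]  # throughput relative to the top bitrate
    buffer_feat = buffer_s / 10.0  # hypothetical scale: 10 s of buffer maps to 1.0
    bitrate_feat = last_bitrate_kbps / max(ladder_kbps)
    return throughput_feats + [buffer_feat, bitrate_feat]


ladder = [300, 750, 1200, 2850, 4300]  # example bitrate ladder in kbps
print(candidate_state([1.0, 2.0], buffer_s=5.0, last_bitrate_kbps=1200, ladder_kbps=ladder))
```

Normalizing each feature to a comparable range is one reason state design matters for the downstream policy network. In a design loop, alternative state functions would be compared by the reward the trained policy achieves on the network traces.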

Transfer Learning-Enhanced Instantaneous Multi-Person Indoor Localization by CSI

Mar 02, 2024

Zhiyuan He, Ke Deng, Jiangchao Gong, Yi Zhou, Desheng Wang

Abstract: Passive indoor localization, integral to smart buildings, emergency response, and indoor navigation, has traditionally been limited by a focus on single-target localization and a reliance on multi-packet CSI. We introduce a novel Multi-target loss that notably enhances multi-person localization. Using this loss function, our instantaneous CSI-ResNet achieves 99.21% accuracy at 0.6 m precision with single-timestamp CSI. A preprocessing algorithm is implemented to counteract WiFi-induced variability, thereby improving robustness. Furthermore, we incorporate Nuclear Norm-Based Transfer Pre-Training, ensuring adaptability in diverse environments, which provides a new paradigm for indoor multi-person localization. Additionally, we have developed an extensive dataset, surpassing existing ones in scope and diversity, to underscore the efficacy of our method and facilitate future fingerprint-based localization research.
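
The abstract does not spell out the Multi-target loss. As a point of reference only, the sketch below uses a generic, swapped-in technique: multi-label classification over a grid of candidate locations. Each occupied cell is a positive label, training uses per-cell binary cross-entropy with logits, and the k highest-scoring cells are decoded as the positions of k people. The grid encoding, the known number of people `k`, and all function names are hypothetical. This is not necessarily the paper's loss.

```python
import math
from typing import List, Sequence


def multi_hot(occupied_cells: Sequence[int], n_cells: int) -> List[float]:
    """Target vector with 1.0 for every grid cell occupied by a person."""
    target = [0.0] * n_cells
    for cell in occupied_cells:
        target[cell] = 1.0
    return target


def multi_label_bce(logits: Sequence[float], target: Sequence[float]) -> float:
    """Mean binary cross-entropy with logits, in the stable form max(z,0) - z*y + log(1+exp(-|z|))."""
    total = 0.0
    for z, y in zip(logits, target):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def decode_positions(logits: Sequence[float], k: int) -> List[int]:
    """Return the k highest-scoring cells (k = number of people, assumed known)."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


target = multi_hot([1, 3], n_cells=5)  # two people in cells 1 and 3
logits = [-5.0, 5.0, -5.0, 5.0, -5.0]  # example network output from a single CSI snapshot
print(multi_label_bce(logits, target), decode_positions(logits, k=2))
```

Framing the output as independent per-cell decisions lets a single forward pass on one CSI snapshot report several people at once, which is the capability the abstract emphasizes.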

Gradient Boosting Machine: A Survey

Aug 19, 2019

Zhiyuan He, Danchen Lin, Thomas Lau, Mike Wu

Abstract: In this survey, we discuss several types of gradient boosting algorithms and illustrate their mathematical frameworks in detail, covering: 1. an introduction to gradient boosting, 2. objective function optimization, 3. loss function estimation, 4. model construction, and 5. the application of boosting to ranking.
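
The core loop that such frameworks share can be sketched in a few lines. The sketch starts from a constant model, repeatedly fits a weak learner to the negative gradient of the loss (for squared loss, the residuals), and adds it with a learning rate. This minimal sketch assumes squared loss, 1-D inputs, and regression stumps as the weak learner. The function names are illustrative, not taken from the survey.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(x: Sequence[float], r: Sequence[float]) -> Stump:
    """Least-squares regression stump on 1-D inputs, fitted to pseudo-residuals r."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all inputs identical: fall back to a constant stump
        mean = sum(r) / len(r)
        return (float("inf"), mean, mean)
    return (best[1], best[2], best[3])


def stump_predict(stump: Stump, xi: float) -> float:
    t, lv, rv = stump
    return lv if xi <= t else rv


def gradient_boost(x: Sequence[float], y: Sequence[float], n_rounds: int = 50, lr: float = 0.1):
    """Gradient boosting with squared loss L = 0.5 * (y - F)^2."""
    f0 = sum(y) / len(y)  # the constant that minimizes squared loss
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # Negative gradient of the squared loss w.r.t. F(x_i) is the residual y_i - F(x_i).
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump_predict(stump, xi) for pi, xi in zip(pred, x)]
    return f0, lr, stumps


def boosted_predict(model, xi: float) -> float:
    f0, lr, stumps = model
    return f0 + lr * sum(stump_predict(s, xi) for s in stumps)


xs = [float(i) for i in range(10)]
ys = [0.0] * 5 + [10.0] * 5  # a step function
model = gradient_boost(xs, ys)
print(boosted_predict(model, 0.0), boosted_predict(model, 9.0))
```

Swapping the residual line for the negative gradient of another loss (for example, a ranking loss) changes the objective while the rest of the loop stays the same. This separation is what the survey's headings (objective optimization, loss estimation, model construction) reflect.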

Unsupervised Discovery of Object Landmarks as Structural Representations

Apr 12, 2018

Yuting Zhang, Yijie Guo, Yixin Jin, Yijun Luo, Zhiyuan He, Honglak Lee

Abstract: Deep neural networks can model images with rich latent representations, but they cannot naturally conceptualize structures of object categories in a human-perceptible way. This paper addresses the problem of learning object structures in an image modeling process without supervision. We propose an autoencoding formulation to discover landmarks as explicit structural representations. The encoding module outputs landmark coordinates, whose validity is ensured by constraints that reflect the necessary properties for landmarks. The decoding module takes the landmarks as a part of the learnable input representations in an end-to-end differentiable framework. Our discovered landmarks are semantically meaningful and more predictive of manually annotated landmarks than those discovered by previous methods. The coordinates of our landmarks are also complementary features to pretrained deep-neural-network representations in recognizing visual attributes. In addition, the proposed method naturally creates an unsupervised, perceptible interface to manipulate object shapes and decode images with controllable structures. The project webpage is at http://ytzhang.net/projects/lmdis-rep

* CVPR 2018
* 48 pages
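
The abstract says the encoder outputs landmark coordinates inside an end-to-end differentiable framework. A common way to obtain differentiable coordinates from a detection map is a soft-argmax, the softmax-weighted mean of pixel coordinates. This minimal sketch assumes a single 2-D score map per landmark and a hypothetical `temperature` parameter. It illustrates the general technique, not necessarily the paper's exact encoder.

```python
import math
from typing import List, Sequence, Tuple


def soft_argmax_2d(heatmap: Sequence[Sequence[float]], temperature: float = 1.0) -> Tuple[float, float]:
    """Differentiable (x, y) landmark location: softmax-weighted mean of pixel coordinates."""
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    weights: List[List[float]] = [[math.exp((v - peak) / temperature) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(col * w for row in weights for col, w in enumerate(row)) / total
    y = sum(r * w for r, row in enumerate(weights) for w in row) / total
    return x, y


scores = [[0.0] * 5 for _ in range(5)]
scores[1][3] = 50.0  # a sharp peak at row 1, column 3
print(soft_argmax_2d(scores))  # close to (3.0, 1.0)
```

Unlike a hard argmax, every pixel contributes to the output through a smooth weight. Gradients can therefore flow from the decoder's reconstruction loss back into the landmark detector.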

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Apr 17, 2017

Yuting Zhang, Luyao Yuan, Yijie Guo, Zhiyuan He, I-An Huang, Honglak Lee

Abstract: Associating image regions with text queries has been recently explored as a new way to bridge visual and linguistic representations. A few pioneering approaches have been proposed based on recurrent neural language models trained generatively (e.g., generating captions), but they achieve only limited localization accuracy. To better address natural-language-based visual entity localization, we propose a discriminative approach. We formulate a discriminative bimodal neural network (DBNet), which can be trained by a classifier with extensive use of negative samples. Our training objective encourages better localization on single images, incorporates text phrases in a broad range, and properly pairs image regions with text phrases into positive and negative examples. Experiments on the Visual Genome dataset demonstrate that the proposed DBNet significantly outperforms previous state-of-the-art methods both for localization on single images and for detection on multiple images. We also establish an evaluation protocol for natural-language visual detection.

* IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
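
The discriminative training idea can be illustrated as binary classification over (image region, text phrase) pairs. Matched pairs are positives, mismatched pairs are negatives, and a logistic loss pushes their scores apart. In this minimal sketch, the dot-product pair score and the plain per-pair logistic loss are simplifying assumptions. DBNet's actual network and objective are more elaborate and are described in the paper.

```python
import math
from typing import Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float], int]  # (region embedding, phrase embedding, label)


def pair_score(region: Sequence[float], phrase: Sequence[float]) -> float:
    """Hypothetical compatibility score: dot product of the two embeddings."""
    return sum(a * b for a, b in zip(region, phrase))


def logistic_loss(score: float, label: int) -> float:
    """Binary cross-entropy on a raw score: label 1 = matched pair, 0 = negative pair."""
    z = score if label == 1 else -score
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable log(1 + exp(-z))


def discriminative_loss(pairs: Sequence[Pair]) -> float:
    """Mean logistic loss over positive and negative region-phrase pairs."""
    return sum(logistic_loss(pair_score(r, p), y) for r, p, y in pairs) / len(pairs)


region = [1.0, 0.0]
dog, car = [1.0, 0.0], [0.0, 1.0]  # toy phrase embeddings; "dog" matches this region
print(discriminative_loss([(region, dog, 1), (region, car, 0)]))  # correct labels: lower loss
print(discriminative_loss([(region, dog, 0), (region, car, 1)]))  # swapped labels: higher loss
```

Adding many negative pairs per region (other phrases, other regions) is what distinguishes this classifier-style objective from a generative captioning objective, which only ever scores the positive text.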
