Haotian Zhang

SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM

Apr 22, 2025

Xiaojiang Zhang, Jinghui Wang, Zifei Cheng, Wenhao Zhuang, Zheng Lin, Minglei Zhang, Shaojie Wang, Yinghan Cui, Chao Wang, Junyi Peng(+7 more)

Abstract: Recent advances in reasoning models, exemplified by OpenAI's o1 and DeepSeek's R1, highlight the significant potential of Reinforcement Learning (RL) to enhance the reasoning capabilities of Large Language Models (LLMs). However, replicating these advancements across diverse domains remains challenging due to limited methodological transparency. In this work, we present two-Staged history-Resampling Policy Optimization (SRPO), which surpasses the performance of DeepSeek-R1-Zero-32B on the AIME24 and LiveCodeBench benchmarks. SRPO achieves this with the same base model as DeepSeek (i.e., Qwen2.5-32B) and only about 1/10 of the training steps required by DeepSeek-R1-Zero-32B, demonstrating superior efficiency. Building upon Group Relative Policy Optimization (GRPO), we introduce two key methodological innovations: (1) a two-stage cross-domain training paradigm designed to balance the development of mathematical reasoning and coding proficiency, and (2) History Resampling (HR), a technique to address ineffective samples. Our comprehensive experiments validate the effectiveness of our approach, offering valuable insights into scaling LLM reasoning capabilities across diverse tasks.
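
As a rough illustration of why some samples can be "ineffective" under GRPO: each response's advantage is its reward standardized against the other responses sampled for the same prompt, so a group whose rewards are all identical yields zero advantage and no learning signal. The sketch below shows that computation plus a simple filter that drops such groups. The abstract does not describe how History Resampling actually works, so the filter is only an assumption for illustration; the function names, the `eps` value, and the use of the population standard deviation are likewise hypothetical choices.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its prompt group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def drop_uninformative_groups(groups):
    """Keep only prompts whose sampled rewards differ (hypothetical filter).

    A group with identical rewards has zero advantage for every response,
    so it contributes nothing to the policy gradient.
    """
    return {p: rs for p, rs in groups.items() if max(rs) != min(rs)}


# Toy data: rewards for four sampled responses per prompt.
groups = {
    "prompt_a": [1.0, 1.0, 1.0, 1.0],  # all correct: no signal
    "prompt_b": [1.0, 0.0, 0.0, 1.0],  # mixed: informative
}
kept = drop_uninformative_groups(groups)
advantages = {p: group_advantages(rs) for p, rs in kept.items()}
```

Here only `prompt_b` survives the filter, and its advantages are symmetric around zero.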

OASIS: Order-Augmented Strategy for Improved Code Search

Mar 11, 2025

Zuchen Gao, Zizheng Zhan, Xianming Li, Erxin Yu, Haotian Zhang, Yuqun Zhang, Jing Li

Abstract: Code embeddings capture the semantic representations of code and are crucial for various code-related large language model (LLM) applications, such as code search. Previous training primarily relies on optimizing the InfoNCE loss by comparing positive natural language (NL)-code pairs with in-batch negatives. However, due to the sparse nature of code contexts, training solely by comparing the major differences between positive and negative pairs may fail to capture deeper semantic nuances. To address this issue, we propose a novel order-augmented strategy for improved code search (OASIS). It leverages order-based similarity labels to train models to capture subtle differences in similarity among negative pairs. Extensive benchmark evaluations demonstrate that our OASIS model significantly outperforms previous state-of-the-art models focusing solely on major positive-negative differences. It underscores the value of exploiting subtle differences among negative pairs with order labels for effective code embedding training.
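
For readers unfamiliar with the baseline objective mentioned above, the following is a minimal sketch of the InfoNCE loss with in-batch negatives. It illustrates only the standard loss that the abstract contrasts against, not the order-augmented OASIS objective. The function name `info_nce` and the temperature of 0.05 are assumptions chosen for the example.

```python
import math


def info_nce(sim, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    sim[i][j] is the similarity between query i and code snippet j.
    The diagonal holds the positive pairs; the off-diagonal entries in
    each row act as in-batch negatives.
    """
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denom - logits[i])  # -log softmax of the positive
    return sum(losses) / len(losses)


# Positives clearly separated from negatives versus not separated at all.
separated = [[0.9, 0.1], [0.2, 0.8]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss_separated = info_nce(separated)
loss_uniform = info_nce(uniform)
```

When every pair has the same similarity, the loss equals the log of the batch size, and it shrinks as the positives pull away from the negatives.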

Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?

Mar 07, 2025

Qingyuan Liang, Zhao Zhang, Zeyu Sun, Zheng Lin, Qi Luo, Yueyi Xiao, Yizhou Chen, Yuqun Zhang, Haotian Zhang, Lu Zhang(+2 more)

Abstract: Grammar serves as a cornerstone in programming languages and software engineering, providing frameworks to define the syntactic space and program structure. Existing research demonstrates the effectiveness of grammar-based code representations in small-scale models, showing their ability to reduce syntax errors and enhance performance. However, as language models scale to the billion level or beyond, syntax-level errors become rare, making it unclear whether grammar information still provides performance benefits. To explore this, we develop a series of billion-scale GrammarCoder models, incorporating grammar rules in the code generation process. Experiments on HumanEval (+) and MBPP (+) demonstrate a notable improvement in code generation accuracy. Further analysis shows that grammar-based representations enhance LLMs' ability to discern subtle code differences, reducing semantic errors caused by minor variations. These findings suggest that grammar-based code representations remain valuable even in billion-scale models, not only by maintaining syntax correctness but also by improving semantic differentiation.

MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs

Mar 06, 2025

Tianyang Zhang, Zhuoxuan Jiang, Haotian Zhang, Lin Lin, Shaohua Zhang

Abstract: We propose a novel system, MathMistake Checker, designed to automate step-by-step mistake finding in mathematical problems with lengthy answers through a two-stage process. The system aims to simplify grading, increase efficiency, and enhance learning experiences from a pedagogical perspective. It integrates advanced technologies, including computer vision and the chain-of-thought capabilities of the latest large language models (LLMs). Our system supports open-ended grading without reference answers and promotes personalized learning by providing targeted feedback. We demonstrate its effectiveness across various types of math problems, such as calculation and word problems.

* Published in AAAI 2025

Generative Motion Infilling From Imprecisely Timed Keyframes

Mar 02, 2025

Purvi Goel, Haotian Zhang, C. Karen Liu, Kayvon Fatahalian

Abstract: Keyframes are a standard representation for kinematic motion specification. Recent learned motion-inbetweening methods use keyframes as a way to control generative motion models, and are trained to generate life-like motion that matches the exact poses and timings of input keyframes. However, the quality of generated motion may degrade if the timing of these constraints is not perfectly consistent with the desired motion. Unfortunately, correctly specifying keyframe timings is a tedious and challenging task in practice. Our goal is to create a system that synthesizes high-quality motion from keyframes, even if keyframes are imprecisely timed. We present a method that allows constraints to be retimed as part of the generation process. Specifically, we introduce a novel model architecture that explicitly outputs a time-warping function to correct mistimed keyframes, and spatial residuals that add pose details. We demonstrate how our method can automatically turn approximately timed keyframe constraints into diverse, realistic motions with plausible timing and detailed submovements.

* 10 pages, Eurographics 2025

The Gap Between Principle and Practice of Lossy Image Coding

Jan 21, 2025

Haotian Zhang, Dong Liu

Abstract: Lossy image coding is the art of computing that is principally bounded by the image's rate-distortion function. This bound, though never accurately characterized, has been approached practically via deep learning technologies in recent years. Indeed, learned image coding schemes allow direct optimization of the joint rate-distortion cost, thereby outperforming the handcrafted image coding schemes by a large margin. Still, it is observed that there is room for further improvement in the rate-distortion performance of learned image coding. In this article, we identify the gap between the ideal rate-distortion function forecasted by Shannon's information theory and the empirical rate-distortion function achieved by the state-of-the-art learned image coding schemes, revealing that the gap is incurred by five different effects: modeling effect, approximation effect, amortization effect, digitization effect, and asymptotic effect. We design simulations and experiments to quantitatively evaluate the last three effects, demonstrating the high potential of future lossy image coding technologies.

* 11 pages, 5 figures
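
For reference, the two quantities the abstract contrasts can be written in their standard textbook forms. The notation below (source $X$, reconstruction $\hat{X}$, distortion measure $d$, trade-off weight $\lambda$) is assumed for illustration and may differ from the paper's own.

```latex
% Shannon's rate-distortion function: the lowest achievable rate
% at expected distortion no greater than D.
R(D) = \min_{p(\hat{x} \mid x) \,:\, \mathbb{E}[d(X, \hat{X})] \le D} I(X; \hat{X})

% Joint rate-distortion cost typically minimized when training a learned codec,
% with R the estimated rate and D the measured distortion.
\mathcal{L} = R + \lambda D
```

The first expression is the theoretical bound; the second is a practical training objective whose minimizers trace out an empirical rate-distortion curve as $\lambda$ varies.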

Synesthesia of Machines (SoM)-Aided FDD Precoding with Sensing Heterogeneity: A Vertical Federated Learning Approach

Jan 19, 2025

Haotian Zhang, Shijian Gao, Weibo Wen, Xiang Cheng

Abstract: High complexity in precoding design for frequency division duplex systems necessitates streamlined solutions. Guided by Synesthesia of Machines (SoM), this paper introduces a heterogeneous multi-vehicle, multi-modal sensing aided precoding scheme within a vertical federated learning (VFL) framework, which significantly reduces pilot sequence length while optimizing the system's sum rate. We address the challenges posed by local data heterogeneity due to varying on-board sensor configurations through a meticulously designed VFL training procedure. To extract valuable channel features from multi-modal sensing, we employ three distinct data preprocessing methods that convert raw data into informative representations relevant for precoding. Additionally, we propose an online training strategy based on the VFL framework, enabling the scheme to adapt dynamically to fluctuations in user numbers. Numerical results indicate that our approach, utilizing short pilot sequences, closely approximates the performance of traditional optimization methods with perfect channel state information.

Sparse Point Clouds Assisted Learned Image Compression

Dec 20, 2024

Yiheng Jiang, Haotian Zhang, Li Li, Dong Liu, Zhu Li

Abstract: In the field of autonomous driving, a variety of sensor data types exist, each representing different modalities of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance the image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist in learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into the learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.

* Accepted by TCSVT
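
The first step described in the abstract, projecting a sparse 3D point cloud onto a 2D plane to obtain a sparse depth map, can be sketched with a plain pinhole camera model. This is an illustrative assumption rather than the paper's procedure: the points are taken to be already expressed in the camera frame, and the function name `project_to_depth_map` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
def project_to_depth_map(points, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points (x, y, z) onto a sparse depth map.

    Pixels that receive no point stay None. If several points land on
    the same pixel, the nearest one (smallest z) is kept.
    """
    depth = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] is None or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Toy example on a 3x3 image with unit focal length.
points = [
    (0.0, 0.0, 2.0),    # lands on the centre pixel
    (0.0, 0.0, 5.0),    # same pixel but farther away, so it is discarded
    (1.0, 0.0, -1.0),   # behind the camera
    (100.0, 0.0, 1.0),  # projects outside the image
]
depth_map = project_to_depth_map(points, fx=1.0, fy=1.0, cx=1.0, cy=1.0,
                                 width=3, height=3)
```

Most pixels in the result remain empty, which is what makes the depth map sparse.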

Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation

Dec 08, 2024

Zipeng Qi, Hao Chen, Haotian Zhang, Zhengxia Zou, Zhenwei Shi

Abstract: In this paper, we propose a novel semantic splatting approach based on Gaussian Splatting to achieve efficient, low-latency multi-view segmentation. Our method projects the RGB attributes and semantic features of point clouds onto the image plane, simultaneously rendering RGB images and semantic segmentation results. Leveraging the explicit structure of point clouds and a one-time rendering strategy, our approach significantly enhances efficiency during optimization and rendering. Additionally, we employ SAM2 to generate pseudo-labels for boundary regions, which often lack sufficient supervision, and introduce two-level aggregation losses at the 2D feature map and 3D spatial levels to improve view consistency and spatial continuity.

Generalized Gaussian Model for Learned Image Compression

Nov 28, 2024

Haotian Zhang, Li Li, Dong Liu

Abstract: In learned image compression, probabilistic models play an essential role in characterizing the distribution of latent variables. The Gaussian model with mean and scale parameters has been widely used for its simplicity and effectiveness. Probabilistic models with more parameters, such as the Gaussian mixture models, can fit the distribution of latent variables more precisely, but the corresponding complexity will also be higher. To balance compression performance and complexity, we extend the Gaussian model to the generalized Gaussian model for more flexible latent distribution modeling, introducing only one parameter beyond those of the Gaussian model: the shape parameter beta. To enhance the performance of the generalized Gaussian model by alleviating the train-test mismatch, we propose improved training methods, including beta-dependent lower bounds for scale parameters and gradient rectification. Our proposed generalized Gaussian model, coupled with the improved training methods, is demonstrated to outperform the Gaussian and Gaussian mixture models on a variety of learned image compression methods.

* 13 pages, 12 figures
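
To show what the extra shape parameter does, here is a small sketch of the generalized Gaussian density in its common textbook parameterization (location `mu`, scale `alpha`, shape `beta`). The paper's own parameterization and its training tricks are not reproduced here, and the function name is an assumption made for the example.

```python
import math


def generalized_gaussian_pdf(x, mu, alpha, beta):
    """Density of a generalized Gaussian with location mu, scale alpha, shape beta.

    p(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)
    """
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def gaussian_pdf(x, mu, sigma):
    """Ordinary Gaussian density, used only for comparison."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# With beta = 2 and alpha = sqrt(2) * sigma the two densities coincide.
sigma = 1.5
ggd_value = generalized_gaussian_pdf(0.7, mu=0.0, alpha=math.sqrt(2.0) * sigma, beta=2.0)
gauss_value = gaussian_pdf(0.7, mu=0.0, sigma=sigma)

# With beta = 1 the density reduces to a Laplace distribution with scale alpha.
laplace_value = generalized_gaussian_pdf(0.7, mu=0.0, alpha=1.0, beta=1.0)
```

In this parameterization, values of `beta` below 2 give heavier tails and a sharper peak than the Gaussian, and values above 2 give a flatter peak with lighter tails.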
