Tiantian Zhang

Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models

May 24, 2025

Haoyuan Sun, Jiaqi Wu, Bo Xia, Yifu Luo, Yifei Zhao, Kai Qin, Xufei Lv, Tiantian Zhang, Yongzhe Chang, Xueqian Wang

Abstract:Standing in 2025, at a critical juncture in the pursuit of Artificial General Intelligence (AGI), reinforcement fine-tuning (RFT) has demonstrated significant potential in enhancing the reasoning capability of large language models (LLMs) and has led to the development of cutting-edge AI models such as OpenAI-o1 and DeepSeek-R1. Moreover, the efficient application of RFT to enhance the reasoning capability of multimodal large language models (MLLMs) has attracted widespread attention from the community. In this position paper, we argue that reinforcement fine-tuning powers the reasoning capability of multimodal large language models. To begin with, we provide a detailed introduction to the fundamental background knowledge that researchers interested in this field should be familiar with. Furthermore, we meticulously summarize the improvements of RFT in powering reasoning capability of MLLMs into five key points: diverse modalities, diverse tasks and domains, better training algorithms, abundant benchmarks and thriving engineering frameworks. Finally, we propose five promising directions for future research that the community might consider. We hope that this position paper will provide valuable insights to the community at this pivotal stage in the advancement toward AGI. Summary of works done on RFT for MLLMs is available at https://github.com/Sun-Haoyuan23/Awesome-RL-based-Reasoning-MLLMs.

Evaluation Study on SAM 2 for Class-agnostic Instance-level Segmentation

Sep 04, 2024

Tiantian Zhang, Zhangjun Zhou, Jialun Pei

Abstract:Segment Anything Model (SAM) has demonstrated powerful zero-shot segmentation performance in natural scenes. The recently released Segment Anything Model 2 (SAM2) has further heightened researchers' expectations towards image segmentation capabilities. To evaluate the performance of SAM2 on class-agnostic instance-level segmentation tasks, we adopt different prompt strategies for SAM2 to cope with instance-level tasks for three relevant scenarios: Salient Instance Segmentation (SIS), Camouflaged Instance Segmentation (CIS), and Shadow Instance Detection (SID). In addition, to further explore the effectiveness of SAM2 in segmenting granular object structures, we also conduct detailed tests on the high-resolution Dichotomous Image Segmentation (DIS) benchmark to assess the fine-grained segmentation capability. Qualitative and quantitative experimental results indicate that the performance of SAM2 varies significantly across different scenarios. Besides, SAM2 is not particularly sensitive to segmenting high-resolution fine details. We hope this technical report can drive the emergence of SAM2-based adapters, aiming to enhance the performance ceiling of large vision models on class-agnostic instance segmentation tasks.

Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring

May 14, 2024

Tiantian Zhang, Manxi Lin, Hongda Guo, Xiaofan Zhang, Ka Fung Peter Chiu, Aasa Feragen, Qi Dou

Abstract:The Prostate Imaging Reporting and Data System (PI-RADS) is pivotal in the diagnosis of clinically significant prostate cancer through MRI imaging. Current deep learning-based PI-RADS scoring methods often lack the incorporation of essential PI-RADS clinical guidelines (PICG) utilized by radiologists, potentially compromising scoring accuracy. This paper introduces a novel approach that adapts a multi-modal large language model (MLLM) to incorporate PICG into PI-RADS scoring without additional annotations and network parameters. We present a two-stage fine-tuning process aimed at adapting MLLMs originally trained on natural images to the MRI data domain while effectively integrating the PICG. In the first stage, we develop a domain adapter layer specifically tailored for processing 3D MRI image inputs and design the MLLM instructions to differentiate MRI modalities effectively. In the second stage, we translate PICG into guiding instructions for the model to generate PICG-guided image features. Through feature distillation, we align scoring network features with the PICG-guided image features, enabling the scoring network to effectively incorporate the PICG information. We develop our model on a public dataset and evaluate it on a challenging real-world in-house dataset. Experimental results demonstrate that our approach improves the performance of current scoring networks.

Adaptive Intra-Class Variation Contrastive Learning for Unsupervised Person Re-Identification

Apr 06, 2024

Lingzhi Liu, Haiyang Zhang, Chengwei Tang, Tiantian Zhang

Abstract:The memory dictionary-based contrastive learning method has achieved remarkable results in the field of unsupervised person Re-ID. However, the method of updating memory based on all samples does not fully utilize the hardest sample to improve the generalization ability of the model, and the method based on hardest sample mining will inevitably introduce false-positive samples that are incorrectly clustered in the early stages of the model. Clustering-based methods usually discard a significant number of outliers, leading to the loss of valuable information. To address these issues, we propose an adaptive intra-class variation contrastive learning algorithm for unsupervised Re-ID, called AdaInCV. The algorithm quantitatively evaluates the learning ability of the model for each class by considering the intra-class variations after clustering, which helps in selecting appropriate samples during the training process of the model. To be more specific, two new strategies are proposed: Adaptive Sample Mining (AdaSaM) and Adaptive Outlier Filter (AdaOF). The first one gradually creates more reliable clusters to dynamically refine the memory, while the second can identify and filter out valuable outliers as negative samples.

Replay-enhanced Continual Reinforcement Learning

Nov 20, 2023

Tiantian Zhang, Kevin Zehua Shen, Zichuan Lin, Bo Yuan, Xueqian Wang, Xiu Li, Deheng Ye

Abstract:Replaying past experiences has proven to be a highly effective approach for averting catastrophic forgetting in supervised continual learning. However, some crucial factors are still largely ignored, making it vulnerable to serious failure, when used as a solution to forgetting in continual reinforcement learning, even in the context of perfect memory where all data of previous tasks are accessible in the current task. On the one hand, since most reinforcement learning algorithms are not invariant to the reward scale, the previously well-learned tasks (with high rewards) may appear to be more salient to the current learning process than the current task (with small initial rewards). This causes the agent to concentrate on those salient tasks at the expense of generality on the current task. On the other hand, offline learning on replayed tasks while learning a new task may induce a distributional shift between the dataset and the learned policy on old tasks, resulting in forgetting. In this paper, we introduce RECALL, a replay-enhanced method that greatly improves the plasticity of existing replay-based methods on new tasks while effectively avoiding the recurrence of catastrophic forgetting in continual reinforcement learning. RECALL leverages adaptive normalization on approximate targets and policy distillation on old tasks to enhance generality and stability, respectively. Extensive experiments on the Continual World benchmark show that RECALL performs significantly better than purely perfect memory replay, and achieves comparable or better overall performance against state-of-the-art continual learning methods.

* Accepted by Transactions on Machine Learning Research 2023

Implementation and Evaluation of Physical Layer Key Generation on SDR based LoRa Platform

Aug 30, 2023

Yingying Hu, Dongyang Xu, Tiantian Zhang

Abstract:Physical layer key generation technology, which leverages channel randomness to generate secret keys, has recently attracted extensive attention in long range (LoRa)-based networks. In this paper, we develop a software-defined radio (SDR) based LoRa communications platform using GNU Radio on a universal software radio peripheral (USRP) to implement and evaluate typical physical layer key generation schemes. Thanks to the flexibility and configurability of GNU Radio to extract LoRa packets, we are able to obtain the fine-grained channel frequency response (CFR) through LoRa preamble based channel estimation for key generation. Besides, we propose a low-complexity preprocessing method to enhance the randomness of quantization while reducing the secret key disagreement ratio. The results indicate that we can achieve 367 key bits with a high level of randomness through just a single effective channel probing in an indoor environment at a distance of 2 meters, with a spreading factor (SF) of 7, a preamble length of 8, a signal bandwidth of 250 kHz, and a sampling rate of 1 MHz.

* Submitted to IEEE VTC2023 Fall

A Surrogate-Assisted Controller for Expensive Evolutionary Reinforcement Learning

Jan 01, 2022

Yuxing Wang, Tiantian Zhang, Yongzhe Chang, Bin Liang, Xueqian Wang, Bo Yuan

Abstract:The integration of Reinforcement Learning (RL) and Evolutionary Algorithms (EAs) aims at simultaneously exploiting the sample efficiency as well as the diversity and robustness of the two paradigms. Recently, hybrid learning frameworks based on this principle have achieved great success in various challenging robot control tasks. However, in these methods, policies from the genetic population are evaluated via interactions with the real environments, limiting their applicability in computationally expensive problems. In this work, we propose Surrogate-assisted Controller (SC), a novel and efficient module that can be integrated into existing frameworks to alleviate the computational burden of EAs by partially replacing the expensive policy evaluation. The key challenge in applying this module is to prevent the optimization process from being misled by the possible false minima introduced by the surrogate. To address this issue, we present two strategies for SC to control the workflow of hybrid frameworks. Experiments on six continuous control tasks from the OpenAI Gym platform show that SC can not only significantly reduce the cost of fitness evaluations, but also boost the performance of the original hybrid frameworks with collaborative learning and evolutionary processes.

Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Division and Knowledge Distillation

Sep 01, 2021

Tiantian Zhang, Xueqian Wang, Bin Liang, Bo Yuan

Abstract:The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional and continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general RL paradigm where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" and the collapse in performance as later training is likely to overwrite and interfere with previously learned policies. In this paper, we introduce the concept of "context" into single-task RL and develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL, to divide all states experienced during training into a series of contexts. Its motivation is to mitigate the challenge of the aforementioned catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function, parameterized by a neural network feature extractor shared across all contexts, and a set of output heads, each specializing in an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge distillation regularization term on the output layers for learned contexts. In addition, to effectively obtain the context division in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder, which is fixed throughout training. Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring only moderate computational overhead.

* 17 pages

A Critical Note on the Evaluation of Clustering Algorithms

Aug 10, 2019

Li Zhong, Tiantian Zhang, Bo Yuan

Abstract:Experimental evaluation is a major research methodology for investigating clustering algorithms. For this purpose, a number of benchmark datasets have been widely used in the literature and their quality plays an important role in the value of the research work. However, in most of the existing studies, little attention has been paid to the specific properties of the datasets and they are often regarded as black-box problems. In our work, with the help of advanced visualization and dimension reduction techniques, we show that there are potential issues with some of the popular benchmark datasets used to evaluate clustering algorithms that may seriously compromise the research quality and may even produce completely misleading results. We suggest that significant efforts need to be devoted to improving the current practice of experimental evaluation of clustering algorithms by having a principled analysis of each benchmark dataset of interest.

* 4 pages, 9 figures
