Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chengliang Liu

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

May 26, 2025

Chao Huang, Benfeng Wang, Jie Wen, Chengliang Liu, Wei Wang, Li Shen, Xiaochun Cao

Abstract:Recent advancements in reasoning capability of Multimodal Large Language Models (MLLMs) demonstrate its effectiveness in tackling complex visual tasks. However, existing MLLM-based Video Anomaly Detection (VAD) methods remain limited to shallow anomaly descriptions without deep reasoning. In this paper, we propose a new task named Video Anomaly Reasoning (VAR), which aims to enable deep analysis and understanding of anomalies in the video by requiring MLLMs to think explicitly before answering. To this end, we propose Vad-R1, an end-to-end MLLM-based framework for VAR. Specifically, we design a Perception-to-Cognition Chain-of-Thought (P2C-CoT) that simulates the human process of recognizing anomalies, guiding the MLLM to reason anomaly step-by-step. Based on the structured P2C-CoT, we construct Vad-Reasoning, a dedicated dataset for VAR. Furthermore, we propose an improved reinforcement learning algorithm AVA-GRPO, which explicitly incentivizes the anomaly reasoning capability of MLLMs through a self-verification mechanism with limited annotations. Experimental results demonstrate that Vad-R1 achieves superior performance, outperforming both open-source and proprietary models on VAD and VAR tasks. Codes and datasets will be released at https://github.com/wbfwonderful/Vad-R1.

* 9 pages, 4 figures

Via

Access Paper or Ask Questions

Wavelet-based Global-Local Interaction Network with Cross-Attention for Multi-View Diabetic Retinopathy Detection

Mar 25, 2025

Yongting Hu, Yuxin Lin, Chengliang Liu, Xiaoling Luo, Xiaoyan Dou, Qihao Xu, Yong Xu

Abstract:Multi-view diabetic retinopathy (DR) detection has recently emerged as a promising method to address the issue of incomplete lesions faced by single-view DR. However, it is still challenging due to the variable sizes and scattered locations of lesions. Furthermore, existing multi-view DR methods typically merge multiple views without considering the correlations and redundancies of lesion information across them. Therefore, we propose a novel method to overcome the challenges of difficult lesion information learning and inadequate multi-view fusion. Specifically, we introduce a two-branch network to obtain both local lesion features and their global dependencies. The high-frequency component of the wavelet transform is used to exploit lesion edge information, which is then enhanced by global semantic to facilitate difficult lesion learning. Additionally, we present a cross-view fusion module to improve multi-view fusion and reduce redundancy. Experimental results on large public datasets demonstrate the effectiveness of our method. The code is open sourced on https://github.com/HuYongting/WGLIN.

* Accepted by IEEE International Conference on Multimedia & Expo (ICME) 2025

Via

Access Paper or Ask Questions

Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

Apr 26, 2024

Chengliang Liu, Jie Wen, Yabo Liu, Chao Huang, Zhihao Wu, Xiaoling Luo, Yong Xu

Figure 1 for Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

Figure 2 for Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

Figure 3 for Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

Figure 4 for Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

Abstract:Multi-view learning has become a popular research topic in recent years, but research on the cross-application of classic multi-label classification and multi-view learning is still in its early stages. In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning and propose a masked two-channel decoupling framework based on deep neural networks to solve this problem. The core innovation of our method lies in decoupling the single-channel view-level representation, which is common in deep multi-view learning methods, into a shared representation and a view-proprietary representation. We also design a cross-channel contrastive loss to enhance the semantic property of the two channels. Additionally, we exploit supervised information to design a label-guided graph regularization loss, helping the extracted embedding features preserve the geometric structure among samples. Inspired by the success of masking mechanisms in image and text analysis, we develop a random fragment masking strategy for vector features to improve the learning ability of encoders. Finally, it is important to emphasize that our model is fully adaptable to arbitrary view and label absences while also performing well on the ideal full data. We have conducted sufficient and convincing experiments to confirm the effectiveness and advancement of our model.

* Accepted at NeurIPS 2023. Email: liucl1996@163.com

Via

Access Paper or Ask Questions

Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning

Oct 05, 2023

Zhaorun Chen, Binhao Chen, Tairan He, Liang Gong, Chengliang Liu

Figure 1 for Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning

Figure 2 for Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning

Figure 3 for Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning

Figure 4 for Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning

Abstract:Safety assurance of Reinforcement Learning (RL) is critical for exploration in real-world scenarios. In handling the Constrained Markov Decision Process, current approaches experience intrinsic difficulties in trading-off between optimality and feasibility. Direct optimization methods cannot strictly guarantee state-wise in-training safety while projection-based methods are usually inefficient and correct actions through lengthy iterations. To address these two challenges, this paper proposes an adaptive surrogate chance constraint for the safety cost, and a hierarchical architecture that corrects actions produced by the upper policy layer via a fast Quasi-Newton method. Theoretical analysis indicates that the relaxed probabilistic constraint can sufficiently guarantee forward invariance to the safe set. We validate the proposed method on 4 simulated and real-world safety-critical robotic tasks. Results indicate that the proposed method can efficiently enforce safety (nearly zero-violation), while preserving optimality (+23.8%), robustness and generalizability to stochastic real-world settings.

* 6 pages, 7 figures

Via

Access Paper or Ask Questions

A Self-supervised Contrastive Learning Method for Grasp Outcomes Prediction

Jun 26, 2023

Chengliang Liu, Binhua Huang, Yiwen Liu, Yuanzhe Su, Ke Mai, Yupo Zhang, Zhengkun Yi, Xinyu Wu

Figure 1 for A Self-supervised Contrastive Learning Method for Grasp Outcomes Prediction

Figure 2 for A Self-supervised Contrastive Learning Method for Grasp Outcomes Prediction

Figure 3 for A Self-supervised Contrastive Learning Method for Grasp Outcomes Prediction

Figure 4 for A Self-supervised Contrastive Learning Method for Grasp Outcomes Prediction

Abstract:In this paper, we investigate the effectiveness of contrastive learning methods for predicting grasp outcomes in an unsupervised manner. By utilizing a publicly available dataset, we demonstrate that contrastive learning methods perform well on the task of grasp outcomes prediction. Specifically, the dynamic-dictionary-based method with the momentum updating technique achieves a satisfactory accuracy of 81.83% using data from one single tactile sensor, outperforming other unsupervised methods. Our results reveal the potential of contrastive learning methods for applications in the field of robot grasping and highlight the importance of accurate grasp prediction for achieving stable grasps.

* Manuscript accepted to RCAR 2023

Via

Access Paper or Ask Questions

Information Recovery-Driven Deep Incomplete Multi-view Clustering Network

Apr 02, 2023

Chengliang Liu, Jie Wen, Zhihao Wu, Xiaoling Luo, Chao Huang, Yong Xu

Figure 1 for Information Recovery-Driven Deep Incomplete Multi-view Clustering Network

Figure 2 for Information Recovery-Driven Deep Incomplete Multi-view Clustering Network

Figure 3 for Information Recovery-Driven Deep Incomplete Multi-view Clustering Network

Figure 4 for Information Recovery-Driven Deep Incomplete Multi-view Clustering Network

Abstract:Incomplete multi-view clustering is a hot and emerging topic. It is well known that unavoidable data incompleteness greatly weakens the effective information of multi-view data. To date, existing incomplete multi-view clustering methods usually bypass unavailable views according to prior missing information, which is considered as a second-best scheme based on evasion. Other methods that attempt to recover missing information are mostly applicable to specific two-view datasets. To handle these problems, in this paper, we propose an information recovery-driven deep incomplete multi-view clustering network, termed as RecFormer. Concretely, a two-stage autoencoder network with the self-attention structure is built to synchronously extract high-level semantic representations of multiple views and recover the missing data. Besides, we develop a recurrent graph reconstruction mechanism that cleverly leverages the restored views to promote the representation learning and the further data reconstruction. Visualization of recovery results are given and sufficient experimental results confirm that our RecFormer has obvious advantages over other top methods.

* Please contact me if you have any questions: liucl1996@163.com. The code will be opened upon acceptance

Via

Access Paper or Ask Questions

Learning Reliable Representations for Incomplete Multi-View Partial Multi-Label Classification

Mar 30, 2023

Chengliang Liu, Jie Wen, Yong Xu, Liqiang Nie, Min Zhang

Figure 1 for Learning Reliable Representations for Incomplete Multi-View Partial Multi-Label Classification

Figure 2 for Learning Reliable Representations for Incomplete Multi-View Partial Multi-Label Classification

Figure 3 for Learning Reliable Representations for Incomplete Multi-View Partial Multi-Label Classification

Figure 4 for Learning Reliable Representations for Incomplete Multi-View Partial Multi-Label Classification

Abstract:As a cross-topic of multi-view learning and multi-label classification, multi-view multi-label classification has gradually gained traction in recent years. The application of multi-view contrastive learning has further facilitated this process, however, the existing multi-view contrastive learning methods crudely separate the so-called negative pair, which largely results in the separation of samples belonging to the same category or similar ones. Besides, plenty of multi-view multi-label learning methods ignore the possible absence of views and labels. To address these issues, in this paper, we propose an incomplete multi-view partial multi-label classification network named RANK. In this network, a label-driven multi-view contrastive learning strategy is proposed to leverage supervised information to preserve the structure within view and perform consistent alignment across views. Furthermore, we break through the view-level weights inherent in existing methods and propose a quality-aware sub-network to dynamically assign quality scores to each view of each sample. The label correlation information is fully utilized in the final multi-label cross-entropy classification loss, effectively improving the discriminative power. Last but not least, our model is not only able to handle complete multi-view multi-label datasets, but also works on datasets with missing instances and labels. Extensive experiments confirm that our RANK outperforms existing state-of-the-art methods.

* Please contact me if you have any questions: liucl1996@163.com

Via

Access Paper or Ask Questions

DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

Mar 23, 2023

Chengliang Liu, Jie Wen, Xiaoling Luo, Chao Huang, Zhihao Wu, Yong Xu

Figure 1 for DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

Figure 2 for DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

Figure 3 for DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

Figure 4 for DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

Abstract:In recent years, multi-view multi-label learning has aroused extensive research enthusiasm. However, multi-view multi-label data in the real world is commonly incomplete due to the uncertain factors of data collection and manual annotation, which means that not only multi-view features are often missing, and label completeness is also difficult to be satisfied. To deal with the double incomplete multi-view multi-label classification problem, we propose a deep instance-level contrastive network, namely DICNet. Different from conventional methods, our DICNet focuses on leveraging deep neural network to exploit the high-level semantic representations of samples rather than shallow-level features. First, we utilize the stacked autoencoders to build an end-to-end multi-view feature extraction framework to learn the view-specific representations of samples. Furthermore, in order to improve the consensus representation ability, we introduce an incomplete instance-level contrastive learning scheme to guide the encoders to better extract the consensus information of multiple views and use a multi-view weighted fusion module to enhance the discrimination of semantic features. Overall, our DICNet is adept in capturing consistent discriminative representations of multi-view multi-label data and avoiding the negative effects of missing views and missing labels. Extensive experiments performed on five datasets validate that our method outperforms other state-of-the-art methods.

* Accepted to AAAI-2023, code is available at https://github.com/justsmart/DICNet

Via

Access Paper or Ask Questions

Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

Mar 13, 2023

Chengliang Liu, Jie Wen, Xiaoling Luo, Yong Xu

Figure 1 for Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

Figure 2 for Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

Figure 3 for Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

Figure 4 for Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

Abstract:As we all know, multi-view data is more expressive than single-view data and multi-label annotation enjoys richer supervision information than single-label, which makes multi-view multi-label learning widely applicable for various pattern recognition tasks. In this complex representation learning problem, three main challenges can be characterized as follows: i) How to learn consistent representations of samples across all views? ii) How to exploit and utilize category correlations of multi-label to guide inference? iii) How to avoid the negative impact resulting from the incompleteness of views or labels? To cope with these problems, we propose a general multi-view multi-label learning framework named label-guided masked view- and category-aware transformers in this paper. First, we design two transformer-style based modules for cross-view features aggregation and multi-label classification, respectively. The former aggregates information from different views in the process of extracting view-specific features, and the latter learns subcategory embedding to improve classification performance. Second, considering the imbalance of expressive power among views, an adaptively weighted view fusion module is proposed to obtain view-consistent embedding features. Third, we impose a label manifold constraint in sample-level representation learning to maximize the utilization of supervised information. Last but not least, all the modules are designed under the premise of incomplete views and labels, which makes our method adaptable to arbitrary multi-view and multi-label data. Extensive experiments on five datasets confirm that our method has clear advantages over other state-of-the-art methods.

* Accepted to AAAI-23

Via

Access Paper or Ask Questions

Localized Sparse Incomplete Multi-view Clustering

Aug 05, 2022

Chengliang Liu, Zhihao Wu, Jie Wen, Chao Huang, Yong Xu

Figure 1 for Localized Sparse Incomplete Multi-view Clustering

Figure 2 for Localized Sparse Incomplete Multi-view Clustering

Figure 3 for Localized Sparse Incomplete Multi-view Clustering

Figure 4 for Localized Sparse Incomplete Multi-view Clustering

Abstract:Incomplete multi-view clustering, which aims to solve the clustering problem on the incomplete multi-view data with partial view missing, has received more and more attention in recent years. Although numerous methods have been developed, most of the methods either cannot flexibly handle the incomplete multi-view data with arbitrary missing views or do not consider the negative factor of information imbalance among views. Moreover, some methods do not fully explore the local structure of all incomplete views. To tackle these problems, this paper proposes a simple but effective method, named localized sparse incomplete multi-view clustering (LSIMVC). Different from the existing methods, LSIMVC intends to learn a sparse and structured consensus latent representation from the incomplete multi-view data by optimizing a sparse regularized and novel graph embedded multi-view matrix factorization model. Specifically, in such a novel model based on the matrix factorization, a l1 norm based sparse constraint is introduced to obtain the sparse low-dimensional individual representations and the sparse consensus representation. Moreover, a novel local graph embedding term is introduced to learn the structured consensus representation. Different from the existing works, our local graph embedding term aggregates the graph embedding task and consensus representation learning task into a concise term. Furthermore, to reduce the imbalance factor of incomplete multi-view learning, an adaptive weighted learning scheme is introduced to LSIMVC. Finally, an efficient optimization strategy is given to solve the optimization problem of our proposed model. Comprehensive experimental results performed on six incomplete multi-view databases verify that the performance of our LSIMVC is superior to the state-of-the-art IMC approaches. The code is available in https://github.com/justsmart/LSIMVC.

* Published in IEEE Transactions on Multimedia (TMM). The code is available in Github https://github.com/justsmart/LSIMVC

Via

Access Paper or Ask Questions