Xiaonan Luo

LGM-Pose: A Lightweight Global Modeling Network for Real-time Human Pose Estimation

Jun 05, 2025

Biao Guo, Fangmin Guo, Guibo Luo, Xiaonan Luo, Feng Zhang

Abstract: Most current lightweight top-down methods for multi-person pose estimation are based on multi-branch, parallel, pure-CNN architectures, which often struggle to capture the global context required for detecting semantically complex keypoints and suffer from high latency due to their intricate and redundant structures. In this article, an approximately single-branch lightweight global modeling network (LGM-Pose) is proposed to address these challenges. In the network, a lightweight MobileViM Block is designed with a proposed Lightweight Attentional Representation Module (LARM), which integrates information within and between patches using the Non-Parametric Transformation Operation (NPT-Op) to extract global information. Additionally, a novel Shuffle-Integrated Fusion Module (SFusion) is introduced to effectively integrate multi-scale information, mitigating the performance degradation often observed in single-branch structures. Experimental evaluations on the COCO and MPII datasets demonstrate that our approach not only reduces the number of parameters compared to existing mainstream lightweight methods but also achieves superior performance and faster processing speeds.

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking

May 22, 2025

Xiangqi Wang, Yue Huang, Yanbo Wang, Xiaonan Luo, Kehan Guo, Yujun Zhou, Xiangliang Zhang

Abstract: LLMs often need effective configurations, such as temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work 'well enough' across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy, along with a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. AdaReasoner is backed by theoretical guarantees and experiments showing fast convergence and a sublinear policy gap. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.

MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection

Apr 26, 2024

Chengpei Xu, Wenjing Jia, Ruomei Wang, Xiaonan Luo, Xiangjian He

Abstract: Bottom-up text detection methods play an important role in arbitrary-shape scene text detection, but two restrictions prevent them from reaching their full potential: 1) the accumulation of false text segment detections, which affects subsequent processing, and 2) the difficulty of building reliable connections between text segments. Targeting these two problems, we propose a novel approach, named "MorphText", to capture the regularity of texts by embedding deep morphology for arbitrary-shape text detection. To this end, two deep morphological modules are designed to regularize text segments and determine the linkage between them. First, a Deep Morphological Opening (DMOP) module is constructed to remove false text segment detections generated in the feature extraction process. Then, a Deep Morphological Closing (DMCL) module is proposed to allow text instances of various shapes to stretch their morphology along their most significant orientation while deriving their connections. Extensive experiments conducted on four challenging benchmark datasets (CTW1500, Total-Text, MSRA-TD500 and ICDAR2017) demonstrate that our proposed MorphText outperforms both top-down and bottom-up state-of-the-art arbitrary-shape scene text detection approaches.

* Accepted by Transactions on Multimedia
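
The MorphText abstract names its two modules after the classical morphological opening and closing operators. As background only, here is a minimal sketch of those classical operators on a binary mask, assuming a square structuring element and zero padding at the image border. It is not the learned DMOP or DMCL module, whose details the abstract does not give; the function names are illustrative.

```python
def _scan(mask, k, combine):
    """Apply `combine` (all or any) over each pixel's (2k+1)x(2k+1) window.

    Pixels outside the image are treated as 0 (an assumption of this sketch).
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(combine(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx] == 1
                for dy in range(-k, k + 1)
                for dx in range(-k, k + 1)
            ))
    return out


def erode(mask, k=1):
    """Keep a pixel only if its whole window is foreground."""
    return _scan(mask, k, all)


def dilate(mask, k=1):
    """Set a pixel if any pixel in its window is foreground."""
    return _scan(mask, k, any)


def opening(mask, k=1):
    """Erosion then dilation: removes blobs smaller than the window."""
    return dilate(erode(mask, k), k)


def closing(mask, k=1):
    """Dilation then erosion: bridges gaps narrower than the window."""
    return erode(dilate(mask, k), k)
```

In this classical form, opening discards isolated false detections while preserving larger regions, and closing joins nearby segments, which mirrors the two roles the abstract assigns to DMOP and DMCL.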

Gland segmentation via dual encoders and boundary-enhanced attention

Jan 29, 2024

Huadeng Wang, Jiejiang Yu, Bingbing Li, Xipeng Pan, Zhenbing Liu, Rushi Lan, Xiaonan Luo

Abstract: Accurate and automated gland segmentation on pathological images can assist pathologists in diagnosing the malignancy of colorectal adenocarcinoma. However, due to various gland shapes, severe deformation of malignant glands, and overlapping adhesions between glands, gland segmentation has always been very challenging. To address these problems, we propose a DEA model. This model consists of two branches: the backbone encoding and decoding network and the local semantic extraction network. The backbone encoding and decoding network extracts advanced semantic features, uses the proposed feature decoder to restore feature space information, and then enhances the boundary features of the gland through boundary enhancement attention. The local semantic extraction network uses the pre-trained DeepLabv3+ as a local semantic-guided encoder to extract edge features. Experimental results on two public datasets, GlaS and CRAG, confirm that our method performs better than other gland segmentation methods.

* Accepted for IEEE ICASSP 2024

A novel dataset and a two-stage mitosis nuclei detection method based on hybrid anchor branch

Jan 18, 2023

Huadeng Wang, Hao Xu, Bingbing Li, Xipeng Pan, Lingqi Zeng, Rushi Lan, Xiaonan Luo

Abstract: Mitosis detection is one of the challenging problems in computational pathology, and mitotic count is an important index of cancer grading for pathologists. However, current mitotic counts rely on pathologists counting mitotic nuclei in hot spots under a microscope, which is subjective and time-consuming. In this paper, we propose a two-stage cascaded network, named FoCasNet, for mitosis detection. In the first stage, a detection network named M_det is proposed to detect as many mitoses as possible. In the second stage, a classification network M_class is proposed to refine the results of the first stage. In addition, the attention mechanism, normalization method, and hybrid anchor branch classification subnet are introduced to improve the overall detection performance. Our method achieves the current highest F1-score of 0.888 on the public dataset ICPR 2012. We also evaluate our method on the GZMH dataset, released for the first time by our research team, where it reaches the highest F1-score of 0.563, outperforming multiple classic detection networks in wide use. These results confirm the effectiveness and generalization ability of our method. The code will be available at: https://github.com/antifen/mitosis-nuclei-detection.

* 22 pages, 10 figures, 8 tables
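
The results above are reported as F1-scores. For readers unfamiliar with the metric, the following sketch computes it from detection counts using the standard definition, F1 = 2·TP / (2·TP + FP + FN). The function name and the zero-denominator convention are assumptions of this sketch; how the paper matches detections to ground-truth mitoses is not stated in the abstract.

```python
def f1_score(tp, fp, fn):
    """F1-score from true positives, false positives, and false negatives.

    Equivalent to the harmonic mean of precision TP/(TP+FP) and
    recall TP/(TP+FN). Returns 0.0 when there is nothing to score
    (a convention chosen for this sketch).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts, not taken from the paper:
# 8 correct detections, 2 spurious detections, 2 missed mitoses.
score = f1_score(tp=8, fp=2, fn=2)
```

Because F1 ignores true negatives, it suits detection tasks such as this one, where the background vastly outnumbers the objects of interest.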

A Novel Dataset and a Deep Learning Method for Mitosis Nuclei Segmentation and Classification

Dec 27, 2022

Huadeng Wang, Zhipeng Liu, Rushi Lan, Zhenbing Liu, Xiaonan Luo, Xipeng Pan, Bingbing Li

Abstract: Mitosis nuclei count is one of the important indicators for the pathological diagnosis of breast cancer. Manual annotation requires experienced pathologists and is very time-consuming and inefficient. With the development of deep learning methods, some models with good performance have emerged, but their generalization ability should be further strengthened. In this paper, we propose a two-stage mitosis segmentation and classification method, named SCMitosis. First, segmentation with a high recall rate is achieved by the proposed depthwise separable convolution residual block and channel-spatial attention gate. Then, a classification network is cascaded to further improve the detection performance of mitosis nuclei. The proposed model is verified on the ICPR 2012 dataset, where it obtains an F-score of 0.8687, the highest among current state-of-the-art algorithms. In addition, the model also achieves good performance on the GZMH dataset, which was prepared by our group and will be first released with the publication of this paper. The code will be available at: https://github.com/antifen/mitosis-nuclei-segmentation.

* 19 pages, 11 figures, 4 tables

Binary Representation via Jointly Personalized Sparse Hashing

Aug 31, 2022

Xiaoqin Wang, Chen Chen, Rushi Lan, Licheng Liu, Zhenbing Liu, Huiyu Zhou, Xiaonan Luo

Abstract: Unsupervised hashing has attracted much attention for binary representation learning due to the requirement of economical storage and efficiency of binary codes. It aims to encode high-dimensional features in the Hamming space with similarity preservation between instances. However, most existing methods learn hash functions in manifold-based approaches. Those methods capture the local geometric structures (i.e., pairwise relationships) of data, and lack satisfactory performance in dealing with real-world scenarios that produce similar features (e.g., color and shape) with different semantic information. To address this challenge, in this work, we propose an effective unsupervised method, namely Jointly Personalized Sparse Hashing (JPSH), for binary representation learning. Specifically, we first propose a novel personalized hashing module, i.e., Personalized Sparse Hashing (PSH). Different personalized subspaces are constructed to reflect category-specific attributes for different clusters, adaptively mapping instances within the same cluster to the same Hamming space. In addition, we deploy sparse constraints for different personalized subspaces to select important features. We also collect the strengths of the other clusters to build the PSH module while avoiding over-fitting. Then, to simultaneously preserve semantic and pairwise similarities in our JPSH, we incorporate the PSH and manifold-based hash learning into a seamless formulation. As such, JPSH not only distinguishes the instances from different clusters, but also preserves local neighborhood structures within the cluster. Finally, an alternating optimization algorithm is adopted to iteratively capture analytical solutions of the JPSH model. Extensive experiments on four benchmark datasets verify that JPSH outperforms several hashing algorithms on the similarity search task.
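
The storage and efficiency benefits mentioned in the abstract come from comparing binary codes by Hamming distance. The sketch below shows only that retrieval step, assuming each code is packed into a Python integer; it does not implement JPSH or any hash-function learning, and the function names and example codes are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, database):
    """Return database indices ordered from nearest to farthest code.

    Python's sort is stable, so ties keep their original database order.
    """
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical 4-bit codes for illustration.
codes = [0b1011, 0b0010, 0b1111, 0b0100]
order = rank_by_hamming(0b1011, codes)
```

XOR followed by a bit count is all a comparison costs, which is why similarity search over binary codes scales to large databases.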

Neural Points: Point Cloud Representation with Neural Fields

Dec 13, 2021

Wanquan Feng, Jin Li, Hongrui Cai, Xiaonan Luo, Juyong Zhang

Abstract: In this paper, we propose Neural Points, a novel point cloud representation. Unlike traditional point cloud representations, where each point only represents a position or a local plane in 3D space, each point in Neural Points represents a local continuous geometric shape via neural fields. Therefore, Neural Points can express much more complex details and thus has a stronger representation ability. Neural Points is trained on high-resolution surfaces containing rich geometric details, such that the trained model has enough expressive ability for various shapes. Specifically, we extract deep local features on the points and construct neural fields through the local isomorphism between the 2D parametric domain and the 3D local patch. Finally, the local neural fields are integrated to form the global surface. Experimental results show that Neural Points has powerful representation ability and demonstrates excellent robustness and generalization ability. With Neural Points, we can resample point clouds at arbitrary resolutions, and it outperforms state-of-the-art point cloud upsampling methods by a large margin.

* Project page: https://wanquanf.github.io/NeuralPoints.html

GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot Action Recognition

May 25, 2021

Bin Sun, Dehui Kong, Shaofan Wang, Jinghua Li, Baocai Yin, Xiaonan Luo

Abstract: Zero-shot action recognition can recognize samples of unseen classes that are unavailable in training by exploring a common latent semantic representation in samples. However, most methods neglect the connotative and extensional relations between the action classes, which leads to poor generalization ability in zero-shot learning. Furthermore, the learned classifier tends to predict samples as seen classes, which leads to poor classification performance. To solve the above problems, we propose a two-stage deep neural network for zero-shot action recognition, which consists of a feature generation sub-network serving as the sampling stage and a graph attention sub-network serving as the classification stage. In the sampling stage, we utilize a generative adversarial network (GAN) trained on action features and word vectors of seen classes to synthesize the action features of unseen classes, which can balance the training sample data of seen classes and unseen classes. In the classification stage, we construct a knowledge graph (KG) based on the relationship between word vectors of action classes and related objects, and propose a graph convolution network (GCN) based on an attention mechanism, which dynamically updates the relationship between action classes and objects and enhances the generalization ability of zero-shot learning. In both stages, we use word vectors as bridges for feature generation and classifier generalization from seen classes to unseen classes. We compare our method with state-of-the-art methods on the UCF101 and HMDB51 datasets. Experimental results show that our proposed method improves the classification performance of the trained classifier and achieves higher accuracy.

* 19 pages, 7 figures

Neural Task Planning with And-Or Graph Representations

Aug 25, 2018

Tianshui Chen, Riquan Chen, Lin Nie, Xiaonan Luo, Xiaobai Liu, Liang Lin

Abstract: This paper focuses on semantic task planning, i.e., predicting a sequence of actions toward accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The primary challenges are how to model task-specific knowledge and how to integrate this knowledge into the learning procedure. In this work, we propose training a recurrent long short-term memory (LSTM) network to address this problem, i.e., taking a scene image (including pre-located objects) and the specified task as input and recurrently predicting action sequences. However, training such a network generally requires large numbers of annotated samples to cover the semantic space (e.g., diverse action decomposition and ordering). To overcome this issue, we introduce a knowledge and-or graph (AOG) for task description, which hierarchically represents a task as atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences according to common sense) by training another auxiliary LSTM network with a small set of annotated samples. Furthermore, these generated samples (i.e., task-oriented action sequences) effectively facilitate training of the model for semantic task planning. In our experiments, we create a new dataset that contains diverse daily tasks and extensively evaluate the effectiveness of our approach.

* Submitted to TMM, under minor revision. arXiv admin note: text overlap with arXiv:1707.04677
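
To illustrate how an and-or graph can represent a task hierarchically as atomic actions, here is a simplified sketch. It assumes a tree-shaped graph in which an "and" node lists ordered sub-steps, an "or" node lists alternatives, and a leaf is one atomic action, and it enumerates every sequence exhaustively. The paper instead samples sequences with an auxiliary LSTM; the node encoding and the example task below are hypothetical.

```python
from itertools import product


def expand(node):
    """Return every action sequence (a tuple of atomic actions) a node yields."""
    kind, payload = node
    if kind == "leaf":
        return [(payload,)]
    child_sequences = [expand(child) for child in payload]
    if kind == "or":
        # Any one alternative may be chosen.
        return [seq for alternatives in child_sequences for seq in alternatives]
    if kind == "and":
        # Every child is performed, in order; combine one choice per child.
        return [sum(parts, ()) for parts in product(*child_sequences)]
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical task, not taken from the paper's dataset.
task = ("and", [
    ("leaf", "take cup"),
    ("or", [
        ("leaf", "pour from pot"),
        ("and", [("leaf", "open tap"), ("leaf", "fill cup")]),
    ]),
    ("leaf", "drink"),
])
sequences = expand(task)
```

Exhaustive enumeration grows quickly with the number of "or" nodes, which is one reason sampling valid sequences, as the abstract describes, is the more practical route for generating training data.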
