Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jingling Yuan

Expanding Zero-Shot Object Counting with Rich Prompts

May 21, 2025

Huilin Zhu, Senyao Li, Jingling Yuan, Zhengwei Yang, Yu Guo, Wenxuan Liu, Xian Zhong, Shengfeng He

Abstract:Expanding pre-trained zero-shot counting models to handle unseen categories requires more than simply adding new prompts, as this approach does not achieve the necessary alignment between text and visual features for accurate counting. We introduce RichCount, the first framework to address these limitations, employing a two-stage training strategy that enhances text encoding and strengthens the model's association with objects in images. RichCount improves zero-shot counting for unseen categories through two key objectives: (1) enriching text features with a feed-forward network and adapter trained on text-image similarity, thereby creating robust, aligned representations; and (2) applying this refined encoder to counting tasks, enabling effective generalization across diverse prompts and complex images. In this manner, RichCount goes beyond simple prompt expansion to establish meaningful feature alignment that supports accurate counting across novel categories. Extensive experiments on three benchmark datasets demonstrate the effectiveness of RichCount, achieving state-of-the-art performance in zero-shot counting and significantly enhancing generalization to unseen categories in open-world scenarios.

Via

Access Paper or Ask Questions

Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels

Mar 18, 2025

Yujia Tong, Yuze Wang, Jingling Yuan, Chuang Hu

Abstract:Model quantization enables efficient deployment of deep neural networks on edge devices through low-bit parameter representation, yet raises critical challenges for implementing machine unlearning (MU) under data privacy regulations. Existing MU methods designed for full-precision models fail to address two fundamental limitations in quantized networks: 1) Noise amplification from label mismatch during data processing, and 2) Gradient imbalance between forgotten and retained data during training. These issues are exacerbated by quantized models' constrained parameter space and discrete optimization. We propose Q-MUL, the first dedicated unlearning framework for quantized models. Our method introduces two key innovations: 1) Similar Labels assignment replaces random labels with semantically consistent alternatives to minimize noise injection, and 2) Adaptive Gradient Reweighting dynamically aligns parameter update contributions from forgotten and retained data. Through systematic analysis of quantized model vulnerabilities, we establish theoretical foundations for these mechanisms. Extensive evaluations on benchmark datasets demonstrate Q-MUL's superiority over existing approaches.

* 15 pages, 4 figures

Via

Access Paper or Ask Questions

DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy

Jul 26, 2024

Yi Lei, Huilin Zhu, Jingling Yuan, Guangli Xiang, Xian Zhong, Shengfeng He

Figure 1 for DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy

Figure 2 for DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy

Figure 3 for DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy

Figure 4 for DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy

Abstract:Drone-based crowd tracking faces difficulties in accurately identifying and monitoring objects from an aerial perspective, largely due to their small size and close proximity to each other, which complicates both localization and tracking. To address these challenges, we present the Density-aware Tracking (DenseTrack) framework. DenseTrack capitalizes on crowd counting to precisely determine object locations, blending visual and motion cues to improve the tracking of small-scale objects. It specifically addresses the problem of cross-frame motion to enhance tracking accuracy and dependability. DenseTrack employs crowd density estimates as anchors for exact object localization within video frames. These estimates are merged with motion and position information from the tracking network, with motion offsets serving as key tracking cues. Moreover, DenseTrack enhances the ability to distinguish small-scale objects using insights from the visual-language model, integrating appearance with motion cues. The framework utilizes the Hungarian algorithm to ensure the accurate matching of individuals across frames. Demonstrated on DroneCrowd dataset, our approach exhibits superior performance, confirming its effectiveness in scenarios captured by drones.

Via

Access Paper or Ask Questions

Zero-shot Object Counting with Good Exemplars

Jul 09, 2024

Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Zheng Wang, Xian Zhong, Shengfeng He

Figure 1 for Zero-shot Object Counting with Good Exemplars

Figure 2 for Zero-shot Object Counting with Good Exemplars

Figure 3 for Zero-shot Object Counting with Good Exemplars

Figure 4 for Zero-shot Object Counting with Good Exemplars

Abstract:Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations. However, a critical challenge in current ZOC methods lies in their inability to identify high-quality exemplars effectively. This deficiency hampers scalability across diverse classes and undermines the development of strong visual associations between the identified classes and image content. To this end, we propose the Visual Association-based Zero-shot Object Counting (VA-Count) framework. VA-Count consists of an Exemplar Enhancement Module (EEM) and a Noise Suppression Module (NSM) that synergistically refine the process of class exemplar identification while minimizing the consequences of incorrect object identification. The EEM utilizes advanced vision-language pretaining models to discover potential exemplars, ensuring the framework's adaptability to various classes. Meanwhile, the NSM employs contrastive learning to differentiate between optimal and suboptimal exemplar pairs, reducing the negative effects of erroneous exemplars. VA-Count demonstrates its effectiveness and scalability in zero-shot contexts with superior performance on two object counting datasets.

Via

Access Paper or Ask Questions

Primary and Secondary Factor Consistency as Domain Knowledge to Guide Happiness Computing in Online Assessment

Feb 17, 2024

Xiaohua Wu, Lin Li, Xiaohui Tao, Frank Xing, Jingling Yuan

Figure 1 for Primary and Secondary Factor Consistency as Domain Knowledge to Guide Happiness Computing in Online Assessment

Figure 2 for Primary and Secondary Factor Consistency as Domain Knowledge to Guide Happiness Computing in Online Assessment

Figure 3 for Primary and Secondary Factor Consistency as Domain Knowledge to Guide Happiness Computing in Online Assessment

Figure 4 for Primary and Secondary Factor Consistency as Domain Knowledge to Guide Happiness Computing in Online Assessment

Abstract:Happiness computing based on large-scale online web data and machine learning methods is an emerging research topic that underpins a range of issues, from personal growth to social stability. Many advanced Machine Learning (ML) models with explanations are used to compute the happiness online assessment while maintaining high accuracy of results. However, domain knowledge constraints, such as the primary and secondary relations of happiness factors, are absent from these models, which limits the association between computing results and the right reasons for why they occurred. This article attempts to provide new insights into the explanation consistency from an empirical study perspective. Then we study how to represent and introduce domain knowledge constraints to make ML models more trustworthy. We achieve this through: (1) proving that multiple prediction models with additive factor attributions will have the desirable property of primary and secondary relations consistency, and (2) showing that factor relations with quantity can be represented as an importance distribution for encoding domain knowledge. Factor explanation difference is penalized by the Kullback-Leibler divergence-based loss among computing models. Experimental results using two online web datasets show that domain knowledge of stable factor relations exists. Using this knowledge not only improves happiness computing accuracy but also reveals more significative happiness factors for assisting decisions well.

* 12 pages

Via

Access Paper or Ask Questions

DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting

Aug 10, 2023

Huilin Zhu, Jingling Yuan, Xian Zhong, Zhengwei Yang, Zheng Wang, Shengfeng He

Figure 1 for DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting

Figure 2 for DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting

Figure 3 for DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting

Figure 4 for DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting

Abstract:Domain adaptation is commonly employed in crowd counting to bridge the domain gaps between different datasets. However, existing domain adaptation methods tend to focus on inter-dataset differences while overlooking the intra-differences within the same dataset, leading to additional learning ambiguities. These domain-agnostic factors, e.g., density, surveillance perspective, and scale, can cause significant in-domain variations, and the misalignment of these factors across domains can lead to a drop in performance in cross-domain crowd counting. To address this issue, we propose a Domain-agnostically Aligned Optimal Transport (DAOT) strategy that aligns domain-agnostic factors between domains. The DAOT consists of three steps. First, individual-level differences in domain-agnostic factors are measured using structural similarity (SSIM). Second, the optimal transfer (OT) strategy is employed to smooth out these differences and find the optimal domain-to-domain misalignment, with outlier individuals removed via a virtual "dustbin" column. Third, knowledge is transferred based on the aligned domain-agnostic factors, and the model is retrained for domain adaptation to bridge the gap across domains. We conduct extensive experiments on five standard crowd-counting benchmarks and demonstrate that the proposed method has strong generalizability across diverse datasets. Our code will be available at: https://github.com/HopooLinZ/DAOT/.

* 11 pages, 12 figures, 5 tables

Via

Access Paper or Ask Questions

MealRec: A Meal Recommendation Dataset

May 24, 2022

Ming Li, Lin Li, Qing Xie, Jingling Yuan, Xiaohui Tao

Figure 1 for MealRec: A Meal Recommendation Dataset

Figure 2 for MealRec: A Meal Recommendation Dataset

Figure 3 for MealRec: A Meal Recommendation Dataset

Figure 4 for MealRec: A Meal Recommendation Dataset

Abstract:Bundle recommendation systems aim to recommend a bundle of items for a user to consider as a whole. They have become a norm in modern life and have been applied to many real-world settings, such as product bundle recommendation, music playlist recommendation and travel package recommendation. However, compared to studies of bundle recommendation approaches in areas such as online shopping and digital music services, research on meal recommendations for restaurants in the hospitality industry has made limited progress, due largely to the lack of high-quality benchmark datasets. A publicly available dataset specialising in meal recommendation research for the research community is in urgent demand. In this paper, we introduce a meal recommendation dataset (MealRec) that aims to facilitate future research. MealRec is constructed from the user review records of Allrecipe.com, covering 1,500+ users, 7,200+ recipes and 3,800+ meals. Each recipe is described with rich information, such as ingredients, instructions, pictures, category and tags, etc; and each meal is three-course, consisting of an appetizer, a main dish and a dessert. Furthermore, we propose a category-constrained meal recommendation model that is evaluated through comparative experiments with several state-of-the-art bundle recommendation methods on MealRec. Experimental results confirm the superiority of our model and demonstrate that MealRec is a promising testbed for meal recommendation related research. The MealRec dataset and the source code of our proposed model are available at https://github.com/WUT-IDEA/MealRec for access and reproducibility.

Via

Access Paper or Ask Questions

Deep neural network-based classification model for Sentiment Analysis

Jul 03, 2019

Donghang Pan, Jingling Yuan, Lin Li, Deming Sheng

Figure 1 for Deep neural network-based classification model for Sentiment Analysis

Figure 2 for Deep neural network-based classification model for Sentiment Analysis

Figure 3 for Deep neural network-based classification model for Sentiment Analysis

Figure 4 for Deep neural network-based classification model for Sentiment Analysis

Abstract:The growing prosperity of social networks has brought great challenges to the sentimental tendency mining of users. As more and more researchers pay attention to the sentimental tendency of online users, rich research results have been obtained based on the sentiment classification of explicit texts. However, research on the implicit sentiment of users is still in its infancy. Aiming at the difficulty of implicit sentiment classification, a research on implicit sentiment classification model based on deep neural network is carried out. Classification models based on DNN, LSTM, Bi-LSTM and CNN were established to judge the tendency of the user's implicit sentiment text. Based on the Bi-LSTM model, the classification model of word-level attention mechanism is studied. The experimental results on the public dataset show that the established LSTM series classification model and CNN classification model can achieve good sentiment classification effect, and the classification effect is significantly better than the DNN model. The Bi-LSTM based attention mechanism classification model obtained the optimal R value in the positive category identification.

Via

Access Paper or Ask Questions

GPU based Parallel Optimization for Real Time Panoramic Video Stitching

Oct 22, 2018

Chengyao Du, Jingling Yuan, Jiansheng Dong, Lin Li, Mincheng Chen, Tao Li

Figure 1 for GPU based Parallel Optimization for Real Time Panoramic Video Stitching

Figure 2 for GPU based Parallel Optimization for Real Time Panoramic Video Stitching

Figure 3 for GPU based Parallel Optimization for Real Time Panoramic Video Stitching

Figure 4 for GPU based Parallel Optimization for Real Time Panoramic Video Stitching

Abstract:Panoramic video is a sort of video recorded at the same point of view to record the full scene. With the development of video surveillance and the requirement for 3D converged video surveillance in smart cities, CPU and GPU are required to possess strong processing abilities to make panoramic video. The traditional panoramic products depend on post processing, which results in high power consumption, low stability and unsatisfying performance in real time. In order to solve these problems,we propose a real-time panoramic video stitching framework.The framework we propose mainly consists of three algorithms, LORB image feature extraction algorithm, feature point matching algorithm based on LSH and GPU parallel video stitching algorithm based on CUDA.The experiment results show that the algorithm mentioned can improve the performance in the stages of feature extraction of images stitching and matching, the running speed of which is 11 times than that of the traditional ORB algorithm and 639 times than that of the traditional SIFT algorithm. Based on analyzing the GPU resources occupancy rate of each resolution image stitching, we further propose a stream parallel strategy to maximize the utilization of GPU resources. Compared with the L-ORB algorithm, the efficiency of this strategy is improved by 1.6-2.5 times, and it can make full use of GPU resources. The performance of the system accomplished in the paper is 29.2 times than that of the former embedded one, while the power dissipation is reduced to 10W.

* under review for Pattern Recognition Letters

Via

Access Paper or Ask Questions