Yun Zhou

Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models

Apr 30, 2025

Sangmin Woo, Kang Zhou, Yun Zhou, Shuai Wang, Sheng Guan, Haibo Ding, Lin Lee Cheong

Abstract:Large Vision Language Models (LVLMs) often suffer from object hallucination, which undermines their reliability. Surprisingly, we find that simple object-based visual prompting -- overlaying visual cues (e.g., bounding box, circle) on images -- can significantly mitigate such hallucination; however, different visual prompts (VPs) vary in effectiveness. To address this, we propose Black-Box Visual Prompt Engineering (BBVPE), a framework to identify optimal VPs that enhance LVLM responses without needing access to model internals. Our approach employs a pool of candidate VPs and trains a router model to dynamically select the most effective VP for a given input image. This black-box approach is model-agnostic, making it applicable to both open-source and proprietary LVLMs. Evaluations on benchmarks such as POPE and CHAIR demonstrate that BBVPE effectively reduces object hallucination.

* NAACL 2025
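
The selection step described in the abstract above (a pool of candidate visual prompts plus a router that picks one per image) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy list-of-lists image, the two prompt functions, the names `VP_POOL`, `toy_scorer`, and `select_and_apply`, and in particular the hand-written scoring rule, which merely stands in for the trained router and is not the paper's method.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]            # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), inclusive
VisualPrompt = Callable[[Image, Box], Image]
Scorer = Callable[[Image, str], float]


def draw_outline(image: Image, box: Box, value: int = 255) -> Image:
    """Return a copy of the image with the border of `box` drawn on it."""
    top, left, bottom, right = box
    out = [row[:] for row in image]
    for c in range(left, right + 1):
        out[top][c] = value
        out[bottom][c] = value
    for r in range(top, bottom + 1):
        out[r][left] = value
        out[r][right] = value
    return out


def fill_region(image: Image, box: Box, value: int = 255) -> Image:
    """Return a copy of the image with every pixel inside `box` overwritten."""
    top, left, bottom, right = box
    out = [row[:] for row in image]
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            out[r][c] = value
    return out


# Hypothetical pool of candidate visual prompts.
VP_POOL: Dict[str, VisualPrompt] = {"outline": draw_outline, "fill": fill_region}


def toy_scorer(image: Image, vp_name: str) -> float:
    """Placeholder for a trained router: prefer the outline on dark images."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    if vp_name == "outline":
        return 1.0 if mean < 128 else 0.0
    return 0.5


def select_and_apply(image: Image, box: Box,
                     pool: Dict[str, VisualPrompt],
                     scorer: Scorer) -> Tuple[str, Image]:
    """Pick the highest-scoring prompt for this image and overlay it."""
    best = max(pool, key=lambda name: scorer(image, name))
    return best, pool[best](image, box)
```

Only the input image is modified before it would be passed to a model, which is what keeps the approach black-box: no weights or gradients of the vision language model are needed.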

Collaborative LLM Numerical Reasoning with Local Data Protection

Apr 01, 2025

Min Zhang, Yuzhe Lu, Yun Zhou, Panpan Xu, Lin Lee Cheong, Chang-Tien Lu, Haozhu Wang

Abstract:Numerical reasoning over documents, which demands both contextual understanding and logical inference, is challenging for low-capacity local models deployed on computation-constrained devices. Although such complex reasoning queries could be routed to powerful remote models like GPT-4, exposing local data raises significant data leakage concerns. Existing mitigation methods generate problem descriptions or examples for remote assistance. However, the inherent complexity of numerical reasoning hinders the local model from generating logically equivalent queries and accurately inferring answers with remote guidance. In this paper, we present a model collaboration framework with two key innovations: (1) a context-aware synthesis strategy that shifts the query domains while preserving logical consistency; and (2) a tool-based answer reconstruction approach that reuses the remote-generated problem-solving pattern with code snippets. Experimental results demonstrate that our method achieves better reasoning accuracy than solely using local models while providing stronger data protection than fully relying on remote models. Furthermore, our method improves accuracy by 16.2% - 43.6% while reducing data leakage by 2.3% - 44.6% compared to existing data protection approaches.
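
The second idea in the abstract above, reusing a remote-generated problem-solving pattern on local data, might look roughly like the sketch below. It rests on loud assumptions: the abstract does not say how the pattern is represented, so the hard-coded `REMOTE_SNIPPET` string, the `solve` entry point, and the helper names `load_solver` and `answer_locally` are all hypothetical, and no remote model is actually called.

```python
from typing import Callable, Dict

# Stand-in for a code snippet that a remote model might return for a
# synthesized, domain-shifted question. Hard-coded here for illustration.
REMOTE_SNIPPET = """
def solve(old_value, new_value):
    # Percentage change from old_value to new_value.
    return (new_value - old_value) / old_value * 100
"""


def load_solver(snippet: str) -> Callable[..., float]:
    """Turn the returned snippet into a callable (hypothetical `solve` entry point)."""
    namespace: Dict[str, object] = {}
    # A real system would run untrusted code in a sandbox; this is only a sketch.
    exec(snippet, namespace)
    return namespace["solve"]  # type: ignore[return-value]


def answer_locally(snippet: str, private_values: Dict[str, float]) -> float:
    """Apply the remote-generated pattern to values that never leave the device."""
    solver = load_solver(snippet)
    return solver(**private_values)


# The private figures are only ever used on the local side.
local_answer = answer_locally(REMOTE_SNIPPET, {"old_value": 80.0, "new_value": 100.0})
```

The point of the design is that only the synthesized question crosses the network; the numbers from the local document are plugged into the returned code afterwards.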

A Systematic Survey of Automatic Prompt Optimization Techniques

Feb 24, 2025

Kiran Ramnath, Kang Zhou, Sheng Guan, Soumya Smruti Mishra, Xuan Qi, Zhengyuan Shen, Shuai Wang, Sangmin Woo, Sullam Jeoung, Yawei Wang (+11 more)

Abstract:Since the advent of large language models (LLMs), prompt engineering has been a crucial step for eliciting desired responses for various Natural Language Processing (NLP) tasks. However, prompt engineering remains an impediment for end users due to rapid advances in models, tasks, and associated best practices. To mitigate this, Automatic Prompt Optimization (APO) techniques have recently emerged that use various automated techniques to help improve the performance of LLMs on various tasks. In this paper, we present a comprehensive survey summarizing the current progress and remaining challenges in this field. We provide a formal definition of APO, a 5-part unifying framework, and then proceed to rigorously categorize all relevant works based on their salient features therein. We hope to spur further research guided by our framework.

* 8 main pages, 31 total pages, 1 figure

Machine Learning for Raman Spectroscopy-based Cyber-Marine Fish Biochemical Composition Analysis

Sep 29, 2024

Yun Zhou, Gang Chen, Bing Xue, Mengjie Zhang, Jeremy S. Rooney, Kirill Lagutin, Andrew MacKenzie, Keith C. Gordon, Daniel P. Killeen

Abstract:The rapid and accurate detection of biochemical compositions in fish is a crucial real-world task that facilitates optimal utilization and extraction of high-value products in the seafood industry. Raman spectroscopy provides a promising solution for quickly and non-destructively analyzing the biochemical composition of fish by associating Raman spectra with biochemical reference data using machine learning regression models. This paper investigates different regression models to address this task and proposes a new design of Convolutional Neural Networks (CNNs) for jointly predicting water, protein, and lipid yield. To the best of our knowledge, we are the first to conduct a successful study employing CNNs to analyze the biochemical composition of fish based on a very small Raman spectroscopic dataset. Our approach combines a tailored CNN architecture with a comprehensive data preparation procedure, effectively mitigating the challenges posed by extreme data scarcity. The results demonstrate that our CNN can significantly outperform two state-of-the-art CNN models and multiple traditional machine learning models, paving the way for accurate and automated analysis of fish biochemical composition.

DAGAM: A Domain Adversarial Graph Attention Model for Subject Independent EEG-Based Emotion Recognition

Feb 27, 2022

Tao Xu, Wang Dang, Jiabao Wang, Yun Zhou

Abstract:One of the most significant challenges in EEG-based emotion recognition is cross-subject EEG variation, which leads to poor performance and generalizability. This paper proposes a novel EEG-based emotion recognition model called the domain adversarial graph attention model (DAGAM). The basic idea is to generate a graph to model multichannel EEG signals using biological topology. Graph theory can topologically describe and analyze relationships and mutual dependency between EEG channels. Then, unlike other graph convolutional networks, self-attention pooling is applied to benefit salient EEG feature extraction from the graph, which effectively improves the performance. Finally, after graph pooling, domain adversarial learning based on the graph is employed to identify and handle EEG variation across subjects, efficiently reaching good generalizability. We conduct extensive evaluations on two benchmark datasets (SEED and SEED IV) and obtain state-of-the-art results in subject-independent emotion recognition. Our model boosts the SEED accuracy to 92.59% (4.69% improvement) with the lowest standard deviation of 3.21% (2.92% decrease) and the SEED IV accuracy to 80.74% (6.90% improvement) with the lowest standard deviation of 4.14% (3.88% decrease).

* 8 pages, 2 figures

Rethinking Feature Uncertainty in Stochastic Neural Networks for Adversarial Robustness

Jan 01, 2022

Hao Yang, Min Wang, Zhengfei Yu, Yun Zhou

Abstract:It is well known that deep neural networks (DNNs) have shown remarkable success in many fields. However, adding a perturbation of imperceptible magnitude to the model input can cause the model's performance to drop rapidly. To address this issue, a randomness technique named Stochastic Neural Networks (SNNs) has recently been proposed. Specifically, SNNs inject randomness into the model to defend against unseen attacks and improve adversarial robustness. However, existing studies on SNNs mainly focus on injecting fixed or learnable noise into model weights/activations. In this paper, we find that the performance of existing SNNs is largely bottlenecked by their feature representation ability. Surprisingly, simply maximizing the variance per dimension of the feature distribution leads to a considerable boost beyond all previous methods; we name this approach the maximize feature distribution variance stochastic neural network (MFDV-SNN). Extensive experiments on well-known white- and black-box attacks show that MFDV-SNN achieves a significant improvement over existing methods, which indicates that it is a simple but effective method to improve model robustness.

Exploring Common and Individual Characteristics of Students via Matrix Recovering

Oct 23, 2020

Zhen Wang, Ben Teng, Yun Zhou, Hanshuang Tong, Guangtong Liu

Abstract:Balancing group teaching and individual mentoring is an important issue in education. At its core, this issue requires exploring the common characteristics shared by multiple students and the individual characteristics of each student. Biclustering methods have proved successful at detecting meaningful patterns with the goal of driving group instruction based on students' characteristics. However, these methods ignore the individual characteristics of students, as they focus only on common characteristics. In this article, we propose a framework to detect both group characteristics and individual characteristics of students simultaneously. We assume that the students' characteristics matrix is composed of two parts: one is a low-rank matrix representing the common characteristics of students; the other is a sparse matrix representing individual characteristics of students. Thus, we treat the balancing issue as a matrix recovering problem. The experimental results show the effectiveness of our method. Firstly, it can detect meaningful biclusters that are comparable with those of state-of-the-art biclustering algorithms. Secondly, it can identify individual characteristics for each student simultaneously. Both the source code of our algorithm and the real datasets are available upon request.

* 8 pages, 9 figures, Submitted to AAAI 2021
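
The low-rank-plus-sparse assumption in the abstract above can be written as a worked formula. The abstract does not state the objective the authors optimize, so the following is only one common convex formulation of this kind of matrix recovery (principal component pursuit), given as an assumption. The symbols are introduced here for illustration: $M$ is the students' characteristics matrix, $L$ the low-rank part (common characteristics), $S$ the sparse part (individual characteristics), and $\lambda > 0$ a trade-off weight.

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \, \|S\|_{1}
\quad \text{subject to} \quad M = L + S
```

Here $\|L\|_{*}$ is the nuclear norm (the sum of the singular values of $L$, a convex surrogate for rank) and $\|S\|_{1}$ is the sum of the absolute values of the entries of $S$ (a convex surrogate for sparsity).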

DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling

Oct 07, 2020

Jiecao Chen, Liu Yang, Karthik Raman, Michael Bendersky, Jung-Jung Yeh, Yun Zhou, Marc Najork, Danyang Cai, Ehsan Emadzadeh

Abstract:Pre-trained models like BERT (Devlin et al., 2018) have dominated NLP / IR applications such as single sentence classification, text pair classification, and question answering. However, deploying these models in real systems is highly non-trivial due to their exorbitant computational costs. A common remedy to this is knowledge distillation (Hinton et al., 2015), leading to faster inference. However -- as we show here -- existing works are not optimized for dealing with pairs (or tuples) of texts. Consequently, they are either not scalable or demonstrate subpar performance. In this work, we propose DiPair -- a novel framework for distilling fast and accurate models on text pair tasks. Coupled with an end-to-end training strategy, DiPair is both highly scalable and offers improved quality-speed tradeoffs. Empirical studies conducted on both academic and real-world e-commerce benchmarks demonstrate the efficacy of the proposed approach with speedups of over 350x and minimal quality drop relative to the cross-attention teacher BERT model.

* 13 pages. Accepted to Findings of EMNLP 2020

HGKT : Introducing Problem Schema with Hierarchical Exercise Graph for Knowledge Tracing

Jul 04, 2020

Hanshuang Tong, Yun Zhou, Zhen Wang

Abstract:Knowledge tracing (KT), which aims at predicting learners' knowledge mastery, plays an important role in computer-aided educational systems. Given learners' exercise records, a knowledge tracing model can trace their hidden knowledge state dynamically. In recent years, many deep learning models have been applied to the KT task and have shown promising results. However, they still have limitations. Most existing methods simplify the exercise records into knowledge sequences, which fails to exploit the rich information in exercise texts. Besides, the latent hierarchical graph nature of exercises and knowledge remains unexplored. Thus, in this paper, we propose a hierarchical graph knowledge tracing model framework (HGKT) which leverages the advantages of a hierarchical exercise graph and a sequence model to enhance the ability of knowledge tracing. Besides, we introduce the concept of problem schema to better represent a group of similar exercises and propose a hierarchical graph neural network to learn representations of problem schemas. Moreover, in the sequence model, we employ two attention mechanisms to highlight important historical states of students. In the testing stage, we present a K&S diagnosis matrix that traces the transition of mastery of knowledge and problem schema, and which can more easily be applied to different applications. Finally, we conduct extensive experiments to evaluate the model on a large-scale real-world dataset. The results demonstrate the effectiveness of our model and the diversity of its application scenarios.

* 9 pages, 9 figures, submission to the 29th ACM International Conference on Information and Knowledge Management (CIKM '20)

Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

Feb 08, 2020

Hao Zhou, Wengang Zhou, Yun Zhou, Houqiang Li

Abstract:Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn the implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, aiming to preserve the uniqueness and explore the collaboration of multiple cues. Finally, we design a joint optimization strategy to achieve end-to-end sequence learning of the STMC network. To validate its effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.

* Accepted as an oral presentation paper at AAAI 2020. (To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence)
