Shasha Guo

VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering

Mar 09, 2025

Yanling Wang, Yihan Zhao, Xiaodong Chen, Shasha Guo, Lixin Liu, Haoyang Li, Yong Xiao, Jing Zhang, Qi Li, Ke Xu

Abstract: Large vision-language models (LVLMs) have achieved remarkable results, yet non-factual responses remain prevalent in fact-seeking question answering (QA). Current multimodal fact-seeking benchmarks primarily compare model outputs with ground-truth answers, offering limited insight into the performance of modality-specific modules. To bridge this gap, we introduce VisualSimpleQA, a multimodal fact-seeking benchmark with two key features. First, it enables streamlined, decoupled evaluation of LVLMs in the visual and linguistic modalities. Second, it incorporates well-defined difficulty criteria that guide human annotation and facilitate the extraction of a challenging subset, VisualSimpleQA-hard. Experiments on 15 LVLMs show that even state-of-the-art models such as GPT-4o achieve only 60%+ correctness in multimodal fact-seeking QA on VisualSimpleQA and 30%+ on VisualSimpleQA-hard. Furthermore, the decoupled evaluation across these models highlights substantial room for improvement in both the visual and linguistic modules. The dataset is available at https://huggingface.co/datasets/WYLing/VisualSimpleQA.
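
One way to picture decoupled evaluation is to score every sample twice: once with the image and the multimodal question, and once with a text-only counterpart of the question (assumed here to name the entity that the image would otherwise have to reveal). The minimal sketch below assumes such per-sample correctness judgments already exist; the `Judged` record, its field names and the `drop` statistic are illustrative assumptions, not the benchmark's released scoring code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Judged:
    # Hypothetical per-sample judgments: was the answer correct with the
    # image + multimodal question, and with the text-only question?
    mm_correct: bool
    text_correct: bool


def decoupled_scores(samples: List[Judged]) -> Dict[str, float]:
    """Return correctness rates for both settings and the gap between them."""
    n = len(samples)
    if n == 0:
        raise ValueError("no samples to score")
    mm = sum(s.mm_correct for s in samples) / n
    text = sum(s.text_correct for s in samples) / n
    return {
        "multimodal": mm,
        "text_only": text,
        # Assumed reading: a large drop points at the visual side (recognizing
        # what is in the image) rather than at missing linguistic knowledge.
        "drop": text - mm,
    }


if __name__ == "__main__":
    judged = [Judged(True, True), Judged(False, True), Judged(False, False), Judged(True, True)]
    print(decoupled_scores(judged))
```

Reading the gap this way is an interpretation: a high text-only rate paired with a low multimodal rate suggests a visual bottleneck, while a low text-only rate suggests missing knowledge in the language module.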

PCQPR: Proactive Conversational Question Planning with Reflection

Oct 02, 2024

Shasha Guo, Lizi Liao, Jing Zhang, Cuiping Li, Hong Chen

Abstract: Conversational Question Generation (CQG) enhances the interactivity of conversational question-answering systems in fields such as education, customer service, and entertainment. However, traditional CQG focuses primarily on the immediate context and lacks the conversational foresight needed to guide conversations toward specified conclusions. This limitation significantly restricts its ability to achieve conclusion-oriented outcomes. In this work, we redefine the CQG task as Conclusion-driven Conversational Question Generation (CCQG), which emphasizes proactivity: rather than merely reacting to the unfolding conversation, the system actively steers it toward a conclusion-oriented question-answer pair. To this end, we propose a novel approach called Proactive Conversational Question Planning with self-Refining (PCQPR). Concretely, by integrating a planning algorithm inspired by Monte Carlo Tree Search (MCTS) with the analytical capabilities of large language models (LLMs), PCQPR predicts future conversation turns and continuously refines its questioning strategies. This iterative self-refining mechanism ensures the generation of contextually relevant questions strategically devised to reach a specified outcome. Extensive evaluations demonstrate that PCQPR significantly surpasses existing CQG methods, marking a paradigm shift toward conclusion-oriented conversational question-answering systems.

* Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
* Accepted by EMNLP 2024 Main
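
The planning loop described in the PCQPR abstract can be illustrated with a one-level (flat) Monte Carlo search that uses UCB1 selection, i.e., the selection, simulation and backpropagation steps of MCTS without growing a deeper tree. This is a simplified stand-in rather than PCQPR itself: the LLM-driven simulation of future turns and the self-refining step are replaced by a hypothetical `rollout` callback returning a reward in [0, 1], and `ucb1_select_question` is an illustrative name.

```python
import math
import random
from typing import Callable, Sequence


def ucb1_select_question(
    candidates: Sequence[str],
    rollout: Callable[[str, random.Random], float],
    budget: int = 200,
    c: float = math.sqrt(2),
    seed: int = 0,
) -> str:
    """Choose among candidate questions by repeated UCB1 selection and simulation."""
    if not candidates:
        raise ValueError("need at least one candidate question")
    rng = random.Random(seed)
    visits = [0] * len(candidates)
    totals = [0.0] * len(candidates)
    for t in range(1, budget + 1):
        if t <= len(candidates):
            i = t - 1  # visit every candidate once before applying UCB1
        else:
            # Selection: mean reward plus an exploration bonus (UCB1).
            i = max(
                range(len(candidates)),
                key=lambda k: totals[k] / visits[k]
                + c * math.sqrt(math.log(t) / visits[k]),
            )
        # Simulation: hypothetical play-out of future turns after this question.
        reward = rollout(candidates[i], rng)
        # Backpropagation: update the chosen candidate's statistics.
        visits[i] += 1
        totals[i] += reward
    # Recommend the most-visited candidate.
    return candidates[max(range(len(candidates)), key=lambda k: visits[k])]


if __name__ == "__main__":
    # Hypothetical chance that each question steers the dialogue to the target conclusion.
    success = {"Q1": 0.2, "Q2": 0.8, "Q3": 0.5}

    def toy_rollout(q: str, rng: random.Random) -> float:
        return 1.0 if rng.random() < success[q] else 0.0

    print(ucb1_select_question(list(success), toy_rollout))
```

Returning the most-visited candidate rather than the one with the highest mean is a common MCTS convention, since visit counts are less sensitive to a few lucky rollouts.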

A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

Jun 13, 2024

Chenyang Shi, Shasha Guo, Boyi Wei, Hanxiao Liu, Yibo Zhang, Ningfang Song, Jing Jin

Abstract: Event cameras are renowned for their high efficiency, as they output a sparse, asynchronous stream of events. However, they are plagued by noisy events, especially in low-light conditions. Denoising is an essential task for event cameras, but evaluating denoising performance is challenging. Label-dependent denoising metrics require artificially adding noise to clean sequences, which complicates evaluation. Moreover, most of these metrics are monotonic, so scores can be inflated by removing large amounts of noise together with valid events. To overcome these limitations, we propose the first label-free and non-monotonic evaluation metric, the area of the continuous contrast curve (AOCC), which uses the area enclosed by event-frame contrast curves across different time intervals. The metric is inspired by how events capture the edge contours of scenes or objects with high temporal resolution. An effective denoising method removes noise without eliminating these edge-contour events, thus preserving the contrast of event frames. Consequently, contrast across various time ranges serves as a measure of denoising effectiveness. As the time interval lengthens, the curve initially rises and then falls. The proposed metric is validated through both theoretical and experimental evidence.
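
A rough sketch of the AOCC idea: events are accumulated into count frames over windows of increasing length, each window length is mapped to a mean frame contrast, and the area under that contrast curve is integrated with the trapezoidal rule. Defining contrast as the standard deviation of per-pixel event counts, and averaging over consecutive non-overlapping windows, are assumptions made for illustration; the paper's exact contrast definition, interval grid and normalization may differ.

```python
import math
from typing import List, Sequence, Tuple

# (x, y, t): pixel coordinates and timestamp in seconds; polarity is ignored here.
Event = Tuple[int, int, float]


def frame_contrast(events: Sequence[Event], width: int, height: int) -> float:
    """Contrast of one event-count frame, assumed to be the pixel standard deviation."""
    counts = [0] * (width * height)
    for x, y, _ in events:
        counts[y * width + x] += 1
    mean = sum(counts) / len(counts)
    return math.sqrt(sum((c - mean) ** 2 for c in counts) / len(counts))


def mean_contrast(events: Sequence[Event], width: int, height: int, dt: float) -> float:
    """Slice the stream into consecutive windows of length dt and average their contrast."""
    if not events:
        return 0.0
    t0 = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    n_windows = int((t_end - t0) // dt) + 1
    windows: List[List[Event]] = [[] for _ in range(n_windows)]
    for e in events:
        windows[int((e[2] - t0) // dt)].append(e)
    return sum(frame_contrast(w, width, height) for w in windows) / n_windows


def area_under_contrast_curve(
    events: Sequence[Event], width: int, height: int, dts: Sequence[float]
) -> float:
    """Trapezoidal area under the curve of mean contrast versus window length."""
    xs = sorted(dts)
    ys = [mean_contrast(events, width, height, dt) for dt in xs]
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))
```

Under this reading, two denoisers would be compared by running each on the same recording and computing the area over the same set of window lengths; per the abstract, the method that preserves edge-contour events should retain more contrast.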

SGSH: Stimulate Large Language Models with Skeleton Heuristics for Knowledge Base Question Generation

Apr 02, 2024

Shasha Guo, Lizi Liao, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

Abstract: Knowledge base question generation (KBQG) aims to generate natural language questions from a set of triplet facts extracted from a KB. Existing methods have significantly boosted KBQG performance via pre-trained language models (PLMs), thanks to their rich semantic knowledge. With advances in pre-training techniques, large language models (LLMs) (e.g., GPT-3.5) undoubtedly possess far more semantic knowledge. Therefore, how to effectively organize and exploit this abundant knowledge for KBQG is the focus of our study. In this work, we propose SGSH, a simple and effective framework that Stimulates GPT-3.5 with Skeleton Heuristics to enhance KBQG. The framework incorporates "skeleton heuristics", which provide more fine-grained guidance associated with each input to stimulate LLMs to generate optimal questions, encompassing essential elements such as the question phrase and the auxiliary verb. More specifically, we devise an automatic data construction strategy that leverages ChatGPT to build a skeleton training dataset, on which we employ a soft prompting approach to train a BART model dedicated to generating the skeleton associated with each input. Subsequently, the skeleton heuristics are encoded into the prompt to incentivize GPT-3.5 to generate the desired questions. Extensive experiments demonstrate that SGSH achieves new state-of-the-art performance on KBQG tasks.

* Accepted by NAACL 2024 Findings
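
The final step of SGSH, encoding a skeleton into the prompt for GPT-3.5, might look roughly like the sketch below. The prompt wording, the triple formatting and the function name `build_skeleton_prompt` are hypothetical, since the abstract does not give the actual template; the `skeleton` string stands in for the output of the trained BART skeleton generator.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_skeleton_prompt(triples: List[Triple], answer: str, skeleton: str) -> str:
    """Compose an LLM prompt that pairs KB facts with a question skeleton.

    `skeleton` is a placeholder for the generator's output, e.g. "Where was ... ?",
    carrying the question phrase and auxiliary verb the question should use.
    """
    facts = "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)
    return (
        "Generate a natural language question from the knowledge base facts.\n"
        f"Facts:\n{facts}\n"
        f"Answer: {answer}\n"
        f"Question skeleton (question phrase and auxiliary verb): {skeleton}\n"
        "Question:"
    )


if __name__ == "__main__":
    print(build_skeleton_prompt(
        [("Barack Obama", "place_of_birth", "Honolulu")],
        answer="Honolulu",
        skeleton="Where was ... ?",
    ))
```

Keeping the skeleton as a separate, clearly labeled line lets the same template work with or without heuristics, which makes it easy to compare prompts with and without the skeleton.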

A Survey on Neural Question Generation: Methods, Applications, and Prospects

Feb 28, 2024

Shasha Guo, Lizi Liao, Cuiping Li, Tat-Seng Chua

Abstract: In this survey, we present a detailed examination of advances in Neural Question Generation (NQG), a field that leverages neural network techniques to generate relevant questions from diverse inputs such as knowledge bases, texts, and images. The survey begins with an overview of NQG's background, covering the task's problem formulation, prevalent benchmark datasets, established evaluation metrics, and notable applications. It then classifies NQG approaches into three predominant categories: structured NQG, which uses organized data sources; unstructured NQG, which focuses on more loosely structured inputs such as text or visual content; and hybrid NQG, which draws on diverse input modalities. This classification is followed by an in-depth analysis of the neural network models tailored to each category, discussing their inherent strengths and potential limitations. The survey concludes with a forward-looking perspective on the trajectory of NQG, identifying emerging research trends and prospective directions. Accompanying this survey is a curated collection of related research papers, datasets, and code, systematically organized on GitHub, providing an extensive reference for those delving into NQG.

Diversifying Question Generation over Knowledge Base via External Natural Questions

Sep 23, 2023

Shasha Guo, Jing Zhang, Xirui Ke, Cuiping Li, Hong Chen

Abstract: Previous methods for knowledge base question generation (KBQG) primarily focus on enhancing the quality of a single generated question. Inspired by humans' remarkable ability to paraphrase, we contend that the same semantics can be conveyed through varied expressions. This insight makes diversifying question generation an intriguing task, whose first challenge is how to evaluate diversity. Current metrics assess this kind of diversity inadequately because they calculate the ratio of unique n-grams within the generated question itself, which measures duplication more than true diversity. Accordingly, we devise a new diversity evaluation metric that measures the diversity among the top-k generated questions for each instance while ensuring their relevance to the ground truth. The second challenge is how to enhance diversifying question generation. To address it, we introduce a dual model framework interwoven with two selection strategies that generates diverse questions by leveraging external natural questions. The main idea of our dual framework is to extract more diverse expressions and integrate them into the generation model. Extensive experiments on widely used KBQG benchmarks demonstrate that our approach generates highly diverse questions and improves performance on question answering tasks.

* 12 pages, 2 figures
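
To make the contrast with per-question n-gram ratios concrete, the sketch below computes distinct-n for a single question alongside a top-k diversity score that first drops candidates unrelated to the ground truth. The Jaccard-based relevance filter, the `min_relevance` threshold and the pairwise dissimilarity are illustrative assumptions, not the metric defined in the paper.

```python
from itertools import combinations
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def distinct_n(question: str, n: int = 2) -> float:
    """Ratio of unique n-grams within ONE question (the style of metric criticized above)."""
    total = len(question.split()) - n + 1
    return len(ngrams(question, n)) / total if total > 0 else 0.0


def jaccard(a: Set[NGram], b: Set[NGram]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topk_diversity(questions: List[str], reference: str,
                   min_relevance: float = 0.2, n: int = 1) -> float:
    """Mean pairwise n-gram dissimilarity among top-k questions that stay relevant.

    Relevance is approximated by unigram Jaccard overlap with the ground truth;
    questions below `min_relevance` are dropped before diversity is measured.
    """
    ref = ngrams(reference, 1)
    kept = [q for q in questions if jaccard(ngrams(q, 1), ref) >= min_relevance]
    pairs = list(combinations(kept, 2))
    if not pairs:
        return 0.0
    return sum(1 - jaccard(ngrams(a, n), ngrams(b, n)) for a, b in pairs) / len(pairs)
```

Any question without a repeated bigram, such as "what is the capital of france", already reaches distinct-2 = 1.0, which shows why such ratios say little about variety across the top-k outputs, whereas three copies of that question score 0.0 under the pairwise measure.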

SeqXFilter: A Memory-efficient Denoising Filter for Dynamic Vision Sensors

Jun 02, 2020

Shasha Guo, Lei Wang, Xiaofan Chen, Limeng Zhang, Ziyang Kang, Weixia Xu

Abstract: Neuromorphic event-based dynamic vision sensors (DVS) have much faster sampling rates and a higher dynamic range than frame-based imaging sensors. However, they are sensitive to unwanted background activity (BA) events. Existing filters tackle this problem based on spatio-temporal correlation, but they are either memory-intensive or computation-intensive. We propose SeqXFilter, a spatio-temporal correlation filter that uses only a window of past events, has O(1) space complexity, and requires only simple computations. We explore the spatial correlation of an event with its past few events by analyzing the distribution of events when different functions are applied to their spatial distances. We identify the function that best checks the spatio-temporal correlation of an event for SeqXFilter, i.e., the one that best separates real events from noise events. We present not only the visual denoising effect of the filter but also two metrics for quantitatively analyzing its performance. Four neuromorphic event-based datasets, recorded from four DVS with different output sizes, are used to validate our method. The experimental results show that SeqXFilter achieves performance similar to baseline NNb filters, but with an extremely small memory cost and simple computation logic.
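
A sliding-window correlation filter in the spirit of SeqXFilter can be sketched as follows: only the last few events are stored, and an incoming event is kept if it lies close to at least one of them. The window length, the minimum Manhattan distance as the correlation function, the `max_dist` threshold and the choice to push every event (kept or rejected) into the window are all assumptions; the paper selects its own best function empirically.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[int, int, int]  # (x, y, timestamp_us)


def seq_window_filter(events: Iterable[Event], window: int = 4,
                      max_dist: int = 2) -> Iterator[Event]:
    """Yield events spatially close to at least one of the last `window` events.

    Memory is O(window), i.e. constant with respect to sensor resolution, because
    only a fixed-length queue of past events is kept instead of a per-pixel map.
    """
    past: Deque[Event] = deque(maxlen=window)
    for ev in events:
        x, y, _ = ev
        # Assumed correlation test: minimum Manhattan distance to recent events.
        if past and min(abs(x - px) + abs(y - py) for px, py, _ in past) <= max_dist:
            yield ev
        past.append(ev)  # every event enters the window, kept or not


if __name__ == "__main__":
    stream = [(10, 10, 0), (11, 10, 1), (50, 50, 2), (12, 11, 3)]
    print(list(seq_window_filter(stream)))
```

Because temporal proximity is implied by the event ordering, the filter needs no per-pixel timestamp map, so its memory does not grow with the sensor's output size.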

Exploration of Input Patterns for Enhancing the Performance of Liquid State Machines

Apr 06, 2020

Shasha Guo, Lianhua Qu, Lei Wang, Xulong Tang, Shuo Tian, Shiming Li, Weixia Xu

Abstract: Spiking Neural Networks (SNNs) have gained increasing attention for their low power consumption, but training SNNs is challenging. The Liquid State Machine (LSM), a major type of reservoir computing, has been widely recognized for its low training cost among SNNs. Exploring LSM topologies to enhance performance often requires hyper-parameter search, which is both resource-expensive and time-consuming. We instead explore the influence of input scale reduction on LSM. There are two main reasons for studying input reduction for LSM. First, the input dimension of large images requires efficient processing. Second, input exploration is generally more economical than architecture search. To mitigate the difficulty of dealing with the huge input spaces of LSM, and to determine whether input reduction can enhance LSM performance, we explore several input patterns, namely fullscale, scanline, chessboard, and patch. Several datasets are used to evaluate the proposed input patterns, including two spatial image datasets and one spatio-temporal image database. The experimental results show that the reduced input under the chessboard pattern improves accuracy by up to 5% and reduces execution time by up to 50%, with up to 75% less input storage than the fullscale input pattern for LSM.
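
The input patterns can be pictured as masks that select which pixels drive the LSM's input neurons. The sketch below shows fullscale, scanline and chessboard sampling over a 2-D image; keeping every other row for scanline and the squares where (row + col) is even for chessboard are assumed definitions, and the patch pattern is omitted because the abstract does not define it.

```python
from typing import List

Image = List[List[float]]


def fullscale(img: Image) -> List[float]:
    """Every pixel becomes an input."""
    return [v for row in img for v in row]


def scanline(img: Image) -> List[float]:
    """Assumed definition: keep every other row, starting with row 0."""
    return [v for r, row in enumerate(img) if r % 2 == 0 for v in row]


def chessboard(img: Image) -> List[float]:
    """Assumed definition: keep pixels on one colour of a chessboard ((row + col) even)."""
    return [v for r, row in enumerate(img) for c, v in enumerate(row) if (r + c) % 2 == 0]


if __name__ == "__main__":
    img = [[1.0, 2.0], [3.0, 4.0]]
    print(fullscale(img), scanline(img), chessboard(img))
```

These assumed masks halve the input; the abstract's figure of up to 75% less input storage suggests the paper's patterns may sample more sparsely than this sketch.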
