Xiaobo Guo

Benchmarking AI scientists in omics data-driven biological research

May 13, 2025

Erpai Luo, Jinmeng Jia, Yifan Xiong, Xiangyu Li, Xiaobo Guo, Baoqi Yu, Lei Wei, Xuegong Zhang

Abstract:The rise of large language models and multi-agent systems has sparked growing interest in AI scientists capable of autonomous biological research. However, existing benchmarks either focus on reasoning without data or on data analysis with predefined statistical answers, lacking realistic, data-driven evaluation settings. Here, we introduce the Biological AI Scientist Benchmark (BaisBench), a benchmark designed to assess AI scientists' ability to generate biological discoveries through data analysis and reasoning with external knowledge. BaisBench comprises two tasks: cell type annotation on 31 expert-labeled single-cell datasets, and scientific discovery through answering 198 multiple-choice questions derived from the biological insights of 41 recent single-cell studies. Systematic experiments on state-of-the-art AI scientists and LLM agents showed that while promising, current models still substantially underperform human experts on both tasks. We hope BaisBench will fill this gap and serve as a foundation for advancing and evaluating AI models for scientific discovery. The benchmark can be found at: https://github.com/EperLuo/BaisBench.

The Computational Anatomy of Humility: Modeling Intellectual Humility in Online Public Discourse

Oct 19, 2024

Xiaobo Guo, Neil Potnis, Melody Yu, Nabeel Gillani, Soroush Vosoughi

Abstract:The ability for individuals to constructively engage with one another across lines of difference is a critical feature of a healthy pluralistic society. This is also true in online discussion spaces like social media platforms. To date, much social media research has focused on preventing ills -- like political polarization and the spread of misinformation. While this is important, enhancing the quality of online public discourse requires not just reducing ills but also promoting foundational human virtues. In this study, we focus on one particular virtue: "intellectual humility" (IH), or acknowledging the potential limitations in one's own beliefs. Specifically, we explore the development of computational methods for measuring IH at scale. We manually curate and validate an IH codebook on 350 posts about religion drawn from subreddits and use them to develop LLM-based models for automating this measurement. Our best model achieves a Macro-F1 score of 0.64 across labels (and 0.70 when predicting IH/IA/Neutral at the coarse level), higher than an expected naive baseline of 0.51 (0.32 for IH/IA/Neutral) but lower than a human annotator-informed upper bound of 0.85 (0.83 for IH/IA/Neutral). Our results both highlight the challenging nature of detecting IH online -- opening the door to new directions in NLP research -- and also lay a foundation for computational social science researchers interested in analyzing and fostering more IH in online public discourse.
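For readers unfamiliar with the metric quoted above, here is a minimal sketch of how a Macro-F1 score is computed: per-label F1 scores are averaged with equal weight, so rare labels count as much as frequent ones. The label strings follow the coarse IH/IA/Neutral scheme named in the abstract; the toy gold and predicted labels are invented for illustration and are not data from the paper.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (illustrative only, not from the paper)
gold = ["IH", "IA", "Neutral", "IH", "Neutral", "IA"]
pred = ["IH", "Neutral", "Neutral", "IA", "Neutral", "IA"]
score = macro_f1(gold, pred)
print(f"Macro-F1: {score:.3f}")
```

Because every label contributes equally, a model that ignores a minority label is penalized even when its overall accuracy looks reasonable.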

Enhanced Detection of Conversational Mental Manipulation Through Advanced Prompting Techniques

Aug 14, 2024

Ivory Yang, Xiaobo Guo, Sean Xie, Soroush Vosoughi

Abstract:This study presents a comprehensive, long-term project to explore the effectiveness of various prompting techniques in detecting dialogical mental manipulation. We implement Chain-of-Thought prompting with Zero-Shot and Few-Shot settings on a binary mental manipulation detection task, building upon existing work conducted with Zero-Shot and Few-Shot prompting. Our primary objective is to decipher why certain prompting techniques display superior performance, so as to craft a novel framework tailored for detection of mental manipulation. Preliminary findings suggest that advanced prompting techniques may not be suitable for more complex models, if they are not trained through example-based learning.

* Accepted at WiNLP @ EMNLP 2024

Serial Position Effects of Large Language Models

Jun 23, 2024

Xiaobo Guo, Soroush Vosoughi

Abstract:Large Language Models (LLMs) have shown remarkable capabilities in zero-shot learning applications, generating responses to queries using only pre-training information without the need for additional fine-tuning. This represents a significant departure from traditional machine learning approaches. Previous research has indicated that LLMs may exhibit serial position effects, such as primacy and recency biases, which are well-documented cognitive biases in human psychology. Our extensive testing across various tasks and models confirms the widespread occurrence of these effects, although their intensity varies. We also discovered that while carefully designed prompts can somewhat mitigate these biases, their effectiveness is inconsistent. These findings underscore the significance of serial position effects during the inference process, particularly in scenarios where there are no ground truth labels, highlighting the need for greater focus on addressing these effects in LLM applications.
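One simple way to probe for primacy or recency bias, sketched here under stated assumptions, is to shuffle the answer options on every trial and record which position gets chosen. A chooser without position bias should pick each position at roughly the same rate. The function name `position_choice_rates` and both chooser functions are hypothetical stand-ins for a real model call; this is not the evaluation protocol of the paper.

```python
import random
from collections import Counter


def position_choice_rates(options, choose, n_trials=2000, seed=0):
    """Shuffle options each trial; return the fraction of picks per position."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        shuffled = options[:]
        rng.shuffle(shuffled)
        counts[choose(shuffled)] += 1  # choose() returns a position index
    return [counts[i] / n_trials for i in range(len(options))]


def primacy_biased_chooser(shuffled):
    # Hypothetical stand-in for a model that always picks the first option
    return 0


def content_based_chooser(shuffled):
    # Hypothetical stand-in for a model that picks by content, not position
    return shuffled.index("B")


options = ["A", "B", "C", "D"]
biased_rates = position_choice_rates(options, primacy_biased_chooser)
unbiased_rates = position_choice_rates(options, content_based_chooser)
print("biased:  ", biased_rates)
print("unbiased:", unbiased_rates)
```

In practice `choose` would wrap a prompt to the model under test, and a skew toward the first or last position would indicate a primacy or recency effect.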

MODABS: Multi-Objective Learning for Dynamic Aspect-Based Summarization

Jun 05, 2024

Xiaobo Guo, Soroush Vosoughi

Abstract:The rapid proliferation of online content necessitates effective summarization methods, among which dynamic aspect-based summarization stands out. Unlike its traditional counterpart, which assumes a fixed set of known aspects, this approach adapts to the varied aspects of the input text. We introduce a novel multi-objective learning framework employing a Longformer-Encoder-Decoder for this task. The framework optimizes aspect number prediction, minimizes disparity between generated and reference summaries for each aspect, and maximizes dissimilarity across aspect-specific summaries. Extensive experiments show our method significantly outperforms baselines on three diverse datasets, largely due to the effective alignment of generated and reference aspect counts without sacrificing single-aspect summarization quality.
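The three objectives named in the abstract could be combined as a weighted sum, sketched below. This form is an assumption made for illustration: the abstract does not state how the terms are combined, and the symbols and weights $\lambda_1, \lambda_2, \lambda_3$ are hypothetical. Here $\mathcal{L}_{\text{num}}$ stands for the aspect-number prediction loss, $\mathcal{L}_{\text{sum}}^{(k)}$ for the disparity between the generated and reference summary of aspect $k$, and $D$ for a dissimilarity measure across the $K$ aspect-specific summaries.

```latex
\mathcal{L}
  = \lambda_{1}\,\mathcal{L}_{\text{num}}
  + \lambda_{2} \sum_{k=1}^{K} \mathcal{L}_{\text{sum}}^{(k)}
  - \lambda_{3}\, D\!\left(s_{1}, \dots, s_{K}\right)
```

The minus sign on the last term reflects that dissimilarity is maximized while the first two terms are minimized.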

LOLAMEME: Logic, Language, Memory, Mechanistic Framework

May 31, 2024

Jay Desai, Xiaobo Guo, Srinivasan H. Sengamedu

Abstract:The performance of Large Language Models has achieved superhuman breadth with unprecedented depth. At the same time, language models are mostly black-box models, and the underlying mechanisms for their performance have been evaluated using synthetic or mechanistic schemes. We extend current mechanistic schemes to incorporate Logic, Memory, and nuances of Language such as latent structure. The proposed framework is called LOLAMEME, and we provide two instantiations of it: the LoLa and MeMe languages. We then consider two generative language model architectures: transformer-based GPT-2 and convolution-based Hyena. We propose the hybrid architecture T HEX and use the LOLAMEME framework to compare the three architectures. T HEX outperforms GPT-2 and Hyena on select tasks.

* https://openreview.net/pdf?id=73dhbcXxtV

JADS: A Framework for Self-supervised Joint Aspect Discovery and Summarization

May 28, 2024

Xiaobo Guo, Jay Desai, Srinivasan H. Sengamedu

Abstract:To generate summaries that include multiple aspects or topics for text documents, most approaches use clustering or topic modeling to group relevant sentences and then generate a summary for each group. These approaches struggle to optimize the summarization and clustering algorithms jointly. On the other hand, aspect-based summarization requires known aspects. Our solution integrates topic discovery and summarization into a single step. Given text data, our Joint Aspect Discovery and Summarization algorithm (JADS) discovers aspects from the input and generates a summary of the topics, in one step. We propose a self-supervised framework that creates a labeled dataset by first mixing sentences from multiple documents (e.g., CNN/DailyMail articles) as the input and then uses the article summaries from the mixture as the labels. The JADS model outperforms the two-step baselines. With pretraining, the model achieves better performance and stability. Furthermore, embeddings derived from JADS exhibit superior clustering capabilities. Our proposed method achieves higher semantic alignment with ground truth and is factual.
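The self-supervised data construction described above (mix sentences from several documents as the input, use their summaries as the labels) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_mixed_example`, the dictionary keys, and the toy documents are all hypothetical, and shuffling the sentences uniformly at random is an assumption about how the mixing is done.

```python
import random


def make_mixed_example(documents, k=2, seed=0):
    """Build one training example from k documents.

    documents: list of dicts with 'sentences' (list of str) and 'summary' (str).
    Returns the shuffled, concatenated sentences as input and the k source
    summaries as labels.
    """
    rng = random.Random(seed)
    chosen = rng.sample(documents, k)
    mixed = [s for doc in chosen for s in doc["sentences"]]
    rng.shuffle(mixed)  # interleave sentences from the different documents
    return {
        "input": " ".join(mixed),
        "labels": [doc["summary"] for doc in chosen],
    }


# Toy documents (illustrative only)
docs = [
    {"sentences": ["The team won the final.", "Fans celebrated downtown."],
     "summary": "Team wins final."},
    {"sentences": ["Rates were raised again.", "Markets fell on the news."],
     "summary": "Rate hike hits markets."},
    {"sentences": ["A new telescope launched.", "It will map distant galaxies."],
     "summary": "Telescope launched."},
]
example = make_mixed_example(docs, k=2, seed=0)
print(example["input"])
print(example["labels"])
```

A model trained on such pairs has to separate the mixed sentences by aspect and summarize each group in a single pass, which is the joint behavior the abstract describes.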

* preprint

Disordered-DABS: A Benchmark for Dynamic Aspect-Based Summarization in Disordered Texts

Feb 16, 2024

Xiaobo Guo, Soroush Vosoughi

Abstract:Aspect-based summarization has seen significant advancements, especially in structured text. Yet, summarizing disordered, large-scale texts, like those found in social media and customer feedback, remains a significant challenge. Current research largely targets predefined aspects within structured texts, neglecting the complexities of dynamic and disordered environments. Addressing this gap, we introduce Disordered-DABS, a novel benchmark for dynamic aspect-based summarization tailored to unstructured text. The benchmark is developed by adapting existing datasets for cost-efficiency and scalability. Our comprehensive experiments and detailed human evaluations reveal that Disordered-DABS poses unique challenges to contemporary summarization models, including state-of-the-art language models such as GPT-3.5.

Neural Node Matching for Multi-Target Cross Domain Recommendation

Feb 12, 2023

Wujiang Xu, Shaoshuai Li, Mingming Ha, Xiaobo Guo, Qiongxu Ma, Xiaolei Liu, Linxun Chen, Zhenfeng Zhu

Abstract:Multi-Target Cross Domain Recommendation (CDR), which aims to improve recommendation performance in multiple domains (or systems) simultaneously, has attracted a surge of interest recently. Most existing multi-target CDR frameworks rely on a majority of users overlapping across domains. However, general practical CDR scenarios cannot meet this strict overlap requirement and share only a small margin of common users across domains. Additionally, the majority of users have very few historical behaviors in such small-overlap CDR scenarios. To tackle these issues, we propose a simple-yet-effective neural node matching based framework for more general CDR settings, i.e., only a few partially overlapped users exist across domains, and most overlapped as well as non-overlapped users have sparse interactions. The framework mainly contains two modules: (i) an intra-to-inter node matching module, and (ii) an intra node complementing module. Concretely, the first module conducts intra-knowledge fusion within each domain and subsequent inter-knowledge fusion across domains by aggregating information over a fully connected user-user homogeneous graph.

* The IEEE International Conference on Data Engineering 2023
* 13 pages

Capturing Topic Framing via Masked Language Modeling

Feb 07, 2023

Xiaobo Guo, Weicheng Ma, Soroush Vosoughi

Abstract:Differential framing of issues can lead to divergent world views on important issues. This is especially true in domains where the information presented can reach a large audience, such as traditional and social media. Scalable and reliable measurement of such differential framing is an important first step in addressing them. In this work, based on the intuition that framing affects the tone and word choices in written language, we propose a framework for modeling the differential framing of issues through masked token prediction via large-scale fine-tuned language models (LMs). Specifically, we explore three key factors for our framework: 1) prompt generation methods for the masked token prediction; 2) methods for normalizing the output of fine-tuned LMs; 3) robustness to the choice of pre-trained LMs used for fine-tuning. Through experiments on a dataset of articles from traditional media outlets covering five diverse and politically polarized topics, we show that our framework can capture differential framing of these topics with high reliability.

* In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 6811-6825), December 2022
