Kaijian Zou

Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation

Nov 11, 2024

Kaijian Zou, Muhammad Khalifa, Lu Wang

Abstract: Language models (LMs) have demonstrated an improved capacity to handle long-context information, yet existing long-context benchmarks primarily measure LMs' retrieval abilities with extended inputs, e.g., pinpointing a short phrase from long-form text. Therefore, they may fall short when evaluating models' global context understanding capacity, such as synthesizing and reasoning over content across input to generate the response. In this paper, we study long-context language model (LCLM) evaluation through many-shot in-context learning (ICL). Concretely, we identify the skills each ICL task requires, and examine models' long-context capabilities on them. We first ask: What types of ICL tasks benefit from additional demonstrations, and are these tasks effective at evaluating LCLMs? We find that classification and summarization tasks show notable performance improvements with additional demonstrations, while translation and reasoning tasks do not exhibit clear trends. This suggests the classification tasks predominantly test models' retrieval skills. Next, we ask: To what extent does each task require retrieval skills versus global context understanding from LCLMs? We develop metrics to categorize ICL tasks into two groups: (i) retrieval tasks that require strong retrieval ability to pinpoint relevant examples, and (ii) global context understanding tasks that necessitate a deeper comprehension of the full input. We find that not all datasets can effectively evaluate these long-context capabilities. To address this gap, we introduce a new many-shot ICL benchmark, MANYICLBENCH, designed to characterize LCLMs' retrieval and global context understanding capabilities separately. Benchmarking 11 open-weight LCLMs with MANYICLBENCH, we find that while state-of-the-art models perform well in retrieval tasks up to 64k tokens, many show significant drops in global context tasks at just 16k tokens.

All Things Considered: Detecting Partisan Events from News Media with Cross-Article Comparison

Oct 28, 2023

Yujian Liu, Xinliang Frederick Zhang, Kaijian Zou, Ruihong Huang, Nick Beauchamp, Lu Wang

Abstract: Public opinion is shaped by the information news media provide, and that information in turn may be shaped by the ideological preferences of media outlets. But while much attention has been devoted to media bias via overt ideological language or topic selection, a more unobtrusive way in which the media shape opinion is via the strategic inclusion or omission of partisan events that may support one side or the other. We develop a latent variable-based framework to predict the ideology of news articles by comparing multiple articles on the same story and identifying partisan events whose inclusion or omission reveals ideology. Our experiments first validate the existence of partisan event selection, and then show that article alignment and cross-document comparison detect partisan events and article ideology better than competitive baselines. Our results reveal the high-level form of media bias, which is present even among mainstream media with strong norms of objectivity and nonpartisanship. Our codebase and dataset are available at https://github.com/launchnlp/ATC.

* EMNLP'23 Main Conference

Crossing the Aisle: Unveiling Partisan and Counter-Partisan Events in News Reporting

Oct 28, 2023

Kaijian Zou, Xinliang Frederick Zhang, Winston Wu, Nick Beauchamp, Lu Wang

Abstract: News media is expected to uphold unbiased reporting. Yet they may still affect public opinion by selectively including or omitting events that support or contradict their ideological positions. Prior work in NLP has only studied media bias via linguistic style and word usage. In this paper, we study to which degree media balances news reporting and affects consumers through event inclusion or omission. We first introduce the task of detecting both partisan and counter-partisan events: events that support or oppose the author's political ideology. To conduct our study, we annotate a high-quality dataset, PAC, containing 8,511 (counter-)partisan event annotations in 304 news articles from ideologically diverse media outlets. We benchmark PAC to highlight the challenges of this task. Our findings highlight both the ways in which the news subtly shapes opinion and the need for large language models that better understand events within a broader context. Our dataset can be found at https://github.com/launchnlp/Partisan-Event-Dataset.

* EMNLP'23 Findings
