Abstract:Drawing parallels between human cognition and artificial intelligence, we explored how large language models (LLMs) internalize identities imposed by targeted prompts. Informed by Social Identity Theory, these identity assignments lead LLMs to distinguish between "we" (the ingroup) and "they" (the outgroup). This self-categorization generates both ingroup favoritism and outgroup bias. Nonetheless, existing literature has predominantly focused on ingroup favoritism, often overlooking outgroup bias, which is a fundamental source of intergroup prejudice and discrimination. Our experiment addresses this gap by demonstrating that outgroup bias manifests as strongly as ingroup favoritism. Furthermore, we successfully mitigated the inherent pro-liberal, anti-conservative bias in LLMs by guiding them to adopt the perspectives of the initially disfavored group. These results were replicated in the context of gender bias. Our findings highlight the potential to develop more equitable and balanced language models.
Abstract:We explored cultural biases-individualism vs. collectivism-in ChatGPT across three Western languages (i.e., English, German, and French) and three Eastern languages (i.e., Chinese, Japanese, and Korean). When ChatGPT adopted an individualistic persona in Western languages, its collectivism scores (i.e., out-group values) exhibited a more negative trend, surpassing their positive orientation towards individualism (i.e., in-group values). Conversely, when a collectivistic persona was assigned to ChatGPT in Eastern languages, a similar pattern emerged with more negative responses toward individualism (i.e., out-group values) as compared to collectivism (i.e., in-group values). The results indicate that when imbued with a particular social identity, ChatGPT discerns in-group and out-group, embracing in-group values while eschewing out-group values. Notably, the negativity towards the out-group, from which prejudices and discrimination arise, exceeded the positivity towards the in-group. The experiment was replicated in the political domain, and the results remained consistent. Furthermore, this replication unveiled an intrinsic Democratic bias in Large Language Models (LLMs), aligning with earlier findings and providing integral insights into mitigating such bias through prompt engineering. Extensive robustness checks were performed using varying hyperparameter and persona setup methods, with or without social identity labels, across other popular language models.
Abstract:We address an important gap in detection of political bias in news articles. Previous works that perform supervised document classification can be biased towards the writing style of each news outlet, leading to overfitting and limited generalizability. Our approach overcomes this limitation by considering both the sentence-level semantics and the document-level rhetorical structure, resulting in a more robust and style-agnostic approach to detecting political bias in news articles. We introduce a novel multi-head hierarchical attention model that effectively encodes the structure of long documents through a diverse ensemble of attention heads. While journalism follows a formalized rhetorical structure, the writing style may vary by news outlet. We demonstrate that our method overcomes this domain dependency and outperforms previous approaches for robustness and accuracy. Further analysis demonstrates the ability of our model to capture the discourse structures commonly used in the journalism domain.
Abstract:Quotes are critical for establishing credibility in news articles. A direct quote enclosed in quotation marks has a strong visual appeal and is a sign of a reliable citation. Unfortunately, this journalistic practice is not strictly followed, and a quote in the headline is often "contextomized." Such a quote uses words out of context in a way that alters the speaker's intention so that there is no semantically matching quote in the body text. We present QuoteCSE, a contrastive learning framework that represents the embedding of news quotes based on domain-driven positive and negative samples to identify such an editorial strategy. The dataset and code are available at https://github.com/ssu-humane/contextomized-quote-contrastive.