Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chung-Chi Chen

Refining Financial Consumer Complaints through Multi-Scale Model Interaction

Apr 14, 2025

Bo-Wei Chen, An-Zi Yen, Chung-Chi Chen

Abstract:Legal writing demands clarity, formality, and domain-specific precision-qualities often lacking in documents authored by individuals without legal training. To bridge this gap, this paper explores the task of legal text refinement that transforms informal, conversational inputs into persuasive legal arguments. We introduce FinDR, a Chinese dataset of financial dispute records, annotated with official judgments on claim reasonableness. Our proposed method, Multi-Scale Model Interaction (MSMI), leverages a lightweight classifier to evaluate outputs and guide iterative refinement by Large Language Models (LLMs). Experimental results demonstrate that MSMI significantly outperforms single-pass prompting strategies. Additionally, we validate the generalizability of MSMI on several short-text benchmarks, showing improved adversarial robustness. Our findings reveal the potential of multi-model collaboration for enhancing legal document generation and broader text refinement tasks.

Via

Access Paper or Ask Questions

The Impact and Feasibility of Self-Confidence Shaping for AI-Assisted Decision-Making

Feb 20, 2025

Takehiro Takayanagi, Ryuji Hashimoto, Chung-Chi Chen, Kiyoshi Izumi

Abstract:In AI-assisted decision-making, it is crucial but challenging for humans to appropriately rely on AI, especially in high-stakes domains such as finance and healthcare. This paper addresses this problem from a human-centered perspective by presenting an intervention for self-confidence shaping, designed to calibrate self-confidence at a targeted level. We first demonstrate the impact of self-confidence shaping by quantifying the upper-bound improvement in human-AI team performance. Our behavioral experiments with 121 participants show that self-confidence shaping can improve human-AI team performance by nearly 50% by mitigating both over- and under-reliance on AI. We then introduce a self-confidence prediction task to identify when our intervention is needed. Our results show that simple machine-learning models achieve 67% accuracy in predicting self-confidence. We further illustrate the feasibility of such interventions. The observed relationship between sentiment and self-confidence suggests that modifying sentiment could be a viable strategy for shaping self-confidence. Finally, we outline future research directions to support the deployment of self-confidence shaping in a real-world scenario for effective human-AI collaboration.

Via

Access Paper or Ask Questions

Observing Micromotives and Macrobehavior of Large Language Models

Dec 10, 2024

Yuyang Cheng, Xingwei Qu, Tomas Goldsack, Chenghua Lin, Chung-Chi Chen

Abstract:Thomas C. Schelling, awarded the 2005 Nobel Memorial Prize in Economic Sciences, pointed out that ``individuals decisions (micromotives), while often personal and localized, can lead to societal outcomes (macrobehavior) that are far more complex and different from what the individuals intended.'' The current research related to large language models' (LLMs') micromotives, such as preferences or biases, assumes that users will make more appropriate decisions once LLMs are devoid of preferences or biases. Consequently, a series of studies has focused on removing bias from LLMs. In the NLP community, while there are many discussions on LLMs' micromotives, previous studies have seldom conducted a systematic examination of how LLMs may influence society's macrobehavior. In this paper, we follow the design of Schelling's model of segregation to observe the relationship between the micromotives and macrobehavior of LLMs. Our results indicate that, regardless of the level of bias in LLMs, a highly segregated society will emerge as more people follow LLMs' suggestions. We hope our discussion will spark further consideration of the fundamental assumption regarding the mitigation of LLMs' micromotives and encourage a reevaluation of how LLMs may influence users and society.

Via

Access Paper or Ask Questions

Paraphrase-Aligned Machine Translation

Dec 08, 2024

Ke-Ching Chang, Chung-Chi Chen, An-Zi Yen

Abstract:Large Language Models (LLMs) have demonstrated significant capabilities in machine translation. However, their translation quality is sometimes questioned, as the generated outputs may deviate from expressions typically used by native speakers. These deviations often arise from differences in sentence structure between language systems. To address this issue, we propose ParaAlign Translator, a method that fine-tunes LLMs to paraphrase sentences, aligning their structures with those of the target language systems. This approach improves the performance of subsequent translations. Experimental results demonstrate that the proposed method enhances the LLaMA-3-8B model's performance in both resource-rich and low-resource scenarios and achieves parity with or surpassing the much larger LLaMA-3-70B model.

Via

Access Paper or Ask Questions

ML-Promise: A Multilingual Dataset for Corporate Promise Verification

Nov 07, 2024

Yohei Seki, Hakusen Shu, Anaïs Lhuissier, Hanwool Lee, Juyeon Kang, Min-Yuh Day, Chung-Chi Chen

Figure 1 for ML-Promise: A Multilingual Dataset for Corporate Promise Verification

Figure 2 for ML-Promise: A Multilingual Dataset for Corporate Promise Verification

Figure 3 for ML-Promise: A Multilingual Dataset for Corporate Promise Verification

Figure 4 for ML-Promise: A Multilingual Dataset for Corporate Promise Verification

Abstract:Promises made by politicians, corporate leaders, and public figures have a significant impact on public perception, trust, and institutional reputation. However, the complexity and volume of such commitments, coupled with difficulties in verifying their fulfillment, necessitate innovative methods for assessing their credibility. This paper introduces the concept of Promise Verification, a systematic approach involving steps such as promise identification, evidence assessment, and the evaluation of timing for verification. We propose the first multilingual dataset, ML-Promise, which includes English, French, Chinese, Japanese, and Korean, aimed at facilitating in-depth verification of promises, particularly in the context of Environmental, Social, and Governance (ESG) reports. Given the growing emphasis on corporate environmental contributions, this dataset addresses the challenge of evaluating corporate promises, especially in light of practices like greenwashing. Our findings also explore textual and image-based baselines, with promising results from retrieval-augmented generation (RAG) approaches. This work aims to foster further discourse on the accountability of public commitments across multiple languages and domains.

* 6 pages

Via

Access Paper or Ask Questions

Are Expert-Level Language Models Expert-Level Annotators?

Oct 04, 2024

Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen, Hsin-Hsi Chen

Abstract:Data annotation refers to the labeling or tagging of textual data with relevant information. A large body of works have reported positive results on leveraging LLMs as an alternative to human annotators. However, existing studies focus on classic NLP tasks, and the extent to which LLMs as data annotators perform in domains requiring expert knowledge remains underexplored. In this work, we investigate comprehensive approaches across three highly specialized domains and discuss practical suggestions from a cost-effectiveness perspective. To the best of our knowledge, we present the first systematic evaluation of LLMs as expert-level data annotators.

* Accepted to WiML @ NeurIPS 2024 (extended version)

Via

Access Paper or Ask Questions

GADFA: Generator-Assisted Decision-Focused Approach for Opinion Expressing Timing Identification

Oct 02, 2024

Chung-Chi Chen, Hiroya Takamura, Ichiro Kobayashi, Yusuke Miyao

Figure 1 for GADFA: Generator-Assisted Decision-Focused Approach for Opinion Expressing Timing Identification

Figure 2 for GADFA: Generator-Assisted Decision-Focused Approach for Opinion Expressing Timing Identification

Figure 3 for GADFA: Generator-Assisted Decision-Focused Approach for Opinion Expressing Timing Identification

Figure 4 for GADFA: Generator-Assisted Decision-Focused Approach for Opinion Expressing Timing Identification

Abstract:The advancement of text generation models has granted us the capability to produce coherent and convincing text on demand. Yet, in real-life circumstances, individuals do not continuously generate text or voice their opinions. For instance, consumers pen product reviews after weighing the merits and demerits of a product, and professional analysts issue reports following significant news releases. In essence, opinion expression is typically prompted by particular reasons or signals. Despite long-standing developments in opinion mining, the appropriate timing for expressing an opinion remains largely unexplored. To address this deficit, our study introduces an innovative task - the identification of news-triggered opinion expressing timing. We ground this task in the actions of professional stock analysts and develop a novel dataset for investigation. Our approach is decision-focused, leveraging text generation models to steer the classification model, thus enhancing overall performance. Our experimental findings demonstrate that the text generated by our model contributes fresh insights from various angles, effectively aiding in identifying the optimal timing for opinion expression.

Via

Access Paper or Ask Questions

From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls

Oct 01, 2024

Tomas Goldsack, Yang Wang, Chenghua Lin, Chung-Chi Chen

Abstract:This paper explores the use of Large Language Models (LLMs) in the generation and evaluation of analytical reports derived from Earnings Calls (ECs). Addressing a current gap in research, we explore the generation of analytical reports with LLMs in a multi-agent framework, designing specialized agents that introduce diverse viewpoints and desirable topics of analysis into the report generation process. Through multiple analyses, we examine the alignment between generated and human-written reports and the impact of both individual and collective agents. Our findings suggest that the introduction of additional agents results in more insightful reports, although reports generated by human experts remain preferred in the majority of cases. Finally, we address the challenging issue of report evaluation, we examine the limitations and strengths of LLMs in assessing the quality of generated reports in different settings, revealing a significant correlation with human experts across multiple dimensions.

* Pre-print

Via

Access Paper or Ask Questions

"Why" Has the Least Side Effect on Model Editing

Sep 27, 2024

Tsung-Hsuan Pan, Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen

Figure 1 for "Why" Has the Least Side Effect on Model Editing

Figure 2 for "Why" Has the Least Side Effect on Model Editing

Abstract:Training large language models (LLMs) from scratch is an expensive endeavor, particularly as world knowledge continually evolves. To maintain relevance and accuracy of LLMs, model editing has emerged as a pivotal research area. While these methods hold promise, they can also produce unintended side effects. Their underlying factors and causes remain largely unexplored. This paper delves into a critical factor-question type-by categorizing model editing questions. Our findings reveal that the extent of performance degradation varies significantly across different question types, providing new insights for experimental design in knowledge editing. Furthermore, we investigate whether insights from smaller models can be extrapolated to larger models. Our results indicate discrepancies in findings between models of different sizes, suggesting that insights from smaller models may not necessarily apply to larger models. Additionally, we examine the impact of batch size on side effects, discovering that increasing the batch size can mitigate performance drops.

Via

Access Paper or Ask Questions

Rehearsing Answers to Probable Questions with Perspective-Taking

Sep 27, 2024

Yung-Yu Shih, Ziwei Xu, Hiroya Takamura, Yun-Nung Chen, Chung-Chi Chen

Figure 1 for Rehearsing Answers to Probable Questions with Perspective-Taking

Figure 2 for Rehearsing Answers to Probable Questions with Perspective-Taking

Figure 3 for Rehearsing Answers to Probable Questions with Perspective-Taking

Figure 4 for Rehearsing Answers to Probable Questions with Perspective-Taking

Abstract:Question answering (QA) has been a long-standing focus in the NLP field, predominantly addressing reading comprehension and common sense QA. However, scenarios involving the preparation of answers to probable questions during professional oral presentations remain underexplored. In this paper, we pioneer the examination of this crucial yet overlooked topic by utilizing real-world QA conversation transcripts between company managers and professional analysts. We explore the proposed task using three causal knowledge graphs (KGs) and three large language models (LLMs). This work provides foundational insights into the application of LLMs in professional QA scenarios, highlighting the importance of causal KGs and perspective-taking in generating effective responses.

Via

Access Paper or Ask Questions