Bill Ray

Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain

Jun 10, 2024

Brian Hu, Bill Ray, Alice Leung, Amy Summerville, David Joy, Christopher Funk, Arslan Basharat

Abstract: In difficult decision-making scenarios, it is common to have conflicting opinions among expert human decision-makers as there may not be a single right answer. Such decisions may be guided by different attributes that can be used to characterize an individual's decision. We introduce a novel dataset for medical triage decision-making, labeled with a set of decision-maker attributes (DMAs). This dataset consists of 62 scenarios, covering six different DMAs, including ethical principles such as fairness and moral desert. We present a novel software framework for human-aligned decision-making by utilizing these DMAs, paving the way for trustworthy AI with better guardrails. Specifically, we demonstrate how large language models (LLMs) can serve as ethical decision-makers, and how their decisions can be aligned to different DMAs using zero-shot prompting. Our experiments focus on different open-source models with varying sizes and training techniques, such as Falcon, Mistral, and Llama 2. Finally, we also introduce a new form of weighted self-consistency that improves the overall quantified performance. Our results provide new research directions in the use of LLMs as alignable decision-makers. The dataset and open-source software are publicly available at: https://github.com/ITM-Kitware/llm-alignable-dm.

* 15 pages total (including appendix), NAACL 2024 Industry Track

Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature

Oct 25, 2022

Katherine Thai, Marzena Karpinska, Kalpesh Krishna, Bill Ray, Moira Inghilleri, John Wieting, Mohit Iyyer

Abstract: Literary translation is a culturally significant task, but it is bottlenecked by the small number of qualified literary translators relative to the many untranslated works published around the world. Machine translation (MT) holds potential to complement the work of human translators by improving both training procedures and their overall efficiency. Literary translation is less constrained than more traditional MT settings since translators must balance meaning equivalence, readability, and critical interpretability in the target language. This property, along with the complex discourse-level context present in literary texts, also makes literary MT more challenging to computationally model and evaluate. To explore this task, we collect a dataset (Par3) of non-English language novels in the public domain, each aligned at the paragraph level to both human and automatic English translations. Using Par3, we discover that expert literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%, while state-of-the-art automatic MT metrics do not correlate with those preferences. The experts note that MT outputs contain not only mistranslations, but also discourse-disrupting errors and stylistic inconsistencies. To address these problems, we train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts. We publicly release Par3 at https://github.com/katherinethai/par3/ to spur future research into literary MT.

* EMNLP 2022
