Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Aravind Mysore

Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction

Jan 27, 2025

Atharva Naik, Darsh Agrawal, Hong Sng, Clayton Marr, Kexun Zhang, Nathaniel R Robinson, Kalvin Chang, Rebecca Byrnes, Aravind Mysore, Carolyn Rose(+1 more)

Figure 1 for Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction

Figure 2 for Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction

Figure 3 for Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction

Figure 4 for Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction

Abstract:Historical linguists have long written "programs" that convert reconstructed words in an ancestor language into their attested descendants via ordered string rewrite functions (called sound laws) However, writing these programs is time-consuming, motivating the development of automated Sound Law Induction (SLI) which we formulate as Programming by Examples (PBE) with Large Language Models (LLMs) in this paper. While LLMs have been effective for code generation, recent work has shown that PBE is challenging but improvable by fine-tuning, especially with training data drawn from the same distribution as evaluation data. In this paper, we create a conceptual framework of what constitutes a "similar distribution" for SLI and propose four kinds of synthetic data generation methods with varying amounts of inductive bias to investigate what leads to the best performance. Based on the results we create a SOTA open-source model for SLI as PBE (+6% pass rate with a third of the parameters of the second-best LLM) and also highlight exciting future directions for PBE research.

Via

Access Paper or Ask Questions

Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

Jun 18, 2024

Atharva Naik, Kexun Zhang, Nathaniel Robinson, Aravind Mysore, Clayton Marr, Hong Sng Rebecca Byrnes, Anna Cai, Kalvin Chang, David Mortensen

Figure 1 for Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

Figure 2 for Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

Figure 3 for Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

Figure 4 for Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

Abstract:Historical linguists have long written a kind of incompletely formalized ''program'' that converts reconstructed words in an ancestor language into words in one of its attested descendants that consist of a series of ordered string rewrite functions (called sound laws). They do this by observing pairs of words in the reconstructed language (protoforms) and the descendent language (reflexes) and constructing a program that transforms protoforms into reflexes. However, writing these programs is error-prone and time-consuming. Prior work has successfully scaffolded this process computationally, but fewer researchers have tackled Sound Law Induction (SLI), which we approach in this paper by casting it as Programming by Examples. We propose a language-agnostic solution that utilizes the programming ability of Large Language Models (LLMs) by generating Python sound law programs from sound change examples. We evaluate the effectiveness of our approach for various LLMs, propose effective methods to generate additional language-agnostic synthetic data to fine-tune LLMs for SLI, and compare our method with existing automated SLI methods showing that while LLMs lag behind them they can complement some of their weaknesses.

Via

Access Paper or Ask Questions