Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

May 27, 2024

Chengxing Jia, Pengyuan Wang, Ziniu Li, Yi-Chen Li, Zhilong Zhang, Nan Tang, Yang Yu

Figure 1 for BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

Figure 2 for BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

Figure 3 for BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

Figure 4 for BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

Share this with someone who'll enjoy it:

Abstract:Large language models (LLMs) have catalyzed a paradigm shift in natural language processing, yet their limited controllability poses a significant challenge for downstream applications. We aim to address this by drawing inspiration from the neural mechanisms of the human brain, specifically Broca's and Wernicke's areas, which are crucial for language generation and comprehension, respectively. In particular, Broca's area receives cognitive decision signals from Wernicke's area, treating the language generation as an intricate decision-making process, which differs from the fully auto-regressive language generation of existing LLMs. In a similar vein, our proposed system, the BWArea model, conceptualizes language generation as a decision-making task. This model has three components: a language world model, an inverse dynamics model, and a cognitive policy. Like Wernicke's area, the inverse dynamics model is designed to deduce the underlying cognitive intentions, or latent actions, behind each token. The BWArea model is amenable to both pre-training and fine-tuning like existing LLMs. With 30B clean pre-training tokens, we have trained a BWArea model, which achieves competitive performance with LLMs of equal size (1B parameters). Unlike fully auto-regressive LLMs, its pre-training performance does not degenerate if dirty data unintentionally appears. This shows the advantage of a decomposed structure of BWArea model in reducing efforts in laborious data selection and labeling. Finally, we reveal that the BWArea model offers enhanced controllability via fine-tuning the cognitive policy with downstream reward metrics, thereby facilitating alignment with greater simplicity. On 9 out of 10 tasks from two suites, TextWorld and BigBench Hard, our method shows superior performance to auto-regressive LLMs.

View paper on

Share this with someone who'll enjoy it:

Title:BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

Paper and Code