Abstract:Large Language Models (LLMs) often struggle when prompted to generate content under specific constraints. However, in such cases it is often easy to check whether these constraints are satisfied or violated. Recent work has shown that LLMs can benefit from such "corrective feedback". Here we claim that this skill of LLMs can be significantly enhanced via training. We introduce an RL framework for teaching models to use such feedback, by simulating interaction sessions and rewarding the model according to its ability to satisfy the constraints. We refer to our method as CORGI (Controlled Generation with RL for Guided Interaction), and evaluate it on a variety of controlled generation tasks using unlabeled training data. We find that CORGI consistently outperforms a baseline reinforcement learning method that does not incorporate conversational feedback. Furthermore, CORGI's interactive framework enables meta-learning, allowing the LLM to generalize better to guided interaction in new tasks. Our results show that conversational optimization, when combined with reinforcement learning, significantly improves the effectiveness of LLMs in controlled generation contexts.
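The simulated interaction session described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the CORGI paper: the names `run_session`, `toy_generate`, and `toy_check` are hypothetical, the constraint is a toy word limit, and the binary end-of-session reward is only one possible choice. No RL update is shown; the sketch covers only the loop of generate, check, and feed back.

```python
from typing import Callable, List, Optional, Tuple


def run_session(
    generate: Callable[[List[str]], str],
    check: Callable[[str], Optional[str]],
    prompt: str,
    max_turns: int = 3,
) -> Tuple[List[str], float]:
    """Simulate one feedback session and return (transcript, reward).

    `check` returns None when the constraint is satisfied, otherwise a
    corrective feedback message that is appended to the conversation.
    The binary reward is an assumption of this sketch.
    """
    history = [prompt]
    for _ in range(max_turns):
        output = generate(history)
        history.append(output)
        feedback = check(output)
        if feedback is None:
            return history, 1.0  # constraint satisfied within the budget
        history.append(feedback)
    return history, 0.0  # turn budget exhausted without satisfying it


# Toy stand-ins (hypothetical): a "model" that shortens its answer by two
# words for every feedback message it has seen, and a word-limit checker.
MAX_WORDS = 5
SOURCE = "the quick brown fox jumps over the lazy dog".split()


def toy_generate(history: List[str]) -> str:
    n_feedback = sum(1 for msg in history if msg.startswith("Feedback:"))
    keep = max(1, len(SOURCE) - 2 * n_feedback)
    return " ".join(SOURCE[:keep])


def toy_check(output: str) -> Optional[str]:
    n_words = len(output.split())
    if n_words <= MAX_WORDS:
        return None
    return f"Feedback: {n_words} words, limit is {MAX_WORDS}."


transcript, reward = run_session(
    toy_generate, toy_check, "Summarize in at most 5 words."
)
```

In a training setup, the reward returned by such a session would be the signal passed to the RL algorithm; how that is done in CORGI is not specified in the abstract.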
Abstract:Transfer learning for extremely low-resource languages is challenging because there are neither large-scale monolingual corpora for pre-training nor sufficient annotated data for fine-tuning. We build on MetaXL, which uses meta-learning to transfer from a single source language to an extremely low-resource one. We propose an enhanced approach that uses multiple source languages chosen in a data-driven manner. In addition, we introduce a sample selection strategy that uses a multi-armed bandit algorithm to decide how the source languages are used during training. With both improvements, we achieve state-of-the-art results on the NER task for extremely low-resource languages while using the same amount of data, yielding representations that generalize better. Moreover, because the method can use multiple source languages, the framework can draw on much larger amounts of data, while still outperforming MetaXL even when the amount of data is the same.
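To make the bandit-based selection idea concrete, here is a minimal sketch of choosing which source language to sample from at each training step. The abstract does not say which bandit algorithm or reward signal is used, so this sketch assumes UCB1 and a hypothetical per-language "usefulness" reward (for example, the improvement on a target-language validation set). The names `ucb1_select`, `run_bandit`, and the `lang_*` labels are placeholders, not part of the original work.

```python
import math
from typing import Callable, Dict, List


def ucb1_select(counts: Dict[str, int], totals: Dict[str, float], t: int) -> str:
    """Pick an arm with UCB1: try every arm once, then maximize
    mean reward plus an exploration bonus."""
    for arm, n in counts.items():
        if n == 0:
            return arm
    return max(
        counts,
        key=lambda a: totals[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_bandit(
    arms: List[str], reward_fn: Callable[[str], float], rounds: int
) -> Dict[str, int]:
    """Run the bandit for `rounds` steps and return how often each arm
    (source language) was chosen."""
    counts = {a: 0 for a in arms}
    totals = {a: 0.0 for a in arms}
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        # In training, this is where a batch from `arm` would be used and
        # the observed benefit recorded as the reward (assumption).
        totals[arm] += reward_fn(arm)
        counts[arm] += 1
    return counts


# Hypothetical, fixed usefulness scores standing in for a real reward signal.
usefulness = {"lang_a": 0.2, "lang_b": 0.8, "lang_c": 0.5}
counts = run_bandit(list(usefulness), usefulness.__getitem__, rounds=500)
```

With these toy scores the most useful language ends up sampled most often, while the others are still revisited occasionally because of the exploration bonus.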
Abstract:Dual encoders are now the dominant architecture for dense retrieval. Yet, we have little understanding of how they represent text, and why this leads to good performance. In this work, we shed light on this question via distributions over the vocabulary. We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting distributions over vocabulary tokens are intuitive and contain rich semantic information. We find that this view can explain some of the failure cases of dense retrievers. For example, the inability of models to handle tail entities can be explained via a tendency of the token distributions to forget some of the tokens of those entities. We leverage this insight and propose a simple way to enrich query and passage representations with lexical information at inference time, and show that this significantly improves performance compared to the original model in out-of-domain settings.
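The projection into vocabulary space can be sketched on toy data. This is an illustrative assumption, not the paper's implementation: it scores each vocabulary token by a plain dot product between the dense vector and a token embedding, then applies a softmax, whereas a real model may use a learned projection head. The vocabulary, embeddings, and the names `project_to_vocab` and `top_tokens` are made up for the example.

```python
import math
from typing import Dict, List


def project_to_vocab(
    vector: List[float], token_embeddings: Dict[str, List[float]]
) -> Dict[str, float]:
    """Map a dense vector to a probability distribution over tokens
    using dot-product logits followed by a softmax."""
    logits = {
        tok: sum(v * e for v, e in zip(vector, emb))
        for tok, emb in token_embeddings.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(l - peak) for tok, l in logits.items()}
    norm = sum(exps.values())
    return {tok: x / norm for tok, x in exps.items()}


def top_tokens(distribution: Dict[str, float], k: int) -> List[str]:
    """Return the k most probable tokens, highest first."""
    return sorted(distribution, key=distribution.get, reverse=True)[:k]


# Toy 3-dimensional embeddings for a 4-token vocabulary (hypothetical).
toy_embeddings = {
    "paris": [1.0, 0.0, 0.0],
    "france": [0.8, 0.2, 0.0],
    "banana": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
}
query_vector = [2.0, 0.1, 0.0]  # stands in for an encoded query

dist = project_to_vocab(query_vector, toy_embeddings)
best = top_tokens(dist, k=2)
```

Inspecting the highest-probability tokens in this way is what makes a dense representation readable: tokens that receive little mass are, in effect, ones the representation has "forgotten".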