Hassan Mansoor

VQA Training Sets are Self-play Environments for Generating Few-shot Pools

May 30, 2024

Tautvydas Misiunas, Hassan Mansoor, Jasper Uijlings, Oriana Riva, Victor Carbune

Abstract: Large-language models and large-vision models are increasingly capable of solving compositional reasoning tasks, as measured by breakthroughs in visual-question answering benchmarks. However, state-of-the-art solutions often involve careful construction of large pre-training and fine-tuning datasets, which can be expensive. The use of external tools, whether other ML models, search engines, or APIs, can significantly improve performance by breaking down high-level reasoning questions into sub-questions that are answerable by individual tools, but this approach has similar dataset construction costs to teach fine-tuned models how to use the available tools. We propose a technique in which existing training sets can be directly used for constructing computational environments with task metrics as rewards. This enables a model to autonomously teach itself to use itself or another model as a tool. By doing so, we augment training sets by integrating external signals. The proposed method starts with zero-shot prompts and iteratively refines them by selecting few-shot examples that maximize the task metric on the training set. Our experiments showcase how Gemini learns how to use itself, or another smaller and specialized model such as ScreenAI, to iteratively improve performance on training sets. Our approach successfully generalizes and improves upon zero-shot performance on charts, infographics, and document visual question-answering datasets.
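The refinement loop described in this abstract (start zero-shot, then add the few-shot examples that most improve the task metric on the training set) can be illustrated with a minimal sketch. The greedy selection rule, the exact-match metric, and every name below (`toy_model`, `score_pool`, `grow_pool`) are assumptions made for illustration; the abstract does not specify the selection procedure, and `toy_model` is a stand-in for a real model such as Gemini.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, answer)
Model = Callable[[List[Example], str], str]  # (few-shot pool, question) -> prediction


def exact_match(pred: str, gold: str) -> float:
    """Task metric used as the reward (an assumed choice)."""
    return float(pred.strip().lower() == gold.strip().lower())


def score_pool(model: Model, pool: List[Example], train_set: List[Example]) -> float:
    """Average task metric on the training set when prompting with `pool`."""
    return sum(exact_match(model(pool, q), a) for q, a in train_set) / len(train_set)


def grow_pool(model: Model, candidates: List[Example],
              train_set: List[Example], max_shots: int = 3):
    """Greedily add the candidate that most improves the training-set metric."""
    pool: List[Example] = []
    best = score_pool(model, pool, train_set)  # zero-shot baseline
    for _ in range(max_shots):
        scored = [(score_pool(model, pool + [c], train_set), c)
                  for c in candidates if c not in pool]
        if not scored:
            break
        top_score, top = max(scored, key=lambda t: t[0])
        if top_score <= best:  # stop once no candidate helps
            break
        pool.append(top)
        best = top_score
    return pool, best


def toy_model(pool: List[Example], question: str) -> str:
    """Hypothetical model: it can only apply an operation a shot has demonstrated."""
    op, *nums = question.split()
    known_ops = {q.split()[0] for q, _ in pool}
    if op not in known_ops:
        return "unknown"
    values = [int(n) for n in nums]
    return str(sum(values) if op == "sum" else max(values))


train_set = [("sum 1 2", "3"), ("sum 4 5", "9"), ("max 1 2", "2"), ("max 7 3", "7")]
candidates = [("sum 1 1", "2"), ("max 0 5", "5"), ("sum 2 2", "4")]

pool, best = grow_pool(toy_model, candidates, train_set)
print(pool, best)
```

In this toy setting the loop stops after two shots, because a third example that repeats an already demonstrated operation does not raise the metric.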

Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Mar 19, 2024

Victor Carbune, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma

Abstract: Vision-language models (VLMs) are achieving increasingly strong performance on multimodal tasks. However, reasoning capabilities remain limited, particularly for smaller VLMs, while those of large-language models (LLMs) have seen numerous improvements. We propose a technique to transfer capabilities from LLMs to VLMs. On the recently introduced ChartQA, our method obtains state-of-the-art performance when applied to the PaLI3-5B VLM of Chen et al. (2023), while also enabling much better performance on PlotQA and FigureQA. We first improve the chart representation by continuing the pre-training stage using an improved version of the chart-to-table translation task by Liu et al. (2023). We then propose constructing a 20x larger dataset than the original training set. To improve general reasoning capabilities and numerical operations, we synthesize reasoning traces using the table representation of charts. Lastly, our model is fine-tuned using the multitask loss introduced by Hsieh et al. (2023). Our variant ChartPaLI-5B outperforms even 10x larger models such as PaLIX-55B without using an upstream OCR system, while keeping inference time constant compared to the PaLI3-5B baseline. When rationales are further refined with a simple program-of-thought prompt (Chen et al., 2023), our model outperforms the recently introduced Gemini Ultra and GPT-4V.

* Findings of NAACL 2024
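The multitask loss mentioned in the abstract above combines an answer-prediction objective with a rationale-generation objective. The sketch below shows one common way to write such a loss: a weighted sum of two token-level negative log-likelihoods. The function names, the `rationale_weight` parameter, and the toy probabilities are assumptions for illustration, not the configuration used for ChartPaLI-5B.

```python
import math
from typing import Sequence


def token_nll(probs_of_gold: Sequence[float]) -> float:
    """Mean negative log-likelihood, given the model's probability of each gold token."""
    return -sum(math.log(p) for p in probs_of_gold) / len(probs_of_gold)


def multitask_loss(answer_probs: Sequence[float],
                   rationale_probs: Sequence[float],
                   rationale_weight: float = 1.0) -> float:
    """Answer loss plus weighted rationale loss (the weighting is an assumed hyperparameter)."""
    return token_nll(answer_probs) + rationale_weight * token_nll(rationale_probs)


# Toy probabilities the model assigns to the gold answer and rationale tokens.
loss = multitask_loss(answer_probs=[0.5, 0.5],
                      rationale_probs=[0.25],
                      rationale_weight=0.5)
print(round(loss, 4))
```

Because the rationale is only a training target, a model trained this way can still be asked for the answer alone at inference time, which is consistent with the abstract's claim that inference time stays constant.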

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Mar 15, 2024

Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu (+9 more)

Abstract: Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong method to align Pretrained Large Language Models (LLMs) with human preferences. However, training models with RLHF is computationally expensive and complex overall. In this work, we study RLHF where the underlying models are trained using the parameter-efficient method of Low-Rank Adaptation (LoRA) introduced by Hu et al. [2021]. We investigate the setup of "Parameter Efficient Reinforcement Learning" (PERL), in which we perform reward model training and reinforcement learning using LoRA. We compare PERL to conventional fine-tuning (full-tuning) across various configurations for 7 benchmarks, including 2 novel datasets, of reward modeling and reinforcement learning. We find that PERL performs on par with the conventional RLHF setting, while training faster and with less memory. This enables the high performance of RLHF, while reducing the computational burden that limits its adoption as an alignment technique for Large Language Models. We also release 2 novel thumbs up/down preference datasets, "Taskmaster Coffee" and "Taskmaster Ticketing", to promote research around RLHF.
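To make the memory argument concrete, the sketch below illustrates the LoRA idea the abstract relies on: the pretrained weight W stays frozen and only a low-rank update B·A is trained, giving an effective weight W + (alpha / r)·B·A. The dimensions, the `alpha` value, and the helper names are assumptions for illustration and are unrelated to the models evaluated in PERL.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a small illustration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Effective weight W + (alpha / r) * B @ A, where r is the adapter rank."""
    rank = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


d_out, d_in, rank = 64, 64, 4  # assumed toy sizes
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]     # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable
b = [[0.0] * rank for _ in range(d_out)]                               # trainable, zero-initialized

full_params = d_out * d_in           # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters updated by LoRA
print(full_params, lora_params)      # 4096 512
```

With B initialized to zero, the adapted layer starts out identical to the pretrained one, so training begins from the original model's behavior while updating only a fraction of the parameters.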

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Feb 19, 2024

Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, Abhanshu Sharma

Abstract: Screen user interfaces (UIs) and infographics, sharing similar visual language and design principles, play important roles in human communication and human-machine interaction. We introduce ScreenAI, a vision-language model that specializes in UI and infographics understanding. Our model improves upon the PaLI architecture with the flexible patching strategy of pix2struct and is trained on a unique mixture of datasets. At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements. We use these text annotations to describe screens to Large Language Models and automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale. We run ablation studies to demonstrate the impact of these design choices. At only 5B parameters, ScreenAI achieves new state-of-the-art results on UI- and infographics-based tasks (Multi-page DocVQA, WebSRC, MoTIF and Widget Captioning), and new best-in-class performance on others (Chart QA, DocVQA, and InfographicVQA) compared to models of similar size. Finally, we release three new datasets: one focused on the screen annotation task and two others focused on question answering.

* Revision notes: 1) In Appendix I, added dataset location for ScreenQA Short. 2) In Table 4, updated evaluation numbers for Screen Annotation and Complex Screen QA benchmarks as the datasets are updated. 3) Updated Figure 4 to reflect the changes in evaluation numbers described in 2). 4) Minor revisions in other places.

LLMs cannot find reasoning errors, but can correct them!

Nov 14, 2023

Gladys Tyen, Hassan Mansoor, Peter Chen, Tony Mak, Victor Cărbune

Abstract: While self-correction has shown promise in improving LLM outputs in terms of style and quality (e.g. Chen et al., 2023; Madaan et al., 2023), recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect, resulting in worse performance overall (Huang et al., 2023). In this paper, we break down the self-correction process into two core components: mistake finding and output correction. For mistake finding, we release BIG-Bench Mistake, a dataset of logical mistakes in Chain-of-Thought reasoning traces. We provide benchmark numbers for several state-of-the-art LLMs, and demonstrate that LLMs generally struggle with finding logical mistakes. For output correction, we propose a backtracking method which provides large improvements when given information on mistake location. We construe backtracking as a lightweight alternative to reinforcement learning methods, and show that it remains effective with a reward model at 60-70% accuracy.
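The backtracking idea in this abstract (keep the reasoning steps before the identified mistake, resample the mistaken step, then regenerate the rest) can be sketched as follows. The resampling budget, the rule of accepting the first step that differs from the original, and all function names are assumptions for illustration; the toy samplers stand in for calls to an LLM.

```python
from typing import Callable, List, Optional

Step = str
StepSampler = Callable[[List[Step], int], Step]      # (prefix, attempt) -> candidate step
TraceContinuer = Callable[[List[Step]], List[Step]]  # prefix -> completed trace


def backtrack(trace: List[Step], mistake_index: Optional[int],
              sample_step: StepSampler, continue_trace: TraceContinuer,
              max_attempts: int = 8) -> List[Step]:
    """Resample the step at `mistake_index`, then regenerate everything after it."""
    if mistake_index is None:  # no mistake reported: keep the trace unchanged
        return trace
    prefix = trace[:mistake_index]
    bad_step = trace[mistake_index]
    for attempt in range(max_attempts):
        candidate = sample_step(prefix, attempt)
        if candidate != bad_step:  # accept the first alternative step
            return continue_trace(prefix + [candidate])
    return trace  # no alternative found within the budget


def toy_sample_step(prefix: List[Step], attempt: int) -> Step:
    """Hypothetical sampler cycling through two candidate steps."""
    options = ["5 * 4 = 25", "5 * 4 = 20"]
    return options[attempt % len(options)]


def toy_continue(prefix: List[Step]) -> List[Step]:
    """Hypothetical continuation: subtract 1 from the last intermediate result."""
    last_value = int(prefix[-1].split("=")[1])
    return prefix + [f"{last_value} - 1 = {last_value - 1}"]


original = ["2 + 3 = 5", "5 * 4 = 25", "25 - 1 = 24"]
fixed = backtrack(original, 1, toy_sample_step, toy_continue)
print(fixed)
```

The mistake location would come from a separate source, such as a reward model or classifier, which is why the abstract reports results as a function of that model's accuracy.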

The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization

Nov 02, 2023

Sian Gooding, Hassan Mansoor

Abstract: Reinforcement Learning from Human Feedback (RLHF) can be used to capture complex and nuanced properties of text generation quality. As a result, the task of text summarization has been identified as a good candidate for this process. In this paper, we explore how preference agreement impacts the efficacy of RLHF for summarization. We show that sampling human preferences to include a range of annotator agreement (1) yields higher-accuracy reward models and (2) alters the characteristics of quality captured. We additionally show improvements in downstream generation when using a reward model trained with a range of preference agreements. Our contributions have implications for the design of synthetic datasets as well as the importance of considering quality differentials in comparison-based data.

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Sep 01, 2023

Harrison Lee, Samrat Phatale, Hassan Mansoor, Kellie Lu, Thomas Mesnard, Colton Bishop, Victor Carbune, Abhinav Rastogi

Abstract: Reinforcement learning from human feedback (RLHF) is effective at aligning large language models (LLMs) to human preferences, but gathering high-quality human preference labels is a key bottleneck. We conduct a head-to-head comparison of RLHF vs. RL from AI Feedback (RLAIF), a technique in which preferences are labeled by an off-the-shelf LLM in lieu of humans, and find that they result in similar improvements. On the task of summarization, human evaluators prefer generations from both RLAIF and RLHF over a baseline supervised fine-tuned model in ~70% of cases. Furthermore, when asked to rate RLAIF vs. RLHF summaries, humans prefer both at equal rates. These results suggest that RLAIF can yield human-level performance, offering a potential solution to the scalability limitations of RLHF.
