Quan Du

RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

Aug 22, 2024

Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du (+2 more)

Abstract: Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques are hampered by the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue this line of research. We present a Robust Visual Reward Model (RoVRM) which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through a three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We experiment with RoVRM on the commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches can yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization.
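
The best-of-n sampling mentioned in this abstract can be illustrated with a minimal sketch: score n candidate responses with a reward model and keep the highest-scoring one. The names `best_of_n` and `toy_reward` are hypothetical and chosen for illustration only; `toy_reward` is a stand-in for a trained reward model and is not RoVRM.

```python
from typing import Callable, List


def best_of_n(candidates: List[str], reward_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward (first one wins on ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward_fn)


def toy_reward(response: str) -> float:
    """Hypothetical stand-in for a reward model: penalise one unsupported claim."""
    return -1.0 if "unicorn" in response else 1.0


candidates = ["A unicorn is in the photo.", "A dog sits on a sofa."]
best = best_of_n(candidates, toy_reward)
```

In practice the candidates would be sampled from the LVLM for a given image and prompt, and the reward function would be the trained visual reward model.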

Learning Evaluation Models from Large Language Models for Sequence Generation

Aug 08, 2023

Chenglong Wang, Hang Zhou, Kaiyan Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Jingbo Zhu

Abstract: Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters. This poses a computational challenge when applying their evaluation capability at scale. To overcome the challenge, in this paper, we propose ECT, an evaluation capability transfer method, to transfer the evaluation capability from LLMs to relatively lightweight language models. Based on the proposed ECT, we learn various evaluation models from ChatGPT, and employ them as reward models to improve sequence generation models via reinforcement learning and reranking approaches. Experimental results on machine translation, text style transfer, and summarization tasks demonstrate the effectiveness of our ECT. Notably, applying the learned evaluation models to sequence generation models results in better generated sequences as evaluated by commonly used metrics and ChatGPT.

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation

Mar 17, 2022

Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, JingBo Zhu, Xuebo Liu, Min Zhang

Abstract: Residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODE). This paper explores a deeper relationship between Transformer and numerical ODE methods. We first show that a residual block of layers in Transformer can be described as a higher-order solution to ODE. Inspired by this, we design a new architecture, ODE Transformer, which is analogous to the Runge-Kutta method that is well motivated in ODE. As a natural extension to Transformer, ODE Transformer is easy to implement and efficient to use. Experimental results on the large-scale machine translation, abstractive summarization, and grammar error correction tasks demonstrate the high genericity of ODE Transformer. It can gain large improvements in model performance over strong baselines (e.g., 30.77 and 44.11 BLEU scores on the WMT'14 English-German and English-French benchmarks) at a slight cost in inference efficiency.

* Long paper accepted by ACL2022 main conference. arXiv admin note: substantial text overlap with arXiv:2104.02308
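
The analogy in this abstract between a residual connection and an Euler step, and between a higher-order block and a Runge-Kutta step, can be sketched on plain Python lists. This is an illustration under stated assumptions, not the paper's implementation: `euler_block`, `rk2_block`, and `toy_layer` are hypothetical names, and the second-order update shown is the generic Heun form of Runge-Kutta with the step size folded into the layer function.

```python
import math
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def euler_block(x: Vector, f: Layer) -> Vector:
    """Standard residual connection x + f(x), i.e. one first-order (Euler) step."""
    return [xi + fi for xi, fi in zip(x, f(x))]


def rk2_block(x: Vector, f: Layer) -> Vector:
    """Second-order Runge-Kutta (Heun) step that applies the same layer f twice."""
    f1 = f(x)                                        # slope at the block input
    f2 = f([xi + f1i for xi, f1i in zip(x, f1)])     # slope at the Euler estimate
    return [xi + 0.5 * (a + b) for xi, a, b in zip(x, f1, f2)]


def toy_layer(x: Vector) -> Vector:
    """Hypothetical layer: dy/dt = -y with step size 0.5 folded in."""
    return [-0.5 * xi for xi in x]


exact = math.exp(-0.5)                       # true solution of dy/dt = -y at t = 0.5
euler_out = euler_block([1.0], toy_layer)[0]
rk2_out = rk2_block([1.0], toy_layer)[0]
```

On this toy problem the second-order block lands closer to the exact solution than the plain residual step, at the cost of evaluating the layer twice, which mirrors the efficiency trade-off the abstract mentions.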

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation

Apr 06, 2021

Bei Li, Quan Du, Tao Zhou, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu

Abstract: It has been found that residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODEs). In this paper, we explore a deeper relationship between Transformer and numerical methods of ODEs. We show that a residual block of layers in Transformer can be described as a higher-order solution to ODEs. This leads us to design a new architecture (call it ODE Transformer) analogous to the Runge-Kutta method that is well motivated in ODEs. As a natural extension to Transformer, ODE Transformer is easy to implement and parameter efficient. Our experiments on three WMT tasks demonstrate the genericity of this model, and large improvements in performance over several strong baselines. It achieves 30.76 and 44.11 BLEU scores on the WMT'14 En-De and En-Fr test data. This sets a new state-of-the-art on the WMT'14 En-Fr task.

Learning Light-Weight Translation Models from Deep Transformer

Dec 27, 2020

Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Abstract: Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We propose a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model. The experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8X shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method to randomly omit sub-layers to introduce perturbation into training, which achieves a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at https://github.com/libeineu/GPKD.

* Accepted by AAAI2021

A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction

Nov 30, 2020

Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, Shujian Huang, Tong Xiao, Jingbo Zhu

Abstract: Unsupervised Bilingual Dictionary Induction methods based on initialization and self-learning have achieved great success in similar language pairs, e.g., English-Spanish. But they still fail and have an accuracy of 0% in many distant language pairs, e.g., English-Japanese. In this work, we show that this failure results from the gap between the actual initialization performance and the minimum initialization performance for the self-learning to succeed. We propose Iterative Dimension Reduction to bridge this gap. Our experiments show that this simple method does not hamper the performance of similar language pairs and achieves an accuracy of 13.64% to 55.53% between English and four distant languages, i.e., Chinese, Japanese, Vietnamese and Thai.

* Accepted by COLING2020

Shallow-to-Deep Training for Neural Machine Translation

Oct 08, 2020

Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu

Abstract: Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but training an extremely deep encoder is time consuming. Moreover, why deep models help NMT is an open question. In this paper, we investigate the behavior of a well-tuned deep Transformer system. We find that stacking layers is helpful in improving the representation ability of NMT models and adjacent layers perform similarly. This inspires us to develop a shallow-to-deep training method that learns deep models by stacking shallow models. In this way, we successfully train a Transformer system with a 54-layer encoder. Experimental results on WMT'16 English-German and WMT'14 English-French translation tasks show that it is 1.4× faster than training from scratch, and achieves BLEU scores of 30.33 and 43.29 on the two tasks. The code is publicly available at https://github.com/libeineu/SDT-Training/.

* Accepted by EMNLP 2020
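
The idea of learning a deep model by stacking a shallow one, as described in this abstract, can be sketched as a parameter-copying step. This is an assumption-laden illustration rather than the released SDT-Training code: `stack_layers` is a hypothetical helper, each layer is represented as a plain dict of parameters, and the training that would happen between stacking steps is omitted.

```python
import copy
from typing import Any, Dict, List

LayerParams = Dict[str, Any]


def stack_layers(layers: List[LayerParams], n_copy: int) -> List[LayerParams]:
    """Return a deeper stack: the given layers plus deep copies of the top n_copy layers."""
    if not 0 < n_copy <= len(layers):
        raise ValueError("n_copy must be between 1 and the current depth")
    # Deep copies ensure the new layers can be trained independently of the originals.
    return list(layers) + copy.deepcopy(layers[-n_copy:])


# Hypothetical 6-layer encoder whose layers hold a single toy weight each.
shallow = [{"w": [float(i)]} for i in range(6)]
deep = stack_layers(shallow, 6)   # 12 layers, the top 6 initialised from the bottom 6
deep[6]["w"][0] = 99.0            # updating a copied layer leaves the original untouched
```

Repeating the step (train, stack, train again) would grow the encoder progressively; the schedule and the number of layers copied per step are design choices this sketch leaves open.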

Weight Distillation: Transferring the Knowledge in Neural Network Parameters

Sep 19, 2020

Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

Abstract: Knowledge distillation has been proven to be effective in model acceleration and compression. It allows a small network to learn to generalize in the same way as a large network. Recent successes in pre-training suggest the effectiveness of transferring model parameters. Inspired by this, we investigate methods of model acceleration and compression in another line of research. We propose Weight Distillation to transfer the knowledge in the large network parameters through a parameter generator. Our experiments on WMT16 En-Ro, NIST12 Zh-En, and WMT14 En-De machine translation tasks show that weight distillation can train a small network that is 1.88x to 2.94x faster than the large network but with competitive performance. With the same sized small network, weight distillation can outperform knowledge distillation by 0.51 to 1.82 BLEU points.
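
For context on the baseline this abstract compares against, the following is a minimal sketch of the standard soft-target knowledge distillation loss, in which a small network is trained to match the temperature-softened output distribution of a large one. It does not show Weight Distillation or its parameter generator; the function names and the temperature value are assumptions made for illustration.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: List[float], student_logits: List[float],
            temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


same = kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])   # identical outputs give zero loss
diff = kd_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])   # mismatched outputs are penalised
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the sense in which the small network learns to generalize like the large one.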
