Abstract:Reward models (RMs) are essential for aligning large language models (LLMs) with human preferences to improve interaction quality. However, the real world is pluralistic, which leads to diversified human preferences across religions, politics, cultures, etc. Moreover, each individual can have unique preferences on various topics. Neglecting the diversity of human preferences, current human-feedback alignment methods consider only a general reward model, which is unsatisfactory for customized or personalized application scenarios. To explore customized preference learning, we collect a domain-specific preference (DSP) dataset, which includes preferred responses for each given query from four practical domains. In addition, from the perspective of data efficiency, we propose a three-stage customized RM learning scheme and empirically verify its effectiveness on both general preference datasets and our DSP dataset. Furthermore, we test multiple training and data strategies across the three learning stages. We find several ways to better preserve general preference ability while training customized RMs, notably general preference enrichment and customized preference imitation learning. The DSP dataset and code are available at https://github.com/Linear95/DSP.
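As background for the reward-model training referred to above, RMs of this kind are commonly fit with a pairwise (Bradley-Terry style) preference loss over (query, preferred response, rejected response) triples. The sketch below shows only that generic objective, not the paper's three-stage scheme or any DSP-specific details; all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(reward_chosen: torch.Tensor,
                             reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style objective: the scalar reward assigned to the
    preferred response should exceed that of the rejected response."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with scalar rewards a reward-model head might output for a
# batch of (query, chosen, rejected) triples.
r_chosen = torch.tensor([1.2, 0.3, 0.8])
r_rejected = torch.tensor([0.5, 0.4, -0.1])
print(pairwise_preference_loss(r_chosen, r_rejected).item())
```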
Abstract:Open world classification is a task in natural language processing with key practical relevance and impact. Since the open or {\em unknown} category data only manifests in the inference phase, finding a model with a suitable decision boundary that accommodates the identification of known classes and discrimination of the open category is challenging. The performance of existing models is limited by the lack of effective open-category data during the training stage or the lack of a good mechanism to learn appropriate decision boundaries. We propose an approach based on \underline{a}daptive \underline{n}egative \underline{s}amples (ANS), designed to generate effective synthetic open-category samples during training without requiring any prior knowledge or external datasets. Empirically, we find a significant advantage in using auxiliary one-versus-rest binary classifiers, which effectively utilize the generated negative samples and avoid the complex threshold-seeking stage of previous works. Extensive experiments on three benchmark datasets show that ANS achieves significant improvements over state-of-the-art methods.
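To illustrate the one-versus-rest design mentioned above, the minimal sketch below places K independent binary heads on a shared encoder and labels an input as the open category when every head rejects it; the 0.5 decision point is simply the natural threshold of a binary head. Class and variable names are hypothetical, and the adaptive negative-sample generation itself is not shown.

```python
import torch
import torch.nn as nn

class OneVsRestOpenClassifier(nn.Module):
    """K one-vs-rest binary heads over a shared encoder; an input is
    assigned to the open/unknown category when no head claims it."""

    def __init__(self, encoder: nn.Module, hidden_dim: int, num_known: int):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.Linear(hidden_dim, num_known)  # one logit per known class

    def forward(self, x):
        return self.heads(self.encoder(x))              # [batch, num_known]

    @torch.no_grad()
    def predict(self, x, open_label: int = -1):
        probs = torch.sigmoid(self.forward(x))
        best_prob, best_class = probs.max(dim=-1)
        # If every one-vs-rest head rejects (prob < 0.5), declare the open class.
        return torch.where(best_prob >= 0.5, best_class,
                           torch.full_like(best_class, open_label))
```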
Abstract:In recommendation systems, items are likely to be exposed to various users, and we would like to learn the familiarity of a new user with an existing item. This can be formulated as an anomaly detection (AD) problem distinguishing between "common users" (nominal) and "fresh users" (anomalous). Considering the sheer volume of items and the sparsity of user-item paired data, independently applying conventional single-task detection methods to each item quickly becomes impractical, while correlations between items are ignored. To address this multi-task anomaly detection problem, we propose collaborative anomaly detection (CAD) to jointly learn all tasks with an embedding that encodes correlations among tasks. We explore CAD with conditional density estimation and conditional likelihood ratio estimation. We find that: ($i$) estimating a likelihood ratio enjoys more efficient learning and yields better results than density estimation; ($ii$) it is beneficial to select a small number of tasks in advance to learn a task embedding model, and then use it to warm-start all task embeddings. Consequently, these embeddings can capture correlations between tasks and generalize to new correlated tasks.
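The following is a rough sketch of the conditional, likelihood-ratio style scoring described above: a shared network conditioned on a learned per-item (per-task) embedding, trained as a binary classifier whose logit approximates a log-likelihood ratio between nominal and anomalous users. Names and dimensions are illustrative assumptions, and the warm-starting of embeddings from a small subset of tasks is not shown.

```python
import torch
import torch.nn as nn

class ConditionalRatioEstimator(nn.Module):
    """Scores a (user, item) pair with a shared network conditioned on a
    learned per-item task embedding; when trained with binary cross-entropy
    against sampled negatives, the logit approximates a log-likelihood ratio."""

    def __init__(self, num_items: int, user_dim: int, task_dim: int = 16):
        super().__init__()
        self.task_emb = nn.Embedding(num_items, task_dim)
        self.scorer = nn.Sequential(
            nn.Linear(user_dim + task_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, user_feats: torch.Tensor, item_ids: torch.Tensor) -> torch.Tensor:
        z = self.task_emb(item_ids)                       # [batch, task_dim]
        return self.scorer(torch.cat([user_feats, z], dim=-1)).squeeze(-1)
```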
Abstract:The recent introduction of thermodynamic integration techniques has provided a new framework for understanding and improving variational inference (VI). In this work, we present a careful analysis of the thermodynamic variational objective (TVO), bridging the gaps among existing variational objectives and providing new insights to advance the field. In particular, we elucidate how the TVO naturally connects three key variational schemes, namely importance-weighted VI, Rényi-VI, and MCMC-VI, which together subsume most VI objectives employed in practice. To explain the performance gap between theory and practice, we reveal how the pathological geometry of thermodynamic curves negatively affects the TVO. By generalizing the integration path from the geometric mean to the weighted Hölder mean, we extend the theory of the TVO and identify new opportunities for improving VI. This motivates our new VI objectives, named the Hölder bounds, which flatten the thermodynamic curves and promise a one-step approximation of the exact marginal log-likelihood. A comprehensive discussion on the choice of numerical estimators is provided. We present strong empirical evidence on both synthetic and real-world datasets to support our claims.
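For reference, the quantities being discussed can be sketched as follows, with notation assumed rather than taken verbatim from the paper: the TVO integrates expectations along a geometric-mean path between the variational distribution and the joint, and a weighted Hölder (power) mean generalizes that path, recovering the geometric mean in the limit.

```latex
% Geometric-mean path and the thermodynamic identity underlying the TVO.
\pi_\beta(z \mid x) \;\propto\; q(z \mid x)^{\,1-\beta}\, p(x, z)^{\,\beta},
\qquad
\log p(x) \;=\; \int_0^1 \mathbb{E}_{\pi_\beta}\!\left[\log \tfrac{p(x, z)}{q(z \mid x)}\right] \mathrm{d}\beta .

% A weighted Holder (power) mean generalizes the path; the weighted
% geometric mean is recovered in the limit \lambda \to 0.
M_\lambda(q, p; \beta) \;=\; \big((1-\beta)\, q^{\lambda} + \beta\, p^{\lambda}\big)^{1/\lambda},
\qquad
\lim_{\lambda \to 0} M_\lambda(q, p; \beta) \;=\; q^{\,1-\beta}\, p^{\,\beta}.
```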
Abstract:Cross-domain alignment between image objects and text sequences is key to many visual-language tasks, and it poses a fundamental challenge to both computer vision and natural language processing. This paper investigates a novel approach for identifying and optimizing fine-grained semantic similarities between image and text entities under a weakly-supervised setup, improving performance over state-of-the-art solutions. Our method builds upon recent advances in optimal transport (OT) to resolve the cross-domain matching problem in a principled manner. Formulated as a drop-in regularizer, the proposed OT solution can be efficiently computed and used in combination with other existing approaches. We present empirical evidence demonstrating the effectiveness of our approach, showing how it enables simpler model architectures to match or outperform more sophisticated designs on a range of vision-language tasks.
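As an illustration of how an OT term can act as a drop-in regularizer, the sketch below computes an entropic (Sinkhorn-style) transport cost between sets of image-region and text-token embeddings and adds it to the main task loss. The cosine cost, uniform marginals, and hyperparameters here are generic assumptions, not the paper's exact formulation.

```python
import torch

def sinkhorn_ot_distance(img_feats: torch.Tensor, txt_feats: torch.Tensor,
                         eps: float = 0.1, n_iters: int = 50) -> torch.Tensor:
    """Entropic-regularized OT (Sinkhorn) cost between two embedding sets,
    usable as an auxiliary alignment loss.
    img_feats: [m, d] image-region embeddings; txt_feats: [n, d] token embeddings."""
    img = torch.nn.functional.normalize(img_feats, dim=-1)
    txt = torch.nn.functional.normalize(txt_feats, dim=-1)
    cost = 1.0 - img @ txt.t()                       # cosine cost matrix [m, n]
    m, n = cost.shape
    mu = torch.full((m,), 1.0 / m, device=cost.device)   # uniform marginals
    nu = torch.full((n,), 1.0 / n, device=cost.device)
    K = torch.exp(-cost / eps)                       # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):                         # Sinkhorn fixed-point updates
        v = nu / (K.t() @ u + 1e-9)
        u = mu / (K @ v + 1e-9)
    transport = torch.diag(u) @ K @ torch.diag(v)    # approximate transport plan
    return (transport * cost).sum()

# Drop-in use: total_loss = task_loss + lambda_ot * sinkhorn_ot_distance(img, txt)
```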
Abstract:The generative feature matching network (GFMN) is an approach for training implicit generative models for images by performing moment matching on features from pre-trained neural networks. In this paper, we present new GFMN formulations that are effective for sequential data. Our experimental results show the effectiveness of the proposed method, SeqGFMN, for three distinct generation tasks in English: unconditional text generation, class-conditional text generation, and unsupervised text style transfer. SeqGFMN is stable to train and outperforms various adversarial approaches for text generation and text style transfer.
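A minimal sketch of the core moment-matching idea, assuming features from a single frozen pre-trained encoder and a single batch; in practice GFMN-style training matches moments across many layers and uses running estimates, which are omitted here.

```python
import torch

def feature_moment_matching_loss(real_feats: torch.Tensor,
                                 fake_feats: torch.Tensor) -> torch.Tensor:
    """Match first and (diagonal) second moments of extractor features
    between real and generated batches; real_feats/fake_feats: [batch, d]."""
    mean_diff = (real_feats.mean(0) - fake_feats.mean(0)).pow(2).sum()
    var_diff = (real_feats.var(0) - fake_feats.var(0)).pow(2).sum()
    return mean_diff + var_diff
```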
Abstract:In this paper, we build an outfit evaluation system which provides feedback consisting of a judgment with a convincing explanation. The system is trained in a supervised manner that faithfully follows the domain knowledge in fashion. We create the EVALUATION3 dataset, which is annotated with the judgment, the decisive reason for the judgment, and all corresponding attributes (e.g., print, silhouette, and material). In the training process, features of all attributes in an outfit are first extracted and then concatenated as the input for the intra-factor compatibility net. Then, the inter-factor compatibility net is used to compute the judgment loss. We penalize the gradient of the judgment loss so that the Grad-CAM-like reason is regularized to be consistent with the labeled reason. At inference time, according to the obtained judgment, reason, and attributes, a user-friendly explanation sentence is generated from pre-defined templates. The experimental results show that the resulting network combines high precision with good interpretability.
Abstract:Within many machine learning algorithms, a fundamental problem concerns efficient calculation of an unbiased gradient with respect to parameters $\boldsymbol{\gamma}$ for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]$. Most existing methods either (i) suffer from high variance, seeking help from (often) complicated variance-reduction techniques; or (ii) apply only to reparameterizable continuous random variables via the reparameterization trick. To address these limitations, we propose a General and One-sample (GO) gradient that (i) applies to many distributions associated with non-reparameterizable continuous or discrete random variables, and (ii) has the same low variance as the reparameterization trick. We find that the GO gradient often works well in practice based on only one Monte Carlo sample (although one can of course use more samples if desired). Alongside the GO gradient, we develop a means of propagating the chain rule through distributions, yielding statistical back-propagation that couples neural networks to common random variables.
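For context, the two standard estimators the abstract contrasts against can be written as follows (in simplified notation; the GO gradient's own estimator is not reproduced here):

```latex
% Score-function (REINFORCE) estimator: unbiased but typically high variance.
\nabla_{\gamma}\, \mathbb{E}_{q_{\gamma}(y)}\!\left[f(y)\right]
  \;=\; \mathbb{E}_{q_{\gamma}(y)}\!\left[f(y)\, \nabla_{\gamma} \log q_{\gamma}(y)\right].

% Reparameterization estimator: low variance, but requires y = g_{\gamma}(\epsilon)
% with a differentiable g_{\gamma} and reparameterizable continuous y.
\nabla_{\gamma}\, \mathbb{E}_{q_{\gamma}(y)}\!\left[f(y)\right]
  \;=\; \mathbb{E}_{p(\epsilon)}\!\left[\nabla_{\gamma} f\big(g_{\gamma}(\epsilon)\big)\right],
  \qquad y = g_{\gamma}(\epsilon).
```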
Abstract:We investigate adversarial learning in the case when one has access to an unnormalized form u(x) of the target density function, rather than samples. With the insights so garnered, new concepts in GAN regularization are developed, based on learning from samples or from u(x). The proposed method is compared to alternative approaches, with encouraging results demonstrated across a range of applications, including deep soft Q-learning.