Abstract: Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human preferences. These preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and the tokens that the language model generates autoregressively. Although several recent approaches have tried to provide token-level (i.e., dense) rewards for each individual token, they typically rely on predefined discrete reward values (e.g., positive: +1, negative: -1, neutral: 0), failing to account for the varying degrees of preference inherent to each token. To address this limitation, we introduce TLCR (Token-Level Continuous Reward) for RLHF, which trains a discriminator to distinguish positive from negative tokens and uses the discriminator's confidence to assign a continuous, context-dependent reward to each token. Extensive experiments show that our proposed TLCR leads to consistent performance improvements over previous sequence-level and token-level discrete rewards on open-ended generation benchmarks.
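As a minimal, hypothetical sketch of the reward-shaping step only (the discriminator itself, its training, and the exact reward mapping are not specified here, so the function name and the linear rescaling below are illustrative assumptions rather than the paper's implementation), one can map a per-token confidence P(token is preferred | context) to a signed continuous reward:

```python
import numpy as np

def token_level_continuous_rewards(pos_probs, scale=1.0):
    """Map per-token discriminator confidences to continuous rewards.

    pos_probs: one P(token is preferred | context) in [0, 1] per generated token.
    Returns rewards in [-scale, scale]: confidently positive tokens land near
    +scale, confidently negative ones near -scale, and uncertain tokens near 0.
    """
    pos_probs = np.asarray(pos_probs, dtype=np.float64)
    return scale * (2.0 * pos_probs - 1.0)

# Example: five tokens with varying discriminator confidence.
probs = [0.95, 0.60, 0.50, 0.30, 0.05]
print(token_level_continuous_rewards(probs))   # rewards: [0.9, 0.2, 0.0, -0.4, -0.9]
```

Under such a mapping, tokens the discriminator is unsure about contribute almost nothing to the policy update, unlike fixed discrete rewards that treat every positive or negative token identically.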
Abstract: Convolutional autoregressive models have recently demonstrated state-of-the-art performance on a number of generation tasks. While fast, parallel training methods have been crucial to their success, generation is typically implemented naïvely, with redundant computations repeated unnecessarily. This results in slow generation, making such models infeasible for production environments. In this work, we describe a method to speed up generation in convolutional autoregressive models. The key idea is to cache hidden states to avoid redundant computation. We apply our fast generation method to the Wavenet and PixelCNN++ models and achieve speedups of up to 21× and 183×, respectively.
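As a minimal, hypothetical sketch of the caching idea (scalar channels, linear activations, and width-2 dilated causal convolutions; not the actual Wavenet or PixelCNN++ implementations), the snippet below checks that computing only the newest timestep from cached layer inputs reproduces a naive recomputation over the full history:

```python
import numpy as np

rng = np.random.default_rng(0)
dilations = [1, 2, 4, 8]                              # one dilated causal conv per layer
weights = [rng.normal(size=2) for _ in dilations]     # width-2 kernels, scalar channels

def naive_last_output(x):
    """Recompute every layer over the entire history just to get the newest sample."""
    h = np.asarray(x, dtype=float)
    for (w_old, w_new), d in zip(weights, dilations):
        past = np.concatenate([np.zeros(d), h])[: len(h)]   # h shifted right by the dilation
        h = w_old * past + w_new * h
    return h[-1]

class CachedGenerator:
    """Cache each layer's past inputs so a new timestep touches each layer only once."""
    def __init__(self):
        self.caches = [[] for _ in dilations]

    def step(self, x_new):
        h = float(x_new)
        for cache, (w_old, w_new), d in zip(self.caches, weights, dilations):
            t = len(cache)                            # index of the current timestep
            past = cache[t - d] if t >= d else 0.0    # reused instead of recomputed
            cache.append(h)
            h = w_old * past + w_new * h
        return h

x = rng.normal(size=32)
gen = CachedGenerator()
cached = [gen.step(v) for v in x]
naive = [naive_last_output(x[: t + 1]) for t in range(len(x))]
assert np.allclose(cached, naive)                     # identical outputs, far fewer operations
```

Only the most recent `dilation` cached inputs of each layer are ever read, so a fixed-length buffer per layer suffices in practice.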
Abstract: This paper presents an efficient implementation of the Wavenet generation process, called Fast Wavenet. Compared to a naive implementation with complexity O(2^L), where L denotes the number of layers in the network, our proposed approach removes redundant convolution operations by caching previous calculations, thereby reducing the per-sample complexity to O(L). Timing experiments show significant advantages of our fast implementation over the naive one. While this method is presented for Wavenet, the same scheme can be applied whenever one wants to perform autoregressive generation or online prediction using a model with dilated convolution layers. The code for our method is publicly available.
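A rough, hypothetical illustration of the scheme (again scalar channels and width-2 linear convolutions, not the paper's released code): each layer keeps a FIFO queue whose length equals its dilation, so generating one sample pops one cached activation and performs one convolution per layer, i.e. O(L) work per sample instead of recomputing the full receptive field of size 2^L.

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(1)
dilations = [2 ** i for i in range(4)]            # 1, 2, 4, 8 -> receptive field ~ 2^L
weights = [rng.normal(size=2) for _ in dilations]

# One queue per layer, pre-filled with zeros (the causal padding) and holding
# exactly `dilation` past activations: the "old" operand of each width-2 conv.
queues = [deque([0.0] * d) for d in dilations]

def generate_one_sample(x_new):
    h = float(x_new)
    ops = 0
    for q, (w_old, w_new) in zip(queues, weights):
        past = q.popleft()                        # activation from `dilation` steps ago
        q.append(h)                               # current activation is a future "past"
        h = w_old * past + w_new * h
        ops += 1                                  # exactly one convolution per layer
    return h, ops

for sample in rng.normal(size=16):
    _, ops_per_sample = generate_one_sample(sample)
print(ops_per_sample)                             # 4: scales with L, not with 2^L
```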
Abstract: The increasing popularity of real-world recommender systems means that data are produced continuously and rapidly, and it becomes more realistic to study recommender systems under streaming scenarios. Data streams exhibit distinct properties such as temporal ordering, continuity, and high velocity, which pose tremendous challenges to traditional recommender systems. In this paper, we investigate the problem of recommendation with streaming inputs. In particular, we propose a principled framework, termed sRec, which provides explicit continuous-time random-process models of the creation of users and topics and of the evolution of their interests. We further propose a variational Bayesian approach called recursive mean-field approximation, which permits computationally efficient, instantaneous on-line inference. Experimental results on several real-world datasets demonstrate the advantages of sRec over other state-of-the-art methods.
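sRec's actual continuous-time model and its recursive mean-field equations are not reproduced here; as a loose, hypothetical illustration of what instantaneous on-line inference means in a streaming setting, the toy below maintains a Gaussian posterior over a single user's scalar latent interest, lets its uncertainty drift between events in continuous time, and absorbs each incoming rating with an O(1) conjugate update, never revisiting past data:

```python
class StreamingUserInterest:
    """Toy recursive Bayesian update of one user's scalar latent interest.

    Hypothetical illustration only (not sRec's model): a random-walk drift
    between events plus a Kalman-style update per observation, so each
    streamed rating is absorbed in O(1) without revisiting past data.
    """
    def __init__(self, prior_mean=0.0, prior_var=1.0, drift=0.1, obs_noise=0.5):
        self.mean, self.var = prior_mean, prior_var
        self.drift, self.obs_noise = drift, obs_noise
        self.last_t = 0.0

    def observe(self, t, item_weight, rating):
        # 1) Continuous-time drift: uncertainty grows with elapsed time.
        self.var += self.drift * max(t - self.last_t, 0.0)
        self.last_t = t
        # 2) Conjugate (Kalman) update for rating ~ item_weight * interest + noise.
        gain = self.var * item_weight / (item_weight ** 2 * self.var + self.obs_noise)
        self.mean += gain * (rating - item_weight * self.mean)
        self.var *= (1.0 - gain * item_weight)
        return self.mean, self.var

# A user's interest estimate evolves as (time, item weight, rating) events stream in.
user = StreamingUserInterest()
stream = [(0.5, 1.0, 0.8), (1.5, 0.9, 1.1), (4.0, 1.2, 0.4)]
for t, w, r in stream:
    print(user.observe(t, w, r))
```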