Yingxiang Yang

DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Nov 20, 2024

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Oct 10, 2024

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

May 26, 2024

$(N,K)$-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model

Mar 11, 2024

How Can LLM Guide RL? A Value-Based Approach

Feb 25, 2024

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

Nov 28, 2023

Let Models Speak Ciphers: Multiagent Debate through Embeddings

Oct 10, 2023

Detecting Nonlinear Causality in Multivariate Time Series with Sparse Additive Models

Apr 26, 2018

Nonparametric Hawkes Processes: Online Estimation and Generalization Bounds

Jan 25, 2018

Efficient Neighborhood Selection for Gaussian Graphical Models

Sep 22, 2015