
Yongzhe Chang

Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models

May 24, 2025

Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization

Sep 15, 2024

Probing the Safety Response Boundary of Large Language Models via Unsafe Decoding Path Generation

Aug 21, 2024

QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

Aug 20, 2024

DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays

Jun 05, 2024

A Method on Searching Better Activation Functions

May 22, 2024

Are Large Language Models Really Robust to Word-Level Perturbations?

Sep 27, 2023

SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning

Jan 28, 2023

A Surrogate-Assisted Controller for Expensive Evolutionary Reinforcement Learning

Jan 01, 2022

Probability Density Estimation Based Imitation Learning

Dec 13, 2021