Hongning Wang

Agent-SafetyBench: Evaluating the Safety of LLM Agents

Dec 19, 2024
Via arXiv

CharacterBench: Benchmarking Character Customization of Large Language Models

Dec 16, 2024
Via arXiv

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Dec 16, 2024
Via arXiv

Does RLHF Scale? Exploring the Impacts From Data, Model, and Method

Dec 08, 2024
Via arXiv

Unveiling User Satisfaction and Creator Productivity Trade-Offs in Recommendation Platforms

Oct 31, 2024
Via arXiv

RecFlow: An Industrial Full Flow Recommendation Dataset

Oct 28, 2024
Via arXiv

Data Selection via Optimal Control for Language Models

Oct 09, 2024
Via arXiv

LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models

Sep 05, 2024
Via arXiv

Benchmarking Complex Instruction-Following with Multiple Constraints Composition

Jul 04, 2024
Via arXiv

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Jul 03, 2024
Via arXiv