Ben He

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

Feb 24, 2025

SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency

Feb 04, 2025

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

Jan 07, 2025

Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models

Jan 03, 2025

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Nov 18, 2024

DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models

Nov 05, 2024

Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?

Oct 08, 2024

CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution

Aug 23, 2024

On-Policy Fine-grained Knowledge Feedback for Hallucination Mitigation

Jun 18, 2024

Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors

Jun 13, 2024