Hao Ma

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University

Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning

Feb 19, 2025

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

Jan 31, 2025

Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR

Jan 24, 2025

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Jan 18, 2025

Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment

Jan 16, 2025

Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following

Oct 21, 2024

Preference Optimization with Multi-Sample Comparisons

Oct 16, 2024

Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning

Oct 08, 2024

The Perfect Blend: Redefining RLHF with Mixture of Judges

Sep 30, 2024

Language-Queried Target Sound Extraction Without Parallel Training Data

Sep 14, 2024