Yihe Deng

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

Apr 03, 2025, via arXiv

Entropy-Based Adaptive Weighting for Self-Training

Mar 31, 2025, via arXiv

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement

Mar 21, 2025, via arXiv

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

Feb 07, 2025, via arXiv

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

Oct 29, 2024, via arXiv

MIRAI: Evaluating LLM Agents for Event Forecasting

Jul 01, 2024, via arXiv

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

May 30, 2024, via arXiv

Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance

Feb 13, 2024, via arXiv

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Jan 02, 2024, via arXiv

Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

Nov 23, 2023, via arXiv