Yuxin Wen

KEPO: Knowledge-Enhanced Preference Optimization for Reinforcement Learning with Reasoning

Jan 30, 2026

HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation

Dec 29, 2025

RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection

Oct 06, 2025

Quantifying Cross-Modality Memorization in Vision-Language Models

Jun 05, 2025

A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition

Jun 05, 2025

Analysis of Attention in Video Diffusion Transformers

Apr 14, 2025

Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence

Mar 27, 2025

Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers

Feb 12, 2025

EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM

Dec 05, 2024

Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers

Oct 17, 2024