
Heng Wang

Kimi-VL Technical Report

Apr 10, 2025
Via arXiv

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Apr 02, 2025
Via arXiv

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

Mar 15, 2025
Via arXiv

BannerAgency: Advertising Banner Design with Multimodal LLM Agents

Mar 14, 2025
Via arXiv

Reward Shaping to Mitigate Reward Hacking in RLHF

Feb 26, 2025
Via arXiv

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025
Via arXiv

CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

Feb 13, 2025
Via arXiv

Cosmos World Foundation Model Platform for Physical AI

Jan 07, 2025
Via arXiv

Fast Prompt Alignment for Text-to-Image Generation

Dec 11, 2024
Via arXiv

Gotta Hear Them All: Sound Source Aware Vision to Audio Generation

Nov 26, 2024
Via arXiv