Zhiyuan Zhao

MelCap: A Unified Single-Codebook Neural Codec for High-Fidelity Audio Compression

Oct 02, 2025
Via arXiv

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Sep 26, 2025
Via arXiv

PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting

Sep 04, 2025
Via arXiv

Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security

Jul 29, 2025
Via arXiv

LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

Jun 09, 2025
Via arXiv

WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code

Jun 09, 2025
Via arXiv

TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness

Jun 06, 2025
Via arXiv

Accelerating Visual Reinforcement Learning with Separate Primitive Policy for Peg-in-Hole Tasks

Apr 21, 2025
Via arXiv

How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook

Mar 14, 2025
Via arXiv

From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models

Mar 08, 2025
Via arXiv