Zhe Hu

Guided by the Plan: Enhancing Faithful Autoregressive Text-to-Audio Generation with Guided Decoding

Jan 18, 2026

Exploring Scale Shift in Crowd Localization under the Context of Domain Generalization

Oct 22, 2025

Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers

Oct 06, 2025

Physics-informed 4D X-ray image reconstruction from ultra-sparse spatiotemporal data

Apr 04, 2025

When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?

Mar 29, 2025

Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition

Mar 10, 2025

Language-Augmented Symbolic Planner for Open-World Task Planning

Jul 13, 2024

VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values

Jul 03, 2024

Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation

Jun 28, 2024

Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions

May 29, 2024