
Tao Zhang

T-SCEND: Test-time Scalable MCTS-enhanced Diffusion Model

Feb 04, 2025
Via arXiv

UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation

Feb 04, 2025
Via arXiv

From Uncertain to Safe: Conformal Fine-Tuning of Diffusion Models for Safe PDE Control

Feb 04, 2025
Via arXiv

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Feb 03, 2025
Via arXiv

Ocean-OCR: Towards General OCR Application via a Vision-Language Model

Jan 26, 2025
Via arXiv

Baichuan-Omni-1.5 Technical Report

Jan 26, 2025
Via arXiv

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Jan 08, 2025
Via arXiv

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Jan 07, 2025
Via arXiv

Generative Regression Based Watch Time Prediction for Video Recommendation: Model and Performance

Dec 28, 2024
Via arXiv

RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

Dec 17, 2024
Via arXiv