Tao Zhang

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Jan 08, 2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Jan 07, 2025

Generative Regression Based Watch Time Prediction for Video Recommendation: Model and Performance

Dec 28, 2024

RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

Dec 17, 2024

THESAURUS: Contrastive Graph Clustering by Swapping Fused Gromov-Wasserstein Couplings

Dec 16, 2024

Wavelet Diffusion Neural Operator

Dec 06, 2024

Compositional Generative Multiphysics and Multi-component Simulation

Dec 05, 2024

Detection of Performance Interference Among Network Slices in 5G/6G Systems

Dec 02, 2024

Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps

Nov 26, 2024

mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA

Nov 22, 2024