Tao Wu

iFADIT: Invertible Face Anonymization via Disentangled Identity Transform

Jan 08, 2025
Via arXiv

Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method

Dec 31, 2024
Via arXiv

Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study

Dec 29, 2024
Via arXiv

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

Dec 27, 2024
Via arXiv

NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries

Dec 14, 2024
Via arXiv

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

Dec 05, 2024
Via arXiv

CamI2V: Camera-Controlled Image-to-Video Diffusion Model

Oct 21, 2024
Via arXiv

Semantic Alignment for Multimodal Large Language Models

Aug 23, 2024
Via arXiv

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

Aug 23, 2024
Via arXiv

MacFormer: Semantic Segmentation with Fine Object Boundaries

Aug 11, 2024
Via arXiv