Jianlong Wu

AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding

Mar 16, 2025
Via arXiv

MegaSR: Mining Customized Semantics and Expressive Guidance for Image Super-Resolution

Mar 11, 2025
Via arXiv

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models

Feb 28, 2025
Via arXiv

Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Jan 09, 2025
Via arXiv

LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition

Jan 08, 2025
Via arXiv

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

Dec 29, 2024
Via arXiv

Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization

Dec 13, 2024
Via arXiv

Preview-based Category Contrastive Learning for Knowledge Distillation

Oct 18, 2024
Via arXiv

RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training

Oct 18, 2024
Via arXiv

Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding

Sep 29, 2024
Via arXiv