Zequn Jie

VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models

Oct 15, 2024
Via arXiv

MRStyle: A Unified Framework for Color Style Transfer with Multi-Modality Reference

Sep 09, 2024
Via arXiv

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Aug 25, 2024
Via arXiv

3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

Jul 13, 2024
Via arXiv

Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization

Jul 11, 2024
Via arXiv

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Jul 10, 2024
Via arXiv

MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

Jul 03, 2024
Via arXiv

Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

Jun 12, 2024
Via arXiv

AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

Jun 01, 2024
Via arXiv

Matten: Video Generation with Mamba-Attention

May 05, 2024
Via arXiv