Jifeng Dai

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Dec 12, 2024
Via arXiv

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Dec 12, 2024
Via arXiv

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Dec 12, 2024
Via arXiv

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024
Via arXiv

HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving

Dec 03, 2024
Via arXiv

MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost

Dec 02, 2024
Via arXiv

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Nov 15, 2024
Via arXiv

DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model

Oct 22, 2024
Via arXiv

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

Oct 21, 2024
Via arXiv

Diffusion Transformer Policy

Oct 21, 2024
Via arXiv