Zhe Chen

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Dec 12, 2024

Hierarchical Split Federated Learning: Convergence Analysis and System Optimization

Dec 10, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Nov 21, 2024

Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors

Nov 19, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Nov 15, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

Oct 21, 2024

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

Oct 15, 2024

Tracing Human Stress from Physiological Signals using UWB Radar

Oct 14, 2024

t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving

Oct 13, 2024