Gang Zhang

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Jan 03, 2025
Via arXiv

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Dec 18, 2024
Via arXiv

ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts

Dec 11, 2024
Via arXiv

Continual SFT Matches Multimodal RLHF with Negative Supervision

Nov 22, 2024
Via arXiv

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Oct 23, 2024
Via arXiv

Improving Multi-modal Large Language Model through Boosting Vision Capabilities

Oct 17, 2024
Via arXiv

Add-SD: Rational Generation without Manual Reference

Jul 30, 2024
Via arXiv

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

Jul 16, 2024
Via arXiv

OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer

Jul 15, 2024
Via arXiv

LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

Jun 05, 2024
Via arXiv