Xiaofeng Zhang

PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving

Dec 02, 2024

Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

Nov 15, 2024

DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark

Nov 05, 2024

High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer

Oct 30, 2024

GiVE: Guiding Visual Encoder to Perceive Overlooked Information

Oct 26, 2024

Bridging the Gap between Text, Audio, Image, and Any Sequence: A Novel Approach using Gloss-based Annotation

Oct 04, 2024

Instance-adaptive Zero-shot Chain-of-Thought Prompting

Sep 30, 2024

DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer

Jul 23, 2024

From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models

Jun 04, 2024

Wavelet-Decoupling Contrastive Enhancement Network for Fine-Grained Skeleton-Based Action Recognition

Feb 03, 2024