
Yaowei Wang

Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

Jan 07, 2025

VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition

Dec 28, 2024

Towards Visual Grounding: A Survey

Dec 28, 2024

Core Context Aware Attention for Long Context Language Modeling

Dec 17, 2024

Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization

Dec 13, 2024

Towards Long Video Understanding via Fine-detailed Video Story Generation

Dec 09, 2024

Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC

Dec 07, 2024

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

Nov 19, 2024

OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling

Oct 10, 2024

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

Oct 08, 2024