Yaowei Wang

EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition

Feb 13, 2025
Via arXiv

Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

Jan 07, 2025
Via arXiv

VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition

Dec 28, 2024
Via arXiv

Towards Visual Grounding: A Survey

Dec 28, 2024
Via arXiv

Core Context Aware Attention for Long Context Language Modeling

Dec 17, 2024
Via arXiv

Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization

Dec 13, 2024
Via arXiv

Towards Long Video Understanding via Fine-detailed Video Story Generation

Dec 09, 2024
Via arXiv

Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC

Dec 07, 2024
Via arXiv

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

Nov 19, 2024
Via arXiv

OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling

Oct 10, 2024
Via arXiv