
Qiang Chen

ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning

Jul 02, 2025
Via arXiv

Multi-Interest Recommendation: A Survey

Jun 18, 2025
Via arXiv

APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs

Jun 09, 2025
Via arXiv

Tweedie Regression for Video Recommendation System

May 09, 2025
Via arXiv

Adversarial Attack for RGB-Event based Visual Object Tracking

Apr 19, 2025
Via arXiv

RGB-Event based Pedestrian Attribute Recognition: A Benchmark Dataset and An Asymmetric RWKV Fusion Framework

Apr 14, 2025
Via arXiv

XiHeFusion: Harnessing Large Language Models for Science Communication in Nuclear Fusion

Feb 08, 2025
Via arXiv

Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning

Dec 17, 2024
Via arXiv

Continual SFT Matches Multimodal RLHF with Negative Supervision

Nov 22, 2024
Via arXiv

Improving Multi-modal Large Language Model through Boosting Vision Capabilities

Oct 17, 2024
Via arXiv