Xiameng Qin

World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving

Dec 09, 2024

Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention

Dec 04, 2024

Collaborative Position Reasoning Network for Referring Image Segmentation

Jan 22, 2024

MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary

Jul 24, 2023

TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision

Jun 06, 2023

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

May 19, 2023

StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training

Mar 01, 2023

Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering

Dec 14, 2021

StrucTexT: Structured Text Understanding with Multi-Modal Transformers

Aug 10, 2021

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

Sep 20, 2019