Siwen Luo

Docs2Synth: A Synthetic Data Trained Retriever Framework for Scanned Visually Rich Documents Understanding

Jan 18, 2026

Multimodal Commonsense Knowledge Distillation for Visual Question Answering

Nov 05, 2024

'No' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue

Oct 31, 2024

3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection

Jul 12, 2024

PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering

Apr 19, 2024

Workshop on Document Intelligence Understanding

Jul 31, 2023

PDFVQA: A New Dataset for Real-World VQA on Documents

Apr 24, 2023

SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering

Dec 16, 2022

PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals

Dec 01, 2022

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

Aug 22, 2022