Siyang Qin

PaliGemma 2: A Family of Versatile VLMs for Transfer

Dec 04, 2024

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens

Oct 17, 2024

Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis

Oct 25, 2023

ICDAR 2023 Competition on Hierarchical Text Detection and Recognition

May 16, 2023

FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction

May 04, 2023

Towards End-to-End Unified Scene Text Detection and Layout Analysis

Mar 28, 2022

ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction

Jun 21, 2021

Rethinking Text Line Recognition Models

Apr 21, 2021

Towards Unconstrained End-to-End Text Spotting

Aug 24, 2019

Automatic Semantic Content Removal by Learning to Neglect

Jul 20, 2018