Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding

Feb 28, 2024

Yihao Ding, Lorenzo Vaiani, Caren Han, Jean Lee, Paolo Garza, Josiah Poon, Luca Cagliero

Figure 1 for M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding

Figure 2 for M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding

Figure 3 for M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding

Figure 4 for M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding

Share this with someone who'll enjoy it:

Abstract:This paper presents a groundbreaking multimodal, multi-task, multi-teacher joint-grained knowledge distillation model for visually-rich form document understanding. The model is designed to leverage insights from both fine-grained and coarse-grained levels by facilitating a nuanced correlation between token and entity representations, addressing the complexities inherent in form documents. Additionally, we introduce new inter-grained and cross-grained loss functions to further refine diverse multi-teacher knowledge distillation transfer process, presenting distribution gaps and a harmonised understanding of form documents. Through a comprehensive evaluation across publicly available form document understanding datasets, our proposed model consistently outperforms existing baselines, showcasing its efficacy in handling the intricate structures and content of visually complex form documents.

* Work in progress

View paper on

Share this with someone who'll enjoy it:

Title:M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding

Paper and Code