Thanh-Dat Nguyen

VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction

Jul 09, 2024

Thanh-Dat Nguyen, Tung Do-Viet, Hung Nguyen-Duy, Tuan-Hai Luu, Hung Le, Bach Le, Patanamon Thongtanunam

Abstract: Businesses need to query visually rich documents (VRDs) such as receipts, medical records, and insurance forms to make decisions. Existing techniques for extracting entities from VRDs struggle with new layouts or require extensive pre-training data. We introduce VRDSynth, a program synthesis method that automatically extracts entity relations from multilingual VRDs without pre-training data. To capture the complexity of the VRD domain, we design a domain-specific language (DSL) that expresses the spatial and textual relations used in the synthesized programs. We also derive a new synthesis algorithm that uses frequent spatial relations, search space pruning, and a combination of positive, negative, and exclusive programs to improve coverage. We evaluate VRDSynth on the FUNSD and XFUND benchmarks for semantic entity linking, which consist of 1,592 forms in 8 languages. VRDSynth outperforms state-of-the-art pre-trained models (LayoutXLM, InfoXLMBase, and XLMRobertaBase) in 5, 6, and 7 out of 8 languages, respectively, improving the F1 score by 42% over LayoutXLM in English. To test the extensibility of the model, we further improve VRDSynth with automated table recognition, creating VRDSynth(Table), and compare it with extended versions of the pre-trained models, InfoXLM(Large) and XLMRoberta(Large). VRDSynth(Table) outperforms these baselines in 4 out of 8 languages and in average F1 score. VRDSynth also significantly reduces memory footprint (1M and 380MB vs. 1.48GB and 3GB for LayoutXLM) while maintaining similar time efficiency.

* Accepted in ISSTA'24

POCS-based Clustering Algorithm

Aug 15, 2022

Le-Anh Tran, Henock M. Deberneh, Truong-Dong Do, Thanh-Dat Nguyen, My-Ha Le, Dong-Chul Park

Abstract: This paper proposes a novel clustering technique based on the projection onto convex sets (POCS) method, called the POCS-based clustering algorithm. The algorithm exploits a parallel projection method of POCS to find appropriate cluster prototypes in the feature space. It treats each data point as a convex set and projects the cluster prototypes onto their member data points in parallel. The projections are convexly combined to minimize the clustering objective function. The performance of the algorithm is verified through experiments on various synthetic datasets. The experimental results show that it is competitive and efficient in terms of clustering error and execution speed when compared with conventional clustering methods, including the Fuzzy C-Means (FCM) and K-means algorithms.

* 6 pages, 4 figures, IWIS 2022
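
The parallel-projection idea in the abstract above can be illustrated with a minimal sketch. It assumes that each data point is a singleton convex set (so projecting a prototype onto it yields the point itself), that the convex-combination weights are uniform, and that a relaxation parameter `lam` scales the update; the abstract does not specify the weights or the relaxation scheme, so the function names and these choices are hypothetical. With uniform weights and `lam = 1`, the update coincides with the K-means centroid step.

```python
import math

def assign(points, prototypes):
    """Return, for each point, the index of its nearest prototype."""
    return [
        min(range(len(prototypes)), key=lambda k: math.dist(p, prototypes[k]))
        for p in points
    ]

def parallel_projection_step(prototype, members, lam=1.0):
    """Move a prototype toward the convex combination of its projections.

    Each member point is treated as a singleton convex set, so the
    projection of the prototype onto that set is the point itself.
    """
    if not members:
        return prototype  # empty cluster: leave the prototype unchanged
    w = 1.0 / len(members)  # ASSUMPTION: uniform weights that sum to 1
    dim = len(prototype)
    combined = [sum(w * m[d] for m in members) for d in range(dim)]
    return tuple(prototype[d] + lam * (combined[d] - prototype[d]) for d in range(dim))

def pocs_cluster(points, prototypes, iters=20, lam=1.0):
    """Alternate nearest-prototype assignment and parallel projection updates."""
    prototypes = [tuple(p) for p in prototypes]
    for _ in range(iters):
        labels = assign(points, prototypes)
        prototypes = [
            parallel_projection_step(
                proto, [p for p, l in zip(points, labels) if l == k], lam
            )
            for k, proto in enumerate(prototypes)
        ]
    return prototypes, assign(points, prototypes)
```

Calling `pocs_cluster(points, initial_prototypes)` returns the final prototypes and one cluster label per point. Choosing non-uniform weights, for example weights that depend on distance, changes only the line that sets `w`.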

Toward the Analysis of Graph Neural Networks

Jan 01, 2022

Thanh-Dat Nguyen, Thanh Le-Cong, ThanhVu H. Nguyen, Xuan-Bach D. Le, Quyet-Thang Huynh

Abstract: Graph Neural Networks (GNNs) have recently emerged as a robust framework for graph-structured data. They have been applied to many problems, such as knowledge graph analysis, social network recommendation, and even COVID-19 detection and vaccine development. However, unlike other deep neural networks such as Feed Forward Neural Networks (FFNNs), few analyses such as verification and property inference exist for GNNs. This is potentially due to the dynamic behavior of GNNs, which can take arbitrary graphs as input, whereas FFNNs take only fixed-size numerical vectors. This paper proposes an approach to analyze GNNs by converting them into FFNNs and reusing existing FFNN analyses. We discuss various designs to ensure the scalability and accuracy of the conversions. We illustrate our method on a case study of node classification. We believe that our approach opens new research directions for understanding and analyzing GNNs.

* The 44th IEEE/ACM International Conference on Software Engineering (ICSE 2022)
* Accepted to ICSE 2022, NIER track

PR-CIM: a Variation-Aware Binary-Neural-Network Framework for Process-Resilient Computation-in-memory

Oct 19, 2021

Minh-Son Le, Thi-Nhan Pham, Thanh-Dat Nguyen, Ik-Joon Chang

Abstract: Binary neural networks (BNNs), which use 1-bit weights and activations, have garnered interest because extreme quantization provides low power dissipation. By implementing BNNs as computing-in-memory (CIM), which computes multiplications and accumulations on memory arrays in an analog fashion (namely, analog CIM), we can further improve the energy efficiency of neural network processing. However, analog CIMs suffer from the potential problem that process variation degrades the accuracy of BNNs. Our Monte-Carlo simulations show that in an SRAM-based analog CIM of VGG-9, the CIFAR-10 classification accuracy degrades to below 20% under process variations of 65nm CMOS. To overcome this problem, we present a variation-aware BNN framework. The proposed framework is developed for SRAM-based BNN CIMs, since SRAM is the most widely used on-chip memory, but it is easily extensible to BNN CIMs based on other memories. Our extensive experimental results show that under process variation of 65nm CMOS, our framework significantly improves the CIFAR-10 accuracies of SRAM-based BNN CIMs, from 10% and 10.1% to 87.76% and 77.74% for VGG-9 and RESNET-18, respectively.

* 8 pages, 11 figures
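
The multiply-accumulate over 1-bit weights and activations mentioned in the abstract above can be illustrated digitally. The sketch below is not the paper's analog SRAM implementation and does not model process variation; it only shows the standard arithmetic identity for values in {-1, +1}: when +1 is encoded as bit 1 and -1 as bit 0, the dot product of two length-n vectors equals 2 * popcount(XNOR) - n. The function names are hypothetical.

```python
def binarize(values):
    """Map real values to {-1, +1} by sign (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]

def pack(signs):
    """Pack a {-1, +1} vector into an integer, encoding +1 as bit 1."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, w_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XNOR marks the positions where the signs agree; each agreement
    contributes +1 and each disagreement contributes -1.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ w_bits) & mask
    matches = bin(xnor).count("1")
    return 2 * matches - n
```

For activations `[0.3, -1.2, 0.7, -0.1]` and weights `[1.0, 0.5, -2.0, -0.4]`, binarizing and packing both vectors and calling `binary_dot(..., n=4)` gives the same result as summing the elementwise products of the two sign vectors.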

End-to-End Hierarchical Relation Extraction for Generic Form Understanding

Jun 02, 2021

Tuan-Anh Nguyen Dang, Duc-Thanh Hoang, Quang-Bach Tran, Chih-Wei Pan, Thanh-Dat Nguyen

Abstract: Form understanding is a challenging problem that aims to recognize the semantic entities in an input document together with their hierarchical relations. Previous approaches have significant difficulty dealing with the complexity of the task and therefore treat these objectives separately. To address this, we present a novel deep neural network that jointly performs entity detection and link prediction in an end-to-end fashion. Our model extends the Multi-stage Attentional U-Net architecture with Part-Intensity Fields and Part-Association Fields for link prediction, enriching the spatial information flow with additional supervision from entity linking. We demonstrate the effectiveness of the model on the Form Understanding in Noisy Scanned Documents (FUNSD) dataset, where our method substantially outperforms the original model and state-of-the-art baselines in both the Entity Labeling and Entity Linking tasks.

* 2020 25th International Conference on Pattern Recognition (ICPR)
* Accepted to ICPR 2020
