Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tung Le

Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration

Jul 30, 2024

Ngoc Son Nguyen, Van Son Nguyen, Tung Le

Figure 1 for Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration

Figure 2 for Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration

Figure 3 for Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration

Figure 4 for Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration

Abstract:Visual Question Answering (VQA) has recently emerged as a potential research domain, captivating the interest of many in the field of artificial intelligence and computer vision. Despite the prevalence of approaches in English, there is a notable lack of systems specifically developed for certain languages, particularly Vietnamese. This study aims to bridge this gap by conducting comprehensive experiments on the Vietnamese Visual Question Answering (ViVQA) dataset, demonstrating the effectiveness of our proposed model. In response to community interest, we have developed a model that enhances image representation capabilities, thereby improving overall performance in the ViVQA system. Specifically, our model integrates the Bootstrapping Language-Image Pre-training with frozen unimodal models (BLIP-2) and the convolutional neural network EfficientNet to extract and process both local and global features from images. This integration leverages the strengths of transformer-based architectures for capturing comprehensive contextual information and convolutional networks for detailed local features. By freezing the parameters of these pre-trained models, we significantly reduce the computational cost and training time, while maintaining high performance. This approach significantly improves image representation and enhances the performance of existing VQA systems. We then leverage a multi-modal fusion module based on a general-purpose multi-modal foundation model (BEiT-3) to fuse the information between visual and textual features. Our experimental findings demonstrate that our model surpasses competing baselines, achieving promising performance. This is particularly evident in its accuracy of $71.04\%$ on the test set of the ViVQA dataset, marking a significant advancement in our research area. The code is available at https://github.com/nngocson2002/ViVQA.

* Computers and Electrical Engineering 119 (2024) 109474
* Accepted at the journal of Computers & Electrical Engineering (Received 8 March 2024, Revised 8 June 2024, Accepted 10 July 2024)

Via

Access Paper or Ask Questions

Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning

Mar 04, 2024

Tung Le, Khai Nguyen, Shanlin Sun, Nhat Ho, Xiaohui Xie

Figure 1 for Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning

Figure 2 for Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning

Figure 3 for Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning

Figure 4 for Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning

Abstract:In the realm of computer vision and graphics, accurately establishing correspondences between geometric 3D shapes is pivotal for applications like object tracking, registration, texture transfer, and statistical shape analysis. Moving beyond traditional hand-crafted and data-driven feature learning methods, we incorporate spectral methods with deep learning, focusing on functional maps (FMs) and optimal transport (OT). Traditional OT-based approaches, often reliant on entropy regularization OT in learning-based framework, face computational challenges due to their quadratic cost. Our key contribution is to employ the sliced Wasserstein distance (SWD) for OT, which is a valid fast optimal transport metric in an unsupervised shape matching framework. This unsupervised framework integrates functional map regularizers with a novel OT-based loss derived from SWD, enhancing feature alignment between shapes treated as discrete probability measures. We also introduce an adaptive refinement process utilizing entropy regularized OT, further refining feature alignments for accurate point-to-point correspondences. Our method demonstrates superior performance in non-rigid shape matching, including near-isometric and non-isometric scenarios, and excels in downstream tasks like segmentation transfer. The empirical results on diverse datasets highlight our framework's effectiveness and generalization capabilities, setting new standards in non-rigid shape matching with efficient OT metrics and an adaptive refinement module.

* accepted by CVPR 2024

Via

Access Paper or Ask Questions

Diffeomorphic Deformation via Sliced Wasserstein Distance Optimization for Cortical Surface Reconstruction

May 27, 2023

Tung Le, Khai Nguyen, Shanlin Sun, Kun Han, Nhat Ho, Xiaohui Xie

Abstract:Mesh deformation is a core task for 3D mesh reconstruction, but defining an efficient discrepancy between predicted and target meshes remains an open problem. A prevalent approach in current deep learning is the set-based approach which measures the discrepancy between two surfaces by comparing two randomly sampled point-clouds from the two meshes with Chamfer pseudo-distance. Nevertheless, the set-based approach still has limitations such as lacking a theoretical guarantee for choosing the number of points in sampled point-clouds, and the pseudo-metricity and the quadratic complexity of the Chamfer divergence. To address these issues, we propose a novel metric for learning mesh deformation. The metric is defined by sliced Wasserstein distance on meshes represented as probability measures that generalize the set-based approach. By leveraging probability measure space, we gain flexibility in encoding meshes using diverse forms of probability measures, such as continuous, empirical, and discrete measures via \textit{varifold} representation. After having encoded probability measures, we can compare meshes by using the sliced Wasserstein distance which is an effective optimal transport distance with linear computational complexity and can provide a fast statistical rate for approximating the surface of meshes. Furthermore, we employ a neural ordinary differential equation (ODE) to deform the input surface into the target shape by modeling the trajectories of the points on the surface. Our experiments on cortical surface reconstruction demonstrate that our approach surpasses other competing methods in multiple datasets and metrics.

Via

Access Paper or Ask Questions

A Summary of the ALQAC 2021 Competition

Apr 25, 2022

Nguyen Ha Thanh, Bui Minh Quan, Chau Nguyen, Tung Le, Nguyen Minh Phuong, Dang Tran Binh, Vuong Thi Hai Yen, Teeradaj Racharak, Nguyen Le Minh, Tran Duc Vu(+6 more)

Figure 1 for A Summary of the ALQAC 2021 Competition

Figure 2 for A Summary of the ALQAC 2021 Competition

Figure 3 for A Summary of the ALQAC 2021 Competition

Abstract:We summarize the evaluation of the first Automated Legal Question Answering Competition (ALQAC 2021). The competition this year contains three tasks, which aims at processing the statute law document, which are Legal Text Information Retrieval (Task 1), Legal Text Entailment Prediction (Task 2), and Legal Text Question Answering (Task 3). The final goal of these tasks is to build a system that can automatically determine whether a particular statement is lawful. There is no limit to the approaches of the participating teams. This year, there are 5 teams participating in Task 1, 6 teams participating in Task 2, and 5 teams participating in Task 3. There are in total 36 runs submitted to the organizer. In this paper, we summarize each team's approaches, official results, and some discussion about the competition. Only results of the teams who successfully submit their approach description paper are reported in this paper.

Via

Access Paper or Ask Questions

Goal scoring in Premier League with Poisson regression

Jul 10, 2021

Cuong Pham, Tung Le

Abstract:Premier League is known as one of the most competitive football league in the world, hence there are many goals are scored here every match. Which are the factors that affect to the number of goal scored in each match? We use Poisson regression to find out the relation between many factors as shots on target, corners, red cards, to the goals home team can score in their match.

* 13 pages, 14 figures

Via

Access Paper or Ask Questions