Abstract: Large-scale pretrained models such as LXMERT are becoming popular for learning cross-modal representations on text-image pairs for vision-language tasks. According to the lottery ticket hypothesis, NLP and computer vision models contain smaller subnetworks that can be trained in isolation to full performance. In this paper, we combine these observations to evaluate whether such trainable subnetworks exist in LXMERT when it is fine-tuned on the VQA task. In addition, we perform a model size cost-benefit analysis by investigating how much pruning can be done without a significant loss in accuracy. Our experimental results demonstrate that LXMERT can be effectively pruned by 40%-60% in size with a 3% loss in accuracy.
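To make the pruning idea concrete, here is a minimal sketch of one pruning round as it is commonly done in lottery-ticket experiments: drop the smallest-magnitude weights, then reset the survivors to their starting values before retraining. The abstract does not state which pruning criterion or rewinding scheme was used, so magnitude pruning, the function names `magnitude_mask` and `rewind`, and the toy weight values are all assumptions made for illustration.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of
    smallest-magnitude weights (assumed criterion, not confirmed by the paper)."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending absolute value; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def rewind(initial_weights, mask):
    """Reset surviving weights to their initial (e.g. pretrained) values
    and zero out the pruned ones."""
    return [w0 * m for w0, m in zip(initial_weights, mask)]


# Toy values, purely illustrative.
trained = [0.9, -0.05, 0.4, 0.01, -0.7]   # weights after fine-tuning
initial = [0.1, 0.2, -0.3, 0.4, -0.5]     # weights before fine-tuning

mask = magnitude_mask(trained, 0.4)       # prune 40% of the weights
ticket = rewind(initial, mask)            # candidate subnetwork to retrain
```

In a real setting the mask would be computed per layer or globally over the model's weight tensors, and the masked network would be fine-tuned again to check whether it recovers the accuracy of the full model.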
Abstract: The increasing complexity of AI systems has led to the growth of the field of explainable AI (XAI), which aims to provide explanations and justifications for the outputs of AI algorithms. These methods mainly focus on feature importance and on identifying changes that can be made to achieve a desired outcome. Researchers have identified desired properties for XAI methods, such as plausibility, sparsity, causality, and low run-time. The objective of this study is to review existing XAI research and present a classification of XAI methods. The study also aims to connect XAI users with the appropriate method and to relate the desired properties to current XAI approaches. The outcome of this study will be a clear strategy that outlines how to choose the right XAI method for a particular goal and user, and how to provide a personalized explanation for users.
Abstract: This paper presents a novel approach and a new dataset for the problem of driver drowsiness and distraction detection. The lack of an available and accurate eye dataset is strongly felt in the area of eye closure detection. Therefore, a new comprehensive dataset is proposed, and a study of driver distraction from the road is provided to improve driver safety. A deep network is also designed so that the two goals of real-time application, high accuracy and speed, are considered simultaneously. The main purposes of this article are as follows: estimating the driver's head direction for distraction detection, introducing a new comprehensive dataset for eye closure detection, and presenting three networks, one of which is a fully designed deep neural network (FD-DNN) while the other two use transfer learning with VGG16 and VGG19 and additional designed layers (TL-VGG). The experimental results show the high accuracy and low computational complexity of the estimations and the ability of the proposed networks to detect drowsiness.