Abstract:In the dynamic urban landscape, where the interplay of vehicles and pedestrians defines the rhythm of life, integrating advanced technology for safety and efficiency is increasingly crucial. This study delves into the application of cutting-edge technological methods in smart cities, focusing on enhancing public safety through improved traffic accident detection. Action recognition plays a pivotal role in interpreting visual data and tracking object motion such as human pose estimation in video sequences. The challenges of action recognition include variability in rapid actions, limited dataset, and environmental factors such as (Weather, Illumination, and Occlusions). In this paper, we present a novel comprehensive dataset for traffic accident detection. This datasets is specifically designed to bolster computer vision and action recognition systems in predicting and detecting road traffic accidents. We integrated datasets from wide variety of data sources, road networks, weather conditions, and regions across the globe. This approach is underpinned by empirical studies, aiming to contribute to the discourse on how technology can enhance the quality of life in densely populated areas. This research aims to bridge existing research gaps by introducing benchmark datasets that leverage state-of-the-art algorithms tailored for traffic accident detection in smart cities. These dataset is expected to advance academic research and also enhance real-time accident detection applications, contributing significantly to the evolution of smart urban environments. Our study marks a pivotal step towards safer, more efficient smart cities, harnessing the power of AI and machine learning to transform urban living.
Abstract:Precise crop yield prediction provides valuable information for agricultural planning and decision-making processes. However, timely predicting crop yields remains challenging as crop growth is sensitive to growing season weather variation and climate change. In this work, we develop a deep learning-based solution, namely Multi-Modal Spatial-Temporal Vision Transformer (MMST-ViT), for predicting crop yields at the county level across the United States, by considering the effects of short-term meteorological variations during the growing season and the long-term climate change on crops. Specifically, our MMST-ViT consists of a Multi-Modal Transformer, a Spatial Transformer, and a Temporal Transformer. The Multi-Modal Transformer leverages both visual remote sensing data and short-term meteorological data for modeling the effect of growing season weather variations on crop growth. The Spatial Transformer learns the high-resolution spatial dependency among counties for accurate agricultural tracking. The Temporal Transformer captures the long-range temporal dependency for learning the impact of long-term climate change on crops. Meanwhile, we also devise a novel multi-modal contrastive learning technique to pre-train our model without extensive human supervision. Hence, our MMST-ViT captures the impacts of both short-term weather variations and long-term climate change on crops by leveraging both satellite images and meteorological data. We have conducted extensive experiments on over 200 counties in the United States, with the experimental results exhibiting that our MMST-ViT outperforms its counterparts under three performance metrics of interest.
Abstract:The rapid progress in technology innovation usage and distribution has increased in the last decade. The rapid growth of the Internet of Things (IoT) systems worldwide has increased network security challenges created by malicious third parties. Thus, reliable intrusion detection and network forensics systems that consider security concerns and IoT systems limitations are essential to protect such systems. IoT botnet attacks are one of the significant threats to enterprises and individuals. Thus, this paper proposed an economic deep learning-based model for detecting IoT botnet attacks along with different types of attacks. The proposed model achieved higher accuracy than the state-of-the-art detection models using a smaller implementation budget and accelerating the training and detecting processes.
Abstract:Understanding human behavior and monitoring mental health are essential to maintaining the community and society's safety. As there has been an increase in mental health problems during the COVID-19 pandemic due to uncontrolled mental health, early detection of mental issues is crucial. Nowadays, the usage of Intelligent Virtual Personal Assistants (IVA) has increased worldwide. Individuals use their voices to control these devices to fulfill requests and acquire different services. This paper proposes a novel deep learning model based on the gated recurrent neural network and convolution neural network to understand human emotion from speech to improve their IVA services and monitor their mental health.
Abstract:Action detection and public traffic safety are crucial aspects of a safe community and a better society. Monitoring traffic flows in a smart city using different surveillance cameras can play a significant role in recognizing accidents and alerting first responders. The utilization of action recognition (AR) in computer vision tasks has contributed towards high-precision applications in video surveillance, medical imaging, and digital signal processing. This paper presents an intensive review focusing on action recognition in accident detection and autonomous transportation systems for a smart city. In this paper, we focused on AR systems that used diverse sources of traffic video capturing, such as static surveillance cameras on traffic intersections, highway monitoring cameras, drone cameras, and dash-cams. Through this review, we identified the primary techniques, taxonomies, and algorithms used in AR for autonomous transportation and accident detection. We also examined data sets utilized in the AR tasks, identifying the main sources of datasets and features of the datasets. This paper provides potential research direction to develop and integrate accident detection systems for autonomous cars and public traffic safety systems by alerting emergency personnel and law enforcement in the event of road accidents to minimize human error in accident reporting and provide a spontaneous response to victims
Abstract:Hybrid LSTM-fully convolutional networks (LSTM-FCN) for time series classification have produced state-of-the-art classification results on univariate time series. We show that replacing the LSTM with a gated recurrent unit (GRU) to create a GRU-fully convolutional network hybrid model (GRU-FCN) can offer even better performance on many time series datasets. The proposed GRU-FCN model outperforms state-of-the-art classification performance in many univariate and multivariate time series datasets. In addition, since the GRU uses a simpler architecture than the LSTM, it has fewer training parameters, less training time, and a simpler hardware implementation, compared to the LSTM-based models.
Abstract:Spatiotemporal sequence prediction is an important problem in deep learning. We study next-frame(s) video prediction using a deep-learning-based predictive coding framework that uses convolutional, long short-term memory (convLSTM) modules. We introduce a novel reduced-gate convolutional LSTM (rgcLSTM) architecture that requires a significantly lower parameter budget than a comparable convLSTM. Our reduced-gate model achieves equal or better next-frame(s) prediction accuracy than the original convolutional LSTM while using a smaller parameter budget, thereby reducing training time. We tested our reduced gate modules within a predictive coding architecture on the moving MNIST and KITTI datasets. We found that our reduced-gate model has a significant reduction of approximately 40 percent of the total number of training parameters and a 25 percent redution in elapsed training time in comparison with the standard convolutional LSTM model. This makes our model more attractive for hardware implementation especially on small devices.