Abstract:In recent years, artificial intelligence (AI) has advanced significantly in speech recognition applications. Speech-based interaction with digital systems, particularly AI-driven digit recognition, has emerged as a prominent application. However, existing neural network-based methods often neglect the impact of noise, leading to reduced accuracy in noisy environments. This study tackles the challenge of recognizing isolated spoken Persian numbers (zero to nine), particularly distinguishing phonetically similar numbers, in noisy environments. The proposed method, which is designed for speaker-independent recognition, combines a residual convolutional neural network and a bidirectional gated recurrent unit in a hybrid structure for Persian number recognition. This method employs word units as input instead of phoneme units. Audio data from 51 speakers of the FARSDIGIT1 database are used after augmentation with various noises, and the Mel-Frequency Cepstral Coefficients (MFCC) technique is employed for feature extraction. The experimental results show the proposed method's efficacy, with 98.53%, 96.10%, and 95.9% recognition accuracy for training, validation, and test, respectively. In the noisy environment, the proposed method exhibits an average performance improvement of 26.88% over a phoneme unit-based LSTM method for Persian numbers. In addition, the accuracy of the proposed method is 7.61% better than that of the Mel-scale Two Dimension Root Cepstrum Coefficients (MTDRCC) feature extraction technique combined with an MLP model on the test data for the same dataset.
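The abstract above mentions augmenting the audio with various noises but does not say how the noise is mixed in. A common approach, shown here only as a minimal sketch, is to scale a noise recording so that the mixture reaches a chosen signal-to-noise ratio (SNR). The function name `mix_at_snr` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so that the mixture has the requested SNR in dB.

    Assumption: both inputs are lists of float samples at the same sample rate.
    This is an illustrative sketch, not the augmentation used in the paper.
    """
    if not clean:
        raise ValueError("clean signal is empty")
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]

    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise has zero power")

    # Choose gain so that clean_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]
```

Calling `mix_at_snr(clean, noise, snr_db=10)` would return a noisy copy of `clean` whose noise component is 10 dB below the speech power; lower `snr_db` values give harder training examples.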
Abstract:This article addresses the gap in computational painting analysis focused on Persian miniature painting, a rich cultural and artistic heritage. It introduces a novel approach using Convolutional Neural Networks (CNN) to classify Persian miniatures from five schools: Herat, Tabriz-e Avval, Shiraz-e Avval, Tabriz-e Dovvom, and Qajar. The method achieves an average accuracy of over 91%. A meticulously curated dataset captures the distinct features of each school, with a patch-based CNN approach classifying image segments independently before merging results for enhanced accuracy. This research contributes significantly to digital art analysis, providing detailed insights into the dataset, CNN architecture, training, and validation processes. It highlights the potential for future advancements in automated art analysis, bridging machine learning, art history, and digital humanities, thereby aiding the preservation and understanding of Persian cultural heritage.
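The abstract above describes classifying image patches independently and then merging the results, without stating the merge rule. The sketch below assumes one plausible rule, averaging the per-patch class probabilities and picking the highest mean; the function name `merge_patch_predictions` is hypothetical, and the actual paper may use a different rule such as majority voting.

```python
# The five schools named in the abstract, in an arbitrary fixed order.
SCHOOLS = ["Herat", "Tabriz-e Avval", "Shiraz-e Avval", "Tabriz-e Dovvom", "Qajar"]


def merge_patch_predictions(patch_probs):
    """Average per-patch class probabilities and return (label, mean_probs).

    Assumption: `patch_probs` is a list with one probability vector per patch,
    each of length len(SCHOOLS), e.g. the softmax output of a patch-level CNN.
    """
    if not patch_probs:
        raise ValueError("at least one patch prediction is required")
    n_classes = len(SCHOOLS)
    if any(len(p) != n_classes for p in patch_probs):
        raise ValueError("every patch needs one probability per school")

    mean_probs = [
        sum(p[c] for p in patch_probs) / len(patch_probs) for c in range(n_classes)
    ]
    best = max(range(n_classes), key=mean_probs.__getitem__)
    return SCHOOLS[best], mean_probs
```

Averaging lets a few confident patches outweigh many uncertain ones, which is one reason it is often preferred over a plain vote when patch quality varies.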
Abstract:Today, skin cancer is considered one of the most dangerous and common cancers in the world and demands special attention. Skin cancer occurs in different types, including melanoma, actinic keratosis, basal cell carcinoma, squamous cell carcinoma, and Merkel cell carcinoma. Among them, melanoma is the most unpredictable. Melanoma can be diagnosed at early stages, which increases the chance of treating the disease. Automatic classification of skin lesions is a challenging task due to the diverse forms and grades of the disease, which calls for novel methods. Deep convolutional neural networks (CNNs) have shown excellent potential for data and image classification. In this article, we investigate the skin lesion classification problem using CNN techniques. Remarkably, we show that high classification accuracy for lesion detection can be obtained by properly designing and applying a transfer learning framework on pre-trained neural networks, without any need for data enlargement procedures; specifically, VGG16 and VGG19 architectures pre-trained on a generic dataset are merged with a modified AlexNet network and then fine-tuned on a subject-specific dataset containing dermatology images. The convolutional neural network was trained using 2541 images and, in particular, dropout was used to prevent the network from overfitting. Finally, the validity of the model was checked by applying the K-fold cross-validation method. The proposed model increased classification accuracy by about 4 percentage points (from 94.2% to 98.18%) in comparison with other methods.
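The K-fold cross-validation mentioned in the abstract above can be illustrated with a small index-splitting sketch. The abstract gives the number of images (2541) but not the number of folds, so the value of `k` used in the usage note is an assumption, and `k_fold_indices` is a hypothetical helper name.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return a list of (train_indices, validation_indices) pairs, one per fold.

    Each sample index appears in exactly one validation set. The shuffle is
    seeded so that the split is reproducible.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits
```

For example, `k_fold_indices(2541, 5)` yields five splits; a model would be trained on each `train` list and scored on the matching `validation` list, and the scores averaged.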
Abstract:Convolutional neural networks (CNNs) and their variations have shown effectiveness in facial expression recognition (FER). However, they face challenges when dealing with high computational complexity and multi-view head poses in real-world scenarios. We introduce a lightweight attentional network incorporating multi-scale feature fusion (LANMSFF) to tackle these issues. For the first challenge, we have carefully designed a lightweight fully convolutional network (FCN). We address the second challenge by presenting two novel components, namely mass attention (MassAtt) and point wise feature selection (PWFS) blocks. The MassAtt block simultaneously generates channel and spatial attention maps to recalibrate feature maps by emphasizing important features while suppressing irrelevant ones. On the other hand, the PWFS block employs a feature selection mechanism that discards less meaningful features prior to the fusion process. This mechanism distinguishes it from previous methods that directly fuse multi-scale features. Our proposed approach achieved results comparable to state-of-the-art methods in terms of parameter counts and robustness to pose variation, with accuracy rates of 90.77% on KDEF, 70.44% on FER-2013, and 86.96% on FERPlus datasets. The code for LANMSFF is available at https://github.com/AE-1129/LANMSFF.
Abstract:This paper presents a novel Sequence-to-Sequence (Seq2Seq) model based on a transformer-based attention mechanism and temporal pooling for Non-Intrusive Load Monitoring (NILM) of smart buildings. The paper aims to improve the accuracy of NILM by using a deep learning-based method. The proposed method uses a Seq2Seq model with a transformer-based attention mechanism to capture the long-term dependencies of NILM data. Additionally, temporal pooling is used to improve the model's accuracy by capturing both the steady-state and transient behavior of appliances. The paper evaluates the proposed method on a publicly available dataset and compares the results with other state-of-the-art NILM techniques. The results demonstrate that the proposed method outperforms the existing methods in terms of both accuracy and computational efficiency.
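The abstract above credits temporal pooling with capturing both steady-state and transient appliance behavior but does not describe the operation. One way to read it, offered here only as a sketch under that assumption, is to average the signal over windows of several lengths and stack the results per time step: short windows follow transients, while long windows approximate the steady-state level. The names `average_pool` and `temporal_pooling` and the default window sizes are hypothetical.

```python
def average_pool(series, window):
    """Non-overlapping average pooling over time.

    A trailing partial window is averaged over its own (shorter) length.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    chunks = [series[i:i + window] for i in range(0, len(series), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def temporal_pooling(series, windows=(1, 4, 16)):
    """Return, for every time step, one pooled value per window size.

    Each pooled sequence is repeated back to the original length so that the
    multi-scale features line up step by step with the input.
    """
    channels = []
    for w in windows:
        pooled = average_pool(series, w)
        channels.append([pooled[i // w] for i in range(len(series))])
    return [list(step) for step in zip(*channels)]
```

In a Seq2Seq setting, feature vectors like these would typically be computed on intermediate feature maps rather than on the raw power signal; the raw-signal version is used here only to keep the example short.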
Abstract:Demand-side management now encompasses more residential loads. To efficiently apply demand response strategies, it's essential to periodically observe the contribution of various domestic appliances to total energy consumption. Non-intrusive load monitoring (NILM), also known as load disaggregation, is a method for decomposing the total energy consumption profile into individual appliance load profiles within the household. It has multiple applications in demand-side management, energy consumption monitoring, and analysis. Various methods, including machine learning and deep learning, have been used to implement and improve NILM algorithms. This paper reviews some recent NILM methods based on deep learning and introduces the most accurate methods for residential loads. It summarizes public databases for NILM evaluation and compares methods using standard performance metrics.
Abstract:The participation of consumers and producers in demand response programs has increased in smart grids, which reduces the investment and operation costs of power systems. Also, with the advent of renewable energy sources, the electricity market is becoming more complex and unpredictable. To implement demand response programs effectively, forecasting the future price of electricity is crucial for producers in the electricity market. Electricity prices are very volatile and change under the influence of various factors such as temperature, wind speed, rainfall, and the intensity of commercial and daily activities. Therefore, including these influencing factors as input variables can increase the accuracy of the forecast. In this paper, a model for electricity price forecasting based on Gated Recurrent Units (GRU) is presented. Electrical load consumption is used as an input variable in this model. Noise in the electricity price seriously reduces the efficiency and effectiveness of the analysis. Therefore, an adaptive noise reducer is integrated into the model for noise reduction. SAEs are then used to extract features from the de-noised electricity price. Finally, the de-noised features are fed into the GRU to train the predictor. Results on a real dataset show that the proposed methodology performs effectively in predicting the electricity price.
Abstract:Spectral unmixing (SU) of hyperspectral images (HSIs) is one of the important areas in remote sensing (RS) that needs to be carefully addressed in different RS applications. Despite the high spectral resolution of hyperspectral data, the relatively low spatial resolution of the sensors may lead to a mixture of different pure materials within the image pixels. In this case, the spectrum of a given pixel recorded by the sensor can be a combination of multiple spectra, each belonging to a unique material in that pixel. Spectral unmixing is then used to extract the spectral characteristics of the different materials within the mixed pixels and to recover the spectrum of each pure spectral signature, called an endmember. Block-sparsity exists in hyperspectral images as a result of spectral similarity between neighboring pixels. In block-sparse signals, the nonzero samples occur in clusters, and the pattern of the clusters is often assumed to be unavailable as prior information. This paper presents an innovative spectral unmixing approach for HSIs based on block-sparse structure and a sparse Bayesian learning (SBL) strategy. To evaluate the performance of the proposed SU algorithm, it is tested on both synthetic and real hyperspectral data, and the quantitative results are compared to those of other state-of-the-art methods in terms of abundance angle distance (AAD) and mean square error (MSE). The achieved results show the superiority of the proposed algorithm over the other competing methods by a significant margin.
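The two evaluation metrics named in the abstract above can be sketched as follows. The abundance angle distance is taken here in its usual form, the angle between the true and estimated abundance vectors of a pixel, averaged over pixels; the exact averaging and units used in the paper are not stated, so treat this as an assumption. The function names are hypothetical.

```python
import math


def abundance_angle_distance(true_ab, est_ab):
    """Angle in radians between a true and an estimated abundance vector."""
    if len(true_ab) != len(est_ab):
        raise ValueError("abundance vectors must have the same length")
    dot = sum(a * b for a, b in zip(true_ab, est_ab))
    norm = math.sqrt(sum(a * a for a in true_ab)) * math.sqrt(sum(b * b for b in est_ab))
    if norm == 0.0:
        raise ValueError("abundance vectors must be nonzero")
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


def mean_aad(true_map, est_map):
    """Average the per-pixel abundance angle distance over all pixels."""
    if len(true_map) != len(est_map) or not true_map:
        raise ValueError("maps must be non-empty and of equal length")
    angles = [abundance_angle_distance(t, e) for t, e in zip(true_map, est_map)]
    return sum(angles) / len(angles)


def mean_square_error(true_ab, est_ab):
    """Mean of the squared element-wise differences of two abundance vectors."""
    if len(true_ab) != len(est_ab) or not true_ab:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(true_ab, est_ab)) / len(true_ab)
```

Both metrics are zero for a perfect estimate; the angle-based metric ignores overall scaling of the abundance vector, whereas the squared error does not.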
Abstract:License plate recognition systems play a very important role in many applications such as toll management, parking control, and traffic management. In this paper, a framework of deep convolutional neural networks is proposed for Iranian license plate recognition. The first CNN is the YOLOv3 network, which detects the Iranian license plate in the input image, while the second CNN is a Faster R-CNN that recognizes and classifies the characters in the detected license plate. A dataset of Iranian license plates consisting of ill-conditioned images is also developed in this paper. The YOLOv3 network achieved 99.6% mAP, 98.26% recall, and 98.08% accuracy, with an average detection time of only 23 ms. The Faster R-CNN network was trained and tested on the developed dataset and achieved 98.97% recall, 99.9% precision, and 98.8% accuracy. The proposed system can recognize the license plate in challenging situations, such as when unwanted data appear on the license plate. A comparison with other Iranian license plate recognition systems shows that the proposed system is faster and more accurate, and that it can work in an open environment.
Abstract:Integration of renewable energy sources and emerging loads such as electric vehicles into smart grids brings more uncertainty to distribution system management. Demand Side Management (DSM) is one of the approaches to reducing this uncertainty. Some applications, such as Nonintrusive Load Monitoring (NILM), can support DSM; however, they require accurate forecasting on high-resolution data. This is challenging for single loads, such as one residential household, because of their high volatility. In this paper, we review some of the existing Deep Learning-based methods and present our solution using a Time Pooling Deep Recurrent Neural Network. The proposed method augments the data using a time pooling strategy and can overcome overfitting problems and model the uncertainties of the data more efficiently. Simulation and implementation results show that our method outperforms the existing algorithms in terms of RMSE and MAE metrics.
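The RMSE and MAE metrics named in the abstract above follow their standard definitions, sketched here for a pair of equal-length load series. The function names and argument names are illustrative choices, not taken from the paper.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and forecast values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large errors more."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data; a large gap between the two suggests that a few forecasts miss by a wide margin, which is typical for volatile single-household loads.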