Abstract:The CEEMDAN algorithm is one of the modern methods used in the analysis of non-stationary signals. This research presents a study of the effectiveness of this method in audio source separation to know the limits of its work. It concluded two conditions related to frequencies and amplitudes of mixed signals to be separated by CEEMDAN. The performance of the algorithm in separating noise from speech and separating speech signals from each other is studied. The research reached a conclusion that CEEMDAN can remove some types of noise from speech (speech improvement), and it cannot separate speech signals from each other (cocktail party). Simulation is done using Matlab environment and Noizeus database.
Abstract:This paper addresses the challenge of speaker separation, which remains an active research topic despite the promising results achieved in recent years. These results, however, often degrade in real recording conditions due to the presence of noise, echo, and other interferences. This is because neural models are typically trained on synthetic datasets consisting of mixed audio signals and their corresponding ground truths, which are generated using computer software and do not fully represent the complexities of real-world recording scenarios. The lack of realistic training sets for speaker separation remains a major hurdle, as obtaining individual sounds from mixed audio signals is a nontrivial task. To address this issue, we propose a novel method for constructing a realistic training set that includes mixture signals and corresponding ground truths for each speaker. We evaluate this dataset on a deep learning model and compare it to a synthetic dataset. We got a 1.65 dB improvement in Scale Invariant Signal to Distortion Ratio (SI-SDR) for speaker separation accuracy in realistic mixing. Our findings highlight the potential of realistic training sets for enhancing the performance of speaker separation models in real-world scenarios.
Abstract:Deep Attractor Network (DANet) is the state-of-the-art technique in speech separation field, which uses Bidirectional Long Short-Term Memory (BLSTM), but the complexity of the DANet model is very high. In this paper, a simplified and powerful DANet model is proposed using Bidirectional Gated neural network (BGRU) instead of BLSTM. The Gaussian Mixture Model (GMM) other than the k-means was applied in DANet as a clustering algorithm to reduce the complexity and increase the learning speed and accuracy. The metrics used in this paper are Signal to Distortion Ratio (SDR), Signal to Interference Ratio (SIR), Signal to Artifact Ratio (SAR), and Perceptual Evaluation Speech Quality (PESQ) score. Two speaker mixture datasets from TIMIT corpus were prepared to evaluate the proposed model, and the system achieved 12.3 dB and 2.94 for SDR and PESQ scores respectively, which were better than the original DANet model. Other improvements were 20.7% and 17.9% in the number of parameters and time training, respectively. The model was applied on mixed Arabic speech signals and the results were better than that in English.
Abstract:Speech separation is very important in real-world applications such as human-machine interaction, hearing aids devices, and automatic meeting transcription. In recent years, a significant improvement occurred towards the solution based on deep learning. In fact, much attention has been drawn to supervised learning methods using synthetic mixtures datasets despite their being not representative of real-world mixtures. The difficulty in building a realistic dataset led researchers to use unsupervised learning methods, because of their ability to handle realistic mixtures directly. The results of unsupervised learning methods are still unconvincing. In this paper, a method is introduced to create a realistic dataset with ground truth sources for speech separation. The main challenge in designing a realistic dataset is the unavailability of ground truths for speakers signals. To address this, we propose a method for simultaneously recording two speakers and obtaining the ground truth for each. We present a methodology for benchmarking our realistic dataset using a deep learning model based on Bidirectional Gated Recurrent Units (BGRU) and clustering algorithm. The experiments show that our proposed dataset improved SI-SDR (Scale Invariant Signal to Distortion Ratio) by 1.65 dB and PESQ (Perceptual Evaluation of Speech Quality) by approximately 0.5. We also evaluated the effectiveness of our method at different distances between the microphone and the speakers and found that it improved the stability of the learned model.
Abstract:Customer churn is a major problem and one of the most important concerns for large companies. Due to the direct effect on the revenues of the companies, especially in the telecom field, companies are seeking to develop means to predict potential customer to churn. Therefore, finding factors that increase customer churn is important to take necessary actions to reduce this churn. The main contribution of our work is to develop a churn prediction model which assists telecom operators to predict customers who are most likely subject to churn. The model developed in this work uses machine learning techniques on big data platform and builds a new way of features' engineering and selection. In order to measure the performance of the model, the Area Under Curve (AUC) standard measure is adopted, and the AUC value obtained is 93.3%. Another main contribution is to use customer social network in the prediction model by extracting Social Network Analysis (SNA) features. The use of SNA enhanced the performance of the model from 84 to 93.3% against AUC standard. The model was prepared and tested through Spark environment by working on a large dataset created by transforming big raw data provided by SyriaTel telecom company. The dataset contained all customers' information over 9 months, and was used to train, test, and evaluate the system at SyriaTel. The model experimented four algorithms: Decision Tree, Random Forest, Gradient Boosted Machine Tree "GBM" and Extreme Gradient Boosting "XGBOOST". However, the best results were obtained by applying XGBOOST algorithm. This algorithm was used for classification in this churn predictive model.
Abstract:In the age of data driven solution, the customer demographic attributes, such as gender and age, play a core role that may enable companies to enhance the offers of their services and target the right customer in the right time and place. In the marketing campaign, the companies want to target the real user of the GSM (global system for mobile communications), not the line owner. Where sometimes they may not be the same. This work proposes a method that predicts users' gender and age based on their behavior, services and contract information. We used call detail records (CDRs), customer relationship management (CRM) and billing information as a data source to analyze telecom customer behavior, and applied different types of machine learning algorithms to provide marketing campaigns with more accurate information about customer demographic attributes. This model is built using reliable data set of 18,000 users provided by SyriaTel Telecom Company, for training and testing. The model applied by using big data technology and achieved 85.6% accuracy in terms of user gender prediction and 65.5% of user age prediction. The main contribution of this work is the improvement in the accuracy in terms of user gender prediction and user age prediction based on mobile phone data and end-to-end solution that approaches customer data from multiple aspects in the telecom domain.