Abstract:This study explored the use of machine learning algorithms for predicting and imputing missing values in categorical datasets. We focused on ensemble models that use the error correction output codes (ECOC) framework, including SVM-based and KNN-based ensemble models, as well as an ensemble classifier that combines SVM, KNN, and MLP models. We applied these algorithms to three datasets: the CPU dataset, the hypothyroid dataset, and the Breast Cancer dataset. Our experiments showed that the machine learning algorithms were able to achieve good performance in predicting and imputing the missing values, with some variations depending on the specific dataset and missing value pattern. The ensemble models using the error correction output codes (ECOC) framework were particularly effective in improving the accuracy and robustness of the predictions, compared to individual models. However, there are also challenges and limitations to using deep learning for missing value imputation, including the need for large amounts of labeled data and the potential for overfitting. Further research is needed to evaluate the effectiveness and efficiency of deep learning algorithms for missing value imputation and to develop strategies for addressing the challenges and limitations that may arise.
Abstract:Privacy is a major issue in learning from distributed data. Recently the cryptographic literature has provided several tools for this task. However, these tools either reduce the quality/accuracy of the learning algorithm---e.g., by adding noise---or they incur a high performance penalty and/or involve trusting external authorities. We propose a methodology for {\sl private distributed machine learning from light-weight cryptography} (in short, PD-ML-Lite). We apply our methodology to two major ML algorithms, namely non-negative matrix factorization (NMF) and singular value decomposition (SVD). Our resulting protocols are communication optimal, achieve the same accuracy as their non-private counterparts, and satisfy a notion of privacy---which we define---that is both intuitive and measurable. Our approach is to use lightweight cryptographic protocols (secure sum and normalized secure sum) to build learning algorithms rather than wrap complex learning algorithms in a heavy-cost MPC framework. We showcase our algorithms' utility and privacy on several applications: for NMF we consider topic modeling and recommender systems, and for SVD, principal component regression, and low rank approximation.
Abstract:On dedicated websites, people can upload videos and share it with the rest of the world. Currently these videos are cat- egorized manually by the help of the user community. In this paper, we propose a combination of color spaces with the Bayesian network approach for robust detection of skin color followed by an automated video categorization. Exper- imental results show that our method can achieve satisfactory performance for categorizing videos based on skin color.