Abstract:Data is the key asset for organizations and data sharing is lifeline for organization growth; which may lead to data loss. Data leakage is the most critical issue being faced by organizations. In order to mitigate the data leakage issues data leakage prevention systems (DLPSs) are deployed at various levels by the organizations. DLPSs are capable to protect all kind of data i.e. DAR, DIM/DIT, DIU. Statistical analysis, regular expression, data fingerprinting are common approaches exercised in DLP system. Out of these techniques; statistical analysis approach is most appropriate for proposed DLP model of data security. This paper defines a statistical DLP model for document classification. Model uses various statistical approaches like TF-IDF (Term Frequency- Inverse Document Frequency) a renowned term count/weighing function, Vectorization, Gradient boosting document classification etc. to classify the documents before allowing any access to it. Machine learning is used to test and train the model. Proposed model also introduces an extremely efficient and more accurate approach; IGBCA (Improvised Gradient Boosting Classification Algorithm); for document classification, to prevent them from possible data leakage. Results depicts that proposed model can classify documents with high accuracy and on basis of which data can be prevented from being loss.
Abstract:Sensitive data leakage is the major growing problem being faced by enterprises in this technical era. Data leakage causes severe threats for organization of data safety which badly affects the reputation of organizations. Data leakage is the flow of sensitive data/information from any data holder to an unauthorized destination. Data leak prevention (DLP) is set of techniques that try to alleviate the threats which may hinder data security. DLP unveils guilty user responsible for data leakage and ensures that user without appropriate permission cannot access sensitive data and also provides protection to sensitive data if sensitive data is shared accidentally. In this paper, data leakage prevention (DLP) model is used to restrict/grant data access permission to user, based on the forecast of their access to data. This study provides a DLP solution using data statistical analysis to forecast the data access possibilities of any user in future based on the access to data in the past. The proposed approach makes use of renowned simple piecewise linear function for learning/training to model. The results show that the proposed DLP approach with high level of precision can correctly classify between users even in cases of extreme data access.