Abstract: Understanding customer behavior is fundamental to many industry use cases, especially in fast-growing areas such as fin-tech and e-commerce. Structured data are often expensive and time-consuming to collect, and inadequate for analyzing complex customer behavior. In this paper, we propose a multi-graph embedding approach that creates a non-linear representation of customers, giving a better understanding of their characteristics without any prior information about their financial status or interests. Using the proposed method, we are able to predict users' future behavior with reasonably high accuracy from their friendship network alone. Potential applications include recommendation systems and credit risk forecasting.
Abstract: The pharmacological effect of a drug on cells, organs and systems refers to the specific biochemical interaction produced by the drug substance, which is called its mechanism of action. Drug repositioning (or drug repurposing) is the problem of identifying new uses for already approved or failed drugs. In this paper, we present a method based on a multi-relation unsupervised graph embedding model that learns latent representations for drugs and diseases, so that the distance between these representations reveals repositioning opportunities. Once representations for drugs and diseases are obtained, we learn the likelihood of new links (that is, new indications) between drugs and diseases. Known drug indications are used to train a model that predicts potential indications. Compared with existing unsupervised graph embedding methods, our method shows superior prediction performance in terms of area under the ROC curve, and we present examples of repositioning opportunities reported in recent biomedical literature that were also predicted by our method.
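The evaluation idea in this abstract, scoring drug-disease pairs by how close their representations are and summarizing the ranking with the area under the ROC curve, can be illustrated with a minimal sketch. The embeddings, the pair lists, and the function names `closeness` and `roc_auc` are hypothetical and chosen only for illustration. The abstract describes learning link likelihoods from known indications, so the raw negative distance used here is a simplifying assumption, not the authors' model.

```python
import math

def closeness(u, v):
    """Score a pair by negative Euclidean distance, so closer pairs score higher."""
    return -math.dist(u, v)

def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical 2-D embeddings (assumed values, not from the paper).
drug = {"d1": (0.0, 0.0), "d2": (1.0, 1.0)}
disease = {"x": (0.1, 0.0), "y": (1.0, 0.9), "z": (5.0, 5.0)}

# Hypothetical known indications (positives) and unlinked pairs (negatives).
known = [("d1", "x"), ("d2", "y")]
unlinked = [("d1", "z"), ("d2", "z"), ("d1", "y")]

pos = [closeness(drug[d], disease[s]) for d, s in known]
neg = [closeness(drug[d], disease[s]) for d, s in unlinked]
auc = roc_auc(pos, neg)
```

The pairwise definition of AUC used here is quadratic in the number of pairs. It is fine for a toy example, but a rank-based implementation would be preferable at scale.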
Abstract: Keyword extraction has received increasing attention as an important research topic that can lead to advances in diverse applications such as document context categorization, text indexing and document classification. In this paper we propose STF-IDF, a novel semantic method based on TF-IDF, for scoring word importance in informal documents in a corpus. A set of nearly four million documents from health-care social media was collected and used to train a semantic model and obtain word embeddings. The features of the semantic space were then used to rearrange the original TF-IDF scores through an iterative procedure, so as to improve the moderate performance of this algorithm on informal texts. In a test on 200 randomly chosen documents, our method reduced the TF-IDF mean error rate by about 50%, reaching a mean error of 13.7%, compared with 27.2% for the original TF-IDF.
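For reference, the baseline that STF-IDF starts from can be sketched as follows. This is plain TF-IDF with one common weighting (relative term frequency times the log of inverse document frequency). The semantic rearrangement step is not specified in the abstract and is not reproduced here. The function name `tf_idf` and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy tokenized documents (assumed data, not from the paper's corpus).
docs = [
    ["headache", "after", "new", "medication"],
    ["new", "doctor", "new", "clinic"],
    ["headache", "and", "nausea"],
]
scores = tf_idf(docs)
```

With this weighting, a term that appears in every document receives a score of zero, which is one reason plain TF-IDF can behave poorly on short, informal texts with a small shared vocabulary.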
Abstract: Graphs are useful structures that can model several important real-world problems. Recently, graph learning has drawn considerable attention, leading to the proposal of new methods for learning these data structures. One of these studies produced NetGAN, a new approach for generating graphs via random walks. Although NetGAN has shown promising accuracy in graph generation and link prediction, the choice of vertices from which it starts random walks can lead to inconsistent and highly variable results, especially when the walks are short. As an alternative to random starting points, this study aims to establish a new method for initializing random walks from a set of dense vertices. We propose estimating the importance of a node based on the inverse of its influence over all vertices in its neighborhood through random walks of different lengths. The proposed method achieves significantly better accuracy, lower variance and fewer outliers.
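The general idea of starting random walks from selected vertices instead of random ones can be sketched as below. The abstract does not give the exact importance score, so this sketch uses node degree as a stand-in for density. That choice, the function names `random_walk` and `dense_start_nodes`, and the toy adjacency list are all assumptions, not the authors' method.

```python
import random

def random_walk(adj, start, length, rng):
    """Collect up to `length` nodes from `start`, moving to a uniform random neighbor each step."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

def dense_start_nodes(adj, k):
    """Hypothetical density proxy: the k nodes with the highest degree."""
    return sorted(adj, key=lambda node: len(adj[node]), reverse=True)[:k]

# Toy undirected graph as an adjacency list (assumed data); node 4 is isolated.
adj = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
    4: [],
}

rng = random.Random(0)  # fixed seed so the sampled walks are reproducible
starts = dense_start_nodes(adj, k=2)
walks = [random_walk(adj, s, length=4, rng=rng) for s in starts]
```

Fixing the start set removes one source of randomness from walk sampling, which is consistent with the variance reduction the abstract reports. The walks themselves remain stochastic.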