Abstract:In the last two decades Computer Aided Diagnostics (CAD) systems were developed to help radiologists analyze screening mammograms. The benefits of current CAD technologies appear to be contradictory and they should be improved to be ultimately considered useful. Since 2012 deep convolutional neural networks (CNN) have been a tremendous success in image recognition, reaching human performance. These methods have greatly surpassed the traditional approaches, which are similar to currently used CAD solutions. Deep CNN-s have the potential to revolutionize medical image analysis. We propose a CAD system based on one of the most successful object detection frameworks, Faster R-CNN. The system detects and classifies malignant or benign lesions on a mammogram without any human intervention. The proposed method sets the state of the art classification performance on the public INbreast database, AUC = 0.95 . The approach described here has achieved the 2nd place in the Digital Mammography DREAM Challenge with AUC = 0.85 . When used as a detector, the system reaches high sensitivity with very few false positive marks per image on the INbreast dataset. Source code, the trained model and an OsiriX plugin are availaible online at https://github.com/riblidezso/frcnn_cad .
Abstract:Principal component analysis (PCA) and related techniques have been successfully employed in natural language processing. Text mining applications in the age of the online social media (OSM) face new challenges due to properties specific to these use cases (e.g. spelling issues specific to texts posted by users, the presence of spammers and bots, service announcements, etc.). In this paper, we employ a Robust PCA technique to separate typical outliers and highly localized topics from the low-dimensional structure present in language use in online social networks. Our focus is on identifying geospatial features among the messages posted by the users of the Twitter microblogging service. Using a dataset which consists of over 200 million geolocated tweets collected over the course of a year, we investigate whether the information present in word usage frequencies can be used to identify regional features of language use and topics of interest. Using the PCA pursuit method, we are able to identify important low-dimensional features, which constitute smoothly varying functions of the geographic location.