Abstract: In 2012, the SEC mandated that all corporate filings for any company doing business in the US be entered into the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. In this work we investigate ways to analyze the data available through the EDGAR database. Such analysis may help portfolio managers (pension funds, mutual funds, insurance companies, hedge funds) obtain automated insights into the companies they invest in and so manage their portfolios better. The analysis is based on artificial neural networks applied to the data. In particular, the Convolutional Neural Network (CNN) architecture, one of the most popular machine learning methods and originally developed to interpret and classify images, is now being used to interpret financial data. This work investigates the best way to input data collected from SEC filings into a CNN architecture. We incorporate accounting principles and mathematical methods into the design of three image encoding methods. Specifically, two methods are derived from accounting principles (Sequential Arrangement and Category Chunk Arrangement) and one uses a purely mathematical technique (Hilbert Vector Arrangement). We analyze fundamental financial data as well as financial ratio data, and study companies from the financial, healthcare, and IT sectors in the United States. We find that using imaging techniques to prepare input for a CNN works better for financial ratio data, but is not significantly better than using the 1D input directly for fundamental data. We do not find the Hilbert Vector Arrangement technique to be significantly better than the other imaging techniques.
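The abstract does not specify how the image encodings are constructed, so the following is only a minimal sketch of the general idea, under stated assumptions: a 1D feature vector is zero-padded and written into an n x n grid either row by row (our reading of a sequential arrangement) or along a Hilbert space-filling curve, which keeps consecutive features in adjacent pixels. The function names, the zero padding, and the restriction to power-of-two grid sizes are illustrative choices, not the authors' implementation.

```python
def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. This is the standard iterative
    distance-to-coordinate conversion for the Hilbert curve.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequential_image(values, n):
    """Row-major arrangement (assumed form of a sequential layout)."""
    if len(values) > n * n:
        raise ValueError("too many values for an n x n image")
    # Zero padding for unused pixels is an assumption of this sketch.
    padded = list(values) + [0.0] * (n * n - len(values))
    return [padded[r * n:(r + 1) * n] for r in range(n)]


def hilbert_image(values, n):
    """Place values along a Hilbert curve on an n x n grid."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if len(values) > n * n:
        raise ValueError("too many values for an n x n image")
    image = [[0.0] * n for _ in range(n)]
    for d, v in enumerate(values):
        x, y = hilbert_d2xy(n, d)
        image[y][x] = v
    return image


if __name__ == "__main__":
    features = list(range(16))  # stand-in for 16 financial features
    for row in hilbert_image(features, 4):
        print(row)
```

Either grid can then be passed to a CNN as a single-channel image. The only difference between the two layouts is which features end up as spatial neighbours.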
Abstract: Credit ratings are one of the primary indicators of the riskiness of corporations and of their reliability in meeting their financial obligations. Rating agencies tend to take extended periods of time to provide new ratings and update older ones. Credit scoring assessments using artificial intelligence have therefore gained a lot of interest in recent years. Successful machine learning methods can provide rapid analysis of credit scores while updating older ones on a daily time scale. Related studies have shown that neural networks and support vector machines outperform other techniques by providing better prediction accuracy. The purpose of this paper is twofold. First, we provide a survey and a comparative analysis of results from the literature applying machine learning techniques to predict credit ratings. Second, we apply four machine learning techniques deemed useful in previous studies (Bagged Decision Trees, Random Forest, Support Vector Machine, and Multilayer Perceptron) to the same datasets. We evaluate the results using 10-fold cross-validation. For the datasets chosen, the experimental results show superior performance for decision-tree-based models. In addition to the conventional accuracy measure of classifiers, we introduce a notch-based measure of accuracy called "Notch Distance" to analyze the performance of the above classifiers in the specific context of credit rating. This measure tells us how far the predictions are from the true ratings. We further compare the ratings of three major rating agencies, Standard & Poor's, Moody's, and Fitch, and show that the difference between their ratings is comparable to the difference between the decision tree predictions and the actual ratings on the test dataset.
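The abstract does not give the formula for Notch Distance, so the sketch below rests on an assumption: ratings are treated as positions on an ordinal scale, and the distance between a predicted and a true rating is the absolute difference of those positions. The scale shown is the usual Standard & Poor's long-term letter scale. The helper names and the summary statistics are hypothetical and chosen only for illustration.

```python
# Ordinal rating scale, best to worst (S&P-style long-term letter grades).
SCALE = [
    "AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
    "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-",
    "B+", "B", "B-", "CCC+", "CCC", "CCC-", "CC", "C", "D",
]
RANK = {rating: i for i, rating in enumerate(SCALE)}


def notch_distance(true_rating, predicted_rating):
    """Number of notches between two ratings (assumed definition)."""
    return abs(RANK[true_rating] - RANK[predicted_rating])


def summarize(true_ratings, predicted_ratings):
    """Plain accuracy alongside notch-based summaries."""
    if len(true_ratings) != len(predicted_ratings) or not true_ratings:
        raise ValueError("inputs must be non-empty and of equal length")
    distances = [
        notch_distance(t, p) for t, p in zip(true_ratings, predicted_ratings)
    ]
    n = len(distances)
    return {
        # Exact matches only: the conventional classifier accuracy.
        "accuracy": sum(d == 0 for d in distances) / n,
        # Average number of notches by which predictions miss.
        "mean_notch_distance": sum(distances) / n,
        # Share of predictions at most one notch from the true rating.
        "within_one_notch": sum(d <= 1 for d in distances) / n,
    }


if __name__ == "__main__":
    truth = ["AAA", "BBB", "BB+"]
    preds = ["AAA", "BBB-", "BBB"]
    print(summarize(truth, preds))
```

Unlike plain accuracy, which counts every miss equally, a measure of this kind distinguishes a prediction that is one notch off from one that is several notches off.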