Abstract: The Efficient Market Hypothesis has been a staple of economics research for decades. In particular, weak-form market efficiency, the notion that past prices cannot predict future performance, is strongly supported by econometric evidence. In contrast, machine learning algorithms implemented to predict stock prices have been touted, to varying degrees, as successful. Moreover, some data scientists claim the ability to earn above-market returns using price data alone. This study endeavors to connect existing econometric research on weak-form efficient markets with data science innovations in algorithmic trading. First, a traditional exploration of stationarity in stock index prices over the past decade is conducted with Augmented Dickey-Fuller and Variance Ratio tests. Then, an algorithmic trading platform is implemented using five machine learning algorithms. The econometric findings identify potential stationarity, hinting that technical evaluation may be possible, but the algorithmic trading results show little predictive power in any machine learning model, even when using trend-specific metrics. After accounting for transaction costs and risk, no system consistently achieved above-market returns. Our findings reinforce the validity of weak-form market efficiency.
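As a rough illustration of the variance ratio idea mentioned in the abstract above, the sketch below computes a simple variance ratio statistic with NumPy. It is a minimal sketch, not the study's implementation: the function name `variance_ratio`, the use of overlapping q-period returns, the absence of small-sample bias corrections, and the simulated data are all assumptions made for illustration. Under a random walk, the variance of q-period returns grows linearly in q, so the ratio should be close to 1; values well below 1 suggest mean reversion.

```python
import numpy as np


def variance_ratio(log_prices, q):
    """Simple variance ratio VR(q) from a series of log prices.

    Illustrative estimator only (an assumption, not the paper's code):
    it uses overlapping q-period returns and no small-sample bias correction.
    """
    p = np.asarray(log_prices, dtype=float)
    r1 = np.diff(p)                # one-period log returns
    rq = p[q:] - p[:-q]            # overlapping q-period log returns
    mu = r1.mean()                 # estimated one-period drift
    var1 = np.mean((r1 - mu) ** 2)
    varq = np.mean((rq - q * mu) ** 2) / q
    return varq / var1


# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.normal(size=20_000))   # non-stationary levels
white_noise = rng.normal(size=20_000)              # stationary levels

vr_walk = variance_ratio(random_walk, 5)    # expected to be near 1
vr_noise = variance_ratio(white_noise, 2)   # expected to be well below 1
print(vr_walk, vr_noise)
```

A full test would also need a standard error for the statistic, which this sketch omits.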
Abstract: Machine learning has automated much of financial fraud detection, notifying firms of, or even blocking, questionable transactions instantly. However, data imbalance starves traditionally trained models of the content necessary to detect fraud. This study examines three separate factors of credit card fraud detection via machine learning. First, it assesses whether two sampling methods, random undersampling and the Synthetic Minority Oversampling Technique (SMOTE), can improve algorithm performance in data-starved environments. Second, five industry-practical machine learning algorithms are evaluated on total fraud cost savings in addition to traditional statistical metrics. Finally, an ensemble of individual models is trained with a genetic algorithm in an attempt to achieve higher cost efficiency than its components. Monte Carlo performance distributions showed that random undersampling outperformed SMOTE in lowering fraud costs, and that the ensemble was unable to outperform its individual parts. Most notably, the F1 score, a traditional metric often used to measure performance with imbalanced data, was uncorrelated with derived cost efficiency. Assuming a realistic cost structure can be derived, cost-based metrics provide an essential supplement to objective statistical evaluation.
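To make the contrast between the F1 score and a cost-based metric concrete, the following is a minimal sketch under stated assumptions. The cost model (a missed fraud loses the full transaction amount, and every flagged transaction incurs a fixed review fee), the function names, the `review_cost` value, and the toy data are all hypothetical and are not taken from the study. The sketch also includes a simple random undersampling helper of the kind the abstract refers to.

```python
import numpy as np


def random_undersample(X, y, rng):
    """Keep every fraud row (y == 1) and an equal-sized random sample of non-fraud rows."""
    X, y = np.asarray(X), np.asarray(y)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    keep_neg = rng.choice(neg, size=min(len(pos), len(neg)), replace=False)
    idx = rng.permutation(np.concatenate([pos, keep_neg]))
    return X[idx], y[idx]


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def fraud_cost(y_true, y_pred, amounts, review_cost=5.0):
    """Assumed cost model: missed fraud loses its amount; each flag costs a review fee."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    amounts = np.asarray(amounts, dtype=float)
    missed = np.sum(amounts[(y_true == 1) & (y_pred == 0)])
    reviews = review_cost * np.sum(y_pred == 1)
    return float(missed + reviews)


# Toy data (hypothetical): two frauds, one large and one small.
y_true = np.array([1, 1, 0, 0, 0, 0])
amounts = np.array([1000.0, 10.0, 20.0, 20.0, 20.0, 20.0])

pred_a = np.array([1, 0, 0, 0, 0, 0])  # catches only the large fraud
pred_b = np.array([0, 1, 0, 0, 0, 0])  # catches only the small fraud

f1_a, f1_b = f1_score(y_true, pred_a), f1_score(y_true, pred_b)
cost_a = fraud_cost(y_true, pred_a, amounts)
cost_b = fraud_cost(y_true, pred_b, amounts)
print(f1_a, f1_b, cost_a, cost_b)

# Balance the toy training set by random undersampling.
X = np.arange(12).reshape(6, 2)
X_bal, y_bal = random_undersample(X, y_true, np.random.default_rng(0))
```

In this toy case both predictions have the same F1 score but very different costs, because F1 counts errors without regard to transaction size. That is one way the two kinds of metric can diverge.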