Abstract: This manuscript introduces the Hype-Adjusted Probability Measure, developed in the context of a new Natural Language Processing (NLP) approach to market forecasting. A novel sentiment score equation is presented that captures component and memory effects and assigns dynamic parameters, increasing the contribution of intraday news data to forecasts of next-period volatility for selected U.S. semiconductor stocks. The approach integrates machine learning techniques to analyze and improve the predictive value of news. Building on Geman's research, this work improves forecast accuracy by assigning specific weights to each component of news sources and to individual stocks in the portfolio, evaluating time-memory effects on market reactions, and incorporating shifts in sentiment direction. Finally, we propose the Hype-Adjusted Probability Measure, prove its existence and uniqueness, discuss its theoretical applications in finance for NLP-based volatility forecasting, and outline future research pathways inspired by its concepts.
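To make the idea of source-specific weights combined with a memory effect concrete, the following is a minimal sketch and not the sentiment score equation of the manuscript, which is not reproduced in this abstract. It assumes an exponential decay in the age of each news item; the function name `decayed_sentiment`, the `source_weights` mapping, and the `decay_rate` parameter are all hypothetical.

```python
import math


def decayed_sentiment(news, now, source_weights, decay_rate):
    """Weighted sum of sentiment scores, discounted by the age of each item.

    news: iterable of (timestamp, source, sentiment) tuples, where
        sentiment is a number such as a score in [-1, 1].
    now: evaluation time, in the same units as the timestamps.
    source_weights: hypothetical per-source weights (default weight 1.0).
    decay_rate: hypothetical memory parameter; larger values forget faster.
    """
    total = 0.0
    for timestamp, source, sentiment in news:
        age = now - timestamp
        if age < 0:
            continue  # skip items published after the evaluation time
        weight = source_weights.get(source, 1.0)
        # Assumed exponential memory kernel: older news counts for less.
        total += weight * sentiment * math.exp(-decay_rate * age)
    return total
```

Under these assumptions, an item from a source with weight 2.0 and sentiment 1.0 contributes 2.0 when it is published and half of that after one half-life, that is, after `math.log(2) / decay_rate` time units.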
Abstract: Prediction and quantification of future volatility and returns play an important role in financial modelling, both in portfolio optimization and in risk management. Natural language processing now makes it possible to process news and social media comments to detect signals of investor confidence. We explored the relationship between sentiment extracted from financial news and tweets and FTSE100 movements. We investigated the strength of the correlation between sentiment measures on a given day and the market volatility and returns observed the next day. The findings provide evidence of correlation between sentiment and stock market movements: the sentiment captured from news headlines could be used as a signal to predict market returns, but the same does not hold for volatility. Surprisingly, for the sentiment found in Twitter comments we obtained a correlation coefficient of -0.7 with a p-value below 0.05, which indicates a strong negative correlation between positive sentiment captured from the tweets on a given day and the volatility observed the next day. We developed an accurate classifier for predicting market volatility in response to the arrival of new information by deploying topic modelling, based on Latent Dirichlet Allocation, to extract feature vectors from a collection of tweets and financial news. The obtained features were used as additional input to the classifier. Thanks to the combination of sentiment and topic modelling, our classifier achieved a directional prediction accuracy for volatility of 63%.
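The lagged relationship described above (sentiment on day t against volatility on day t+1) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes two equally long daily series aligned by date, uses the Pearson coefficient, and the function names `pearson` and `next_day_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def next_day_correlation(sentiment, volatility):
    """Correlate sentiment on day t with volatility on day t + 1.

    Both inputs are assumed to be daily series aligned by date, so the
    last sentiment value and the first volatility value have no partner
    and are dropped.
    """
    return pearson(sentiment[:-1], volatility[1:])
```

This sketch returns only the coefficient. A significance test such as the p-value quoted in the abstract would require an additional step, for example `scipy.stats.pearsonr`, which returns both the coefficient and a p-value.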