Mahmoud Mahfouz

Entropy-Aware Branching for Improved Mathematical Reasoning

Mar 27, 2025

Xianzhi Li, Ethan Callanan, Xiaodan Zhu, Mathieu Sibue, Antony Papadimitriou, Mahmoud Mahfouz, Zhiqiang Ma, Xiaomo Liu

Abstract: While Large Language Models (LLMs) are effectively aligned through extensive pre-training and fine-tuning, they still struggle with varying levels of uncertainty during token generation. In our investigation of mathematical reasoning, we observe that errors are more likely to arise at tokens exhibiting high entropy and high variance of entropy in the model's output distribution. Based on this observation, we propose a novel approach that dynamically branches the generation process on demand instead of defaulting to the single most probable token. By exploring in parallel multiple branches stemming from the high-probability tokens at critical decision points, the model can discover diverse reasoning paths that might otherwise be missed. We further harness external feedback from larger models to rank and select the most coherent and accurate reasoning branch. Our experimental results on mathematical word problems and calculation questions show that this branching strategy boosts the reasoning capabilities of small LLMs by up to 4.6% compared to conventional argmax decoding.
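As a rough illustration of the branching trigger described in this abstract, the sketch below computes the Shannon entropy of a next-token distribution and returns several high-probability candidates only when that entropy is high. It is not the authors' implementation: the function names, the `threshold` and `k` parameters, and the use of entropy alone (without the variance-of-entropy signal or the external ranking model) are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def branch_candidates(logits, threshold=1.0, k=3):
    """Return token indices to continue from.

    Hypothetical rule: if the entropy of the next-token distribution
    exceeds `threshold`, branch on the top-k tokens; otherwise fall
    back to the single most probable token (argmax decoding).
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if entropy(probs) > threshold:
        return ranked[:k]
    return ranked[:1]
```

A peaked distribution such as `branch_candidates([10.0, 0.0, 0.0, 0.0])` yields a single candidate, while a flat one yields up to `k` candidates, each of which would seed its own continuation in a full system.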

How Robust are Limit Order Book Representations under Data Perturbation?

Oct 10, 2021

Yufei Wu, Mahmoud Mahfouz, Daniele Magazzeni, Manuela Veloso

Abstract: The success of machine learning models in the financial domain is highly reliant on the quality of the data representation. In this paper, we focus on the representation of limit order book data and discuss the opportunities and challenges for learning representations of such data. We also experimentally analyse the issues associated with existing representations and present a guideline for future research in this area.

A Framework for Institutional Risk Identification using Knowledge Graphs and Automated News Profiling

Sep 19, 2021

Mahmoud Mahfouz, Armineh Nourbakhsh, Sameena Shah

Abstract: Organizations around the world face an array of risks impacting their operations globally. It is imperative to have a robust risk identification process to detect and evaluate the impact of potential risks before they materialize. Given the nature of the task and the deep subject matter expertise it currently requires, most organizations rely on a heavily manual process. In our work, we develop an automated system that (a) continuously monitors global news, (b) autonomously identifies and characterizes risks, (c) determines how close a risk is to reaching its triggers, and hence how far it is from manifesting its impact, and (d) identifies the organization's operational areas that may be most impacted by the risk. Other contributions include: (a) a knowledge graph representation of risks and (b) matching of relevant news to the risks identified by the organization, using a neural embedding model to match the textual description of a given risk with multi-lingual news.
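The news-to-risk matching step mentioned in this abstract can be pictured as a nearest-neighbour search in an embedding space. The sketch below assumes that risk descriptions and news items have already been embedded as plain lists of floats by some multilingual model, and that cosine similarity is the scoring function; neither detail is stated in the abstract, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_news(risk_vec, news_vecs):
    """Return indices of news embeddings, most similar to the risk embedding first."""
    scored = [(cosine(risk_vec, v), i) for i, v in enumerate(news_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored]
```

With toy two-dimensional vectors, `rank_news([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])` places the identical vector first and the orthogonal one last.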

Tucker Tensor Layer in Fully Connected Neural Networks

Mar 14, 2019

Giuseppe G. Calvi, Ahmad Moniri, Mahmoud Mahfouz, Zeyang Yu, Qibin Zhao, Danilo P. Mandic

Abstract: We introduce the Tucker Tensor Layer (TTL), an alternative to the dense weight-matrices of the fully connected layers of feed-forward neural networks (NNs), to answer the long-standing quest to compress NNs and improve their interpretability. This is achieved by treating these weight-matrices as the unfolding of a higher order weight-tensor. This enables us to introduce a framework for exploiting the multi-way nature of the weight-tensor in order to efficiently reduce the number of parameters, by virtue of the compression properties of tensor decompositions. The Tucker Decomposition (TKD) is employed to decompose the weight-tensor into a core tensor and factor matrices. We re-derive back-propagation within this framework, by extending the notion of matrix derivatives to tensors. In this way, the physical interpretability of the TKD is exploited to gain insights into training, through the process of computing gradients with respect to each factor matrix. The proposed framework is validated on synthetic data and on the Fashion-MNIST dataset, emphasizing the relative importance of various data features in training, hence mitigating the "black-box" issue inherent to NNs. Experiments on both MNIST and Fashion-MNIST illustrate the compression properties of the TTL, achieving a 66.63-fold compression whilst maintaining comparable performance to the uncompressed NN.
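To make the source of the compression concrete, the sketch below counts parameters for a dense weight matrix and for a Tucker-decomposed version of the same weights reshaped into a higher-order tensor. A Tucker decomposition stores one core tensor of size R_1 x ... x R_N plus one I_n x R_n factor matrix per mode. The tensor shape and ranks in the usage note are hypothetical choices for illustration, not the configuration used in the paper, so the resulting ratio is unrelated to the reported 66.63-fold figure.

```python
import math

def dense_params(tensor_dims):
    """Parameters in the dense weight matrix that the tensor is an unfolding of."""
    return math.prod(tensor_dims)

def tucker_params(tensor_dims, ranks):
    """Parameters in a Tucker decomposition: core tensor plus one factor matrix per mode."""
    if len(tensor_dims) != len(ranks):
        raise ValueError("need exactly one rank per tensor mode")
    core = math.prod(ranks)
    factors = sum(dim * rank for dim, rank in zip(tensor_dims, ranks))
    return core + factors

def compression_ratio(tensor_dims, ranks):
    """How many times fewer parameters the Tucker form stores than the dense form."""
    return dense_params(tensor_dims) / tucker_params(tensor_dims, ranks)
```

For example, a 784 x 512 dense layer reshaped (by assumption) into a 28 x 28 x 16 x 32 tensor with all ranks set to 4 needs `tucker_params((28, 28, 16, 32), (4, 4, 4, 4))` = 672 parameters instead of 401,408. Whether accuracy survives such an aggressive choice of ranks is an empirical question the sketch does not address.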
