Abstract: Human data annotation is critical in shaping the quality of machine learning (ML) and artificial intelligence (AI) systems. One significant challenge in this context is posed by annotation errors, as their effects can degrade the performance of ML models. This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks for three industry-scale ML applications (music streaming, video streaming, and mobile apps) and assesses its potential to enhance the quality and efficiency of the data annotation process. Drawing on real-world data from an extensive search relevance annotation program, we show that errors can be predicted with moderate model performance (AUC=0.65-0.75) and that model performance generalizes well across applications (i.e., a global, task-agnostic model performs on par with task-specific models). We present model explainability analyses to identify which types of features are the main drivers of predictive performance. Additionally, we demonstrate the usefulness of the model in the context of auditing, where prioritizing tasks with high predicted error probabilities considerably increases the number of corrected annotation errors (e.g., 40% efficiency gains for the music streaming application). These results underscore that automated error detection models can yield considerable improvements in the efficiency and quality of data annotation processes. Thus, our findings reveal critical insights into effective error management in the data annotation process, thereby contributing to the broader field of human-in-the-loop ML.
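The abstract does not include an implementation, so the following is only a minimal sketch of the auditing idea it describes: train a binary error classifier on task-level features and rank tasks by predicted error probability instead of auditing a random sample. The feature names, the gradient-boosted classifier, and the synthetic data are all assumptions, not the paper's actual model.

```python
# Sketch (not the paper's implementation): rank annotation tasks for auditing
# by predicted error probability and compare against random auditing.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tasks = 5000
# Hypothetical task-level features (e.g., annotator agreement, task duration, query length).
X = rng.normal(size=(n_tasks, 8))
# Synthetic labels: 1 = annotation later found to be erroneous.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n_tasks) > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

p_error = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, p_error))

# Audit the top-k tasks with the highest predicted error probability;
# the efficiency gain is how many more errors this catches than a random audit.
k = int(0.2 * len(y_test))
top_k = np.argsort(p_error)[::-1][:k]
random_k = rng.choice(len(y_test), size=k, replace=False)
print("errors caught (model-prioritized):", int(y_test[top_k].sum()))
print("errors caught (random audit):", int(y_test[random_k].sum()))
```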
Abstract: We extend the graph convolutional network method for deep learning on graph data to higher-order neighborhoods. To construct the representation of a node, we include not only the features of the node and its immediate neighbors but also those of more distant nodes. In experiments on several publicly available citation graph datasets, we show that visiting higher-order neighbors pays off: the extended model outperforms the original one, especially when only a limited number of labeled data points are available for training.
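As a rough illustration of the higher-order idea (and only that; the authors' exact formulation is not given in the abstract), a standard GCN layer propagates features once through the normalized adjacency matrix, while a second-order variant can additionally mix in two-hop propagation. The weight-sharing scheme and toy graph below are assumptions for the sketch.

```python
# Sketch of higher-order neighborhood aggregation: combine 1-hop and 2-hop
# propagation through the symmetrically normalized adjacency matrix.
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def higher_order_gcn_layer(A_hat, H, W1, W2):
    """One layer: first-order plus second-order propagation, then ReLU."""
    H1 = A_hat @ H @ W1            # immediate neighbors
    H2 = (A_hat @ A_hat) @ H @ W2  # neighbors of neighbors (2-hop)
    return np.maximum(H1 + H2, 0.0)

# Toy graph: 4 nodes on a path, 3 input features, 2 hidden units.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
W1 = rng.normal(size=(3, 2))
W2 = rng.normal(size=(3, 2))

print(higher_order_gcn_layer(normalize_adjacency(A), H, W1, W2))
```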
Abstract: In this paper, we employed a transfer learning technique to predict the Nusselt number for natural convection flows in enclosures. Specifically, we numerically simulated a benchmark problem in square enclosures described by the Rayleigh and Prandtl numbers using the finite volume method. Given that the ideal grid size depends on the values of these parameters, we performed our simulations using a combination of different grid systems. This allowed us to train an artificial neural network in a cost-effective manner. We adopted two approaches to this problem. First, we generated a multi-grid training dataset that included both the Rayleigh and Prandtl numbers as input variables. By monitoring the training losses for this dataset, we were able to detect significant anomalies that stemmed from an insufficient grid size. We then revised the grid size or added more data points to denoise the dataset and transferred the learning from our original dataset to build a computational metamodel that predicts the Nusselt number. Furthermore, we sought to endow our neural network model with the ability to account for additional input features. Therefore, in our second approach, we applied a deep neural network architecture for transfer learning to this problem. Initially, we trained a neural network with a single input feature (Rayleigh), and then extended the network to incorporate the effects of a second feature (Prandtl). This learning framework can be applied to other systems of natural convection in enclosures with presumably higher physical complexity, while reducing computational and training costs.
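A minimal sketch of the second, two-stage approach follows: train a small network on the Rayleigh number alone, then widen the input layer to also accept the Prandtl number while reusing the learned weights as initialization. The network sizes, the synthetic placeholder target, and the weight-transfer scheme are assumptions for illustration, not the paper's architecture.

```python
# Sketch of the two-stage transfer idea: single-input network first,
# then a two-input network initialized from the single-input weights.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stage 1: single input feature (log Rayleigh number) -> Nusselt number.
net1 = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
ra = torch.rand(200, 1) * 6 + 3                     # log10(Ra) in [3, 9], synthetic
nu = 0.2 * ra ** 1.2 + 0.05 * torch.randn_like(ra)  # purely synthetic placeholder target
opt = torch.optim.Adam(net1.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net1(ra), nu)
    loss.backward()
    opt.step()

# Stage 2: extend to two inputs (Rayleigh and Prandtl), transferring weights.
net2 = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
with torch.no_grad():
    # Copy the learned Rayleigh weights into the first input column and start
    # the Prandtl column at zero, so stage-2 training begins from the stage-1 model.
    net2[0].weight[:, :1] = net1[0].weight
    net2[0].weight[:, 1:] = 0.0
    net2[0].bias.copy_(net1[0].bias)
    net2[2].weight.copy_(net1[2].weight)
    net2[2].bias.copy_(net1[2].bias)
# net2 can now be fine-tuned on (Ra, Pr) pairs with far fewer epochs.
```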