Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Avinash Anand

Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English

Dec 24, 2024

Avinash Anand, Kritarth Prasad, Chhavi Kirtani, Ashwin R Nair, Manvendra Kumar Nema, Raj Jaiswal, Rajiv Ratn Shah

Figure 1 for Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English

Figure 2 for Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English

Figure 3 for Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English

Figure 4 for Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English

Abstract:Large Language Models (LLMs) excel in linguistic tasks but struggle with mathematical reasoning, particularly in non English languages like Hindi. This research aims to enhance the mathematical reasoning skills of smaller, resource efficient open-source LLMs in both Hindi and English. We evaluate models like OpenHathi 7B, LLaMA-2 7B, WizardMath 7B, Mistral 7B, LLeMMa 7B, MAmmoTH 7B, Gemini Pro, and GPT-4 using zero-shot, few-shot chain-of-thought (CoT) methods, and supervised fine-tuning. Our approach incorporates curriculum learning, progressively training models on increasingly difficult problems, a novel Decomposition Strategy to simplify complex arithmetic operations, and a Structured Solution Design that divides solutions into phases. Our experiments result in notable performance enhancements. WizardMath 7B exceeds Gemini's accuracy on English datasets by +6% and matches Gemini's performance on Hindi datasets. Adopting a bilingual approach that combines English and Hindi samples achieves results comparable to individual language models, demonstrating the capability to learn mathematical reasoning in both languages. This research highlights the potential for improving mathematical reasoning in open-source LLMs.

* Accepted at AAAI 2025

Via

Access Paper or Ask Questions

Steps are all you need: Rethinking STEM Education with Prompt Engineering

Dec 06, 2024

Krishnasai Addala, Kabir Dev Paul Baghel, Chhavi Kirtani, Avinash Anand, Rajiv Ratn Shah

Abstract:Few shot and Chain-of-Thought prompting have shown promise when applied to Physics Question Answering Tasks, but are limited by the lack of mathematical ability inherent to LLMs, and are prone to hallucination. By utilizing a Mixture of Experts (MoE) Model, along with analogical prompting, we are able to show improved model performance when compared to the baseline on standard LLMs. We also survey the limits of these prompting techniques and the effects they have on model performance. Additionally, we propose Analogical CoT prompting, a prompting technique designed to allow smaller, open source models to leverage Analogical prompting, something they have struggled with, possibly due to a lack of specialist training data.

Via

Access Paper or Ask Questions

Knowledge Graphs are all you need: Leveraging KGs in Physics Question Answering

Dec 06, 2024

Krishnasai Addala, Kabir Dev Paul Baghel, Dhruv Jain, Chhavi Kirtani, Avinash Anand, Rajiv Ratn Shah

Figure 1 for Knowledge Graphs are all you need: Leveraging KGs in Physics Question Answering

Figure 2 for Knowledge Graphs are all you need: Leveraging KGs in Physics Question Answering

Figure 3 for Knowledge Graphs are all you need: Leveraging KGs in Physics Question Answering

Figure 4 for Knowledge Graphs are all you need: Leveraging KGs in Physics Question Answering

Abstract:This study explores the effectiveness of using knowledge graphs generated by large language models to decompose high school-level physics questions into sub-questions. We introduce a pipeline aimed at enhancing model response quality for Question Answering tasks. By employing LLMs to construct knowledge graphs that capture the internal logic of the questions, these graphs then guide the generation of subquestions. We hypothesize that this method yields sub-questions that are more logically consistent with the original questions compared to traditional decomposition techniques. Our results show that sub-questions derived from knowledge graphs exhibit significantly improved fidelity to the original question's logic. This approach not only enhances the learning experience by providing clearer and more contextually appropriate sub-questions but also highlights the potential of LLMs to transform educational methodologies. The findings indicate a promising direction for applying AI to improve the quality and effectiveness of educational content.

Via

Access Paper or Ask Questions

Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback

Dec 06, 2024

Avinash Anand, Kritarth Prasad, Chhavi Kirtani, Ashwin R Nair, Mohit Gupta, Saloni Garg, Anurag Gautam, Snehal Buldeo, Rajiv Ratn Shah

Figure 1 for Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback

Figure 2 for Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback

Figure 3 for Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback

Figure 4 for Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback

Abstract:Large Language Models (LLMs) have demonstrated strong capabilities in text-based tasks but struggle with the complex reasoning required for physics problems, particularly in advanced arithmetic and conceptual understanding. While some research has explored ways to enhance LLMs in physics education using techniques such as prompt engineering and Retrieval Augmentation Generation (RAG), not enough effort has been made in addressing their limitations in physics reasoning. This paper presents a novel approach to improving LLM performance on physics questions using Reinforcement Learning with Human and Artificial Intelligence Feedback (RLHAIF). We evaluate several reinforcement learning methods, including Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and Remax optimization. These methods are chosen to investigate RL policy performance with different settings on the PhyQA dataset, which includes challenging physics problems from high school textbooks. Our RLHAIF model, tested on leading LLMs like LLaMA2 and Mistral, achieved superior results, notably with the MISTRAL-PPO model, demonstrating marked improvements in reasoning and accuracy. It achieved high scores, with a 58.67 METEOR score and a 0.74 Reasoning score, making it a strong example for future physics reasoning research in this area.

Via

Access Paper or Ask Questions

Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models

Dec 02, 2024

Chayan Tank, Shaina Mehta, Sarthak Pol, Vinayak Katoch, Avinash Anand, Raj Jaiswal, Rajiv Ratn Shah

Figure 1 for Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models

Figure 2 for Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models

Figure 3 for Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models

Figure 4 for Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models

Abstract:In recent times, more and more people are posting about their mental states across various social media platforms. Leveraging this data, AI-based systems can be developed that help in assessing the mental health of individuals, such as suicide risk. This paper is a study done on suicidal risk assessments using Reddit data leveraging Base language models to identify patterns from social media posts. We have demonstrated that using smaller language models, i.e., less than 500M parameters, can also be effective in contrast to LLMs with greater than 500M parameters. We propose Su-RoBERTa, a fine-tuned RoBERTa on suicide risk prediction task that utilized both the labeled and unlabeled Reddit data and tackled class imbalance by data augmentation using GPT-2 model. Our Su-RoBERTa model attained a 69.84% weighted F1 score during the Final evaluation. This paper demonstrates the effectiveness of Base language models for the analysis of the risk factors related to mental health with an efficient computation pipeline

* 8 pages, 7 figures, Accepted at IEEE International Conference on Big Data (IEEE BigData 2024)

Via

Access Paper or Ask Questions

Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents

Dec 01, 2024

Raj Jaiswal, Dhruv Jain, Harsh Parimal Popat, Avinash Anand, Abhishek Dharmadhikari, Atharva Marathe, Rajiv Ratn Shah

Figure 1 for Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents

Figure 2 for Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents

Figure 3 for Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents

Figure 4 for Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents

Abstract:Large Language Models (LLMs) demonstrate remarkable capabilities in various reasoning tasks. However, they encounter significant challenges when it comes to scientific reasoning, particularly in physics, which requires not only mathematical reasoning but also factual and conceptual understanding. When addressing complex physics problems, LLMs typically face three key issues: problem miscomprehension, incorrect concept application, and computational errors. While each of these problems can be addressed individually, there is a need for a generalized approach that can tackle all three issues simultaneously. To address this, we introduce Mixture of Refinement Agents (MoRA), a novel agentic refinement framework that iteratively refines the LLM generated base solution by correcting the aforementioned errors, resulting in a significant performance improvement for open-source LLMs. Our approach aims to bridge the gap between opensource LLMs and GPT-4o by utilizing the latter as error identifier to guide these refinement agents. We evaluate our approach on the SciEval and MMLU subsets along with our own physics dataset (PhysicsQA). MoRA significantly improves the performance of Llama-3-70B and Gemma-2-27B on these datasets, achieving up to a 16% increase in final answer accuracy.

* 7 pages

Via

Access Paper or Ask Questions

Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring

Dec 01, 2024

Avinash Anand, Raj Jaiswal, Abhishek Dharmadhikari, Atharva Marathe, Harsh Parimal Popat, Harshil Mital, Kritarth Prasad, Rajiv Ratn Shah, Roger Zimmermann

Figure 1 for Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring

Figure 2 for Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring

Figure 3 for Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring

Figure 4 for Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring

Abstract:This paper presents GPSM4K, a comprehensive geometry multimodal dataset tailored to augment the problem-solving capabilities of Large Vision Language Models (LVLMs). GPSM4K encompasses 2157 multimodal question-answer pairs manually extracted from mathematics textbooks spanning grades 7-12 and is further augmented to 5340 problems, consisting of both numerical and theorem-proving questions. In contrast to PGPS9k, Geometry3K, and Geo170K which feature only objective-type questions, GPSM4K offers detailed step-by-step solutions in a consistent format, facilitating a comprehensive evaluation of problem-solving approaches. This dataset serves as an excellent benchmark for assessing the geometric reasoning capabilities of LVLMs. Evaluation of our test set shows that there is scope for improvement needed in open-source language models in geometry problem-solving. Finetuning on our training set increases the geometry problem-solving capabilities of models. Further, We also evaluate the effectiveness of techniques such as image captioning and Retrieval Augmentation generation (RAG) on model performance. We leveraged LLM to automate the task of final answer evaluation by providing ground truth and predicted solutions. This research will help to assess and improve the geometric reasoning capabilities of LVLMs.

* 15 pages

Via

Access Paper or Ask Questions

A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation

Nov 12, 2024

Avinash Anand, Akshit Gupta, Nishchay Yadav, Shaurya Bajaj

Figure 1 for A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation

Figure 2 for A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation

Figure 3 for A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation

Figure 4 for A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation

Abstract:Bug fixing and code generation have been core research topics in software development for many years. The recent explosive growth in Large Language Models has completely transformed these spaces, putting in reach incredibly powerful tools for both. In this survey, 27 recent papers have been reviewed and split into two groups: one dedicated to Automated Program Repair (APR) and LLM integration and the other to code generation using LLMs. The first group consists of new methods for bug detection and repair, which include locating semantic errors, security vulnerabilities, and runtime failure bugs. The place of LLMs in reducing manual debugging efforts is emphasized in this work by APR toward context-aware fixes, with innovations that boost accuracy and efficiency in automatic debugging. The second group dwells on code generation, providing an overview of both general-purpose LLMs fine-tuned for programming and task-specific models. It also presents methods to improve code generation, such as identifier-aware training, fine-tuning at the instruction level, and incorporating semantic code structures. This survey work contrasts the methodologies in APR and code generation to identify trends such as using LLMs, feedback loops to enable iterative code improvement and open-source models. It also discusses the challenges of achieving functional correctness and security and outlines future directions for research in LLM-based software development.

* A survey of recent developments in AI-assisted automated program repair

Via

Access Paper or Ask Questions

Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Jul 08, 2024

Avinash Anand, Chayan Tank, Sarthak Pol, Vinayak Katoch, Shaina Mehta, Rajiv Ratn Shah

Figure 1 for Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Figure 2 for Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Figure 3 for Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Figure 4 for Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Abstract:Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals. If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to suicide. Generally, Diagnosing depression or any other mental disorder involves conducting semi-structured interviews alongside supplementary questionnaires, including variants of the Patient Health Questionnaire (PHQ) by Clinicians and mental health professionals. This approach places significant reliance on the experience and judgment of trained physicians, making the diagnosis susceptible to personal biases. Given that the underlying mechanisms causing depression are still being actively researched, physicians often face challenges in diagnosing and treating the condition, particularly in its early stages of clinical presentation. Recently, significant strides have been made in Artificial neural computing to solve problems involving text, image, and speech in various domains. Our analysis has aimed to leverage these state-of-the-art (SOTA) models in our experiments to achieve optimal outcomes leveraging multiple modalities. The experiments were performed on the Extended Distress Analysis Interview Corpus Wizard of Oz dataset (E-DAIC) corpus presented in the Audio/Visual Emotion Challenge (AVEC) 2019 Challenge. The proposed solutions demonstrate better results achieved by Proprietary and Open-source Large Language Models (LLMs), which achieved a Root Mean Square Error (RMSE) score of 3.98 on Textual Modality, beating the AVEC 2019 challenge baseline results and current SOTA regression analysis architectures. Additionally, the proposed solution achieved an accuracy of 71.43% in the classification task. The paper also includes a novel audio-visual multi-modal network that predicts PHQ-8 scores with an RMSE of 6.51.

* 12 pages, 9 figures, 9 tables

Via

Access Paper or Ask Questions

Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs

Jun 21, 2024

Debnath Kundu, Atharva Mehta, Rajesh Kumar, Naman Lal, Avinash Anand, Apoorv Singh, Rajiv Ratn Shah

Figure 1 for Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs

Figure 2 for Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs

Figure 3 for Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs

Figure 4 for Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs

Abstract:The transition to online examinations and assignments raises significant concerns about academic integrity. Traditional plagiarism detection systems often struggle to identify instances of intelligent cheating, particularly when students utilize advanced generative AI tools to craft their responses. This study proposes a keystroke dynamics-based method to differentiate between bona fide and assisted writing within academic contexts. To facilitate this, a dataset was developed to capture the keystroke patterns of individuals engaged in writing tasks, both with and without the assistance of generative AI. The detector, trained using a modified TypeNet architecture, achieved accuracies ranging from 74.98% to 85.72% in condition-specific scenarios and from 52.24% to 80.54% in condition-agnostic scenarios. The findings highlight significant differences in keystroke dynamics between genuine and assisted writing. The outcomes of this study enhance our understanding of how users interact with generative AI and have implications for improving the reliability of digital educational platforms.

* Accepted for publication at The IEEE International Joint Conference on Biometrics (IJCB2024), contains 9 pages, 3 figures, 3 tables

Via

Access Paper or Ask Questions