Abhishek Dutta

Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps

Feb 21, 2025

Yen-Che Hsiao, Abhishek Dutta

Abstract: This study investigates the in-context learning capabilities of various decoder-only transformer-based language models with different model sizes and training data, including GPT2, SmolLM2, OpenELM, TinyLlama, Stable LM, and Gemma 2. We identify a critical parameter threshold (~1.6 billion), beyond which reasoning performance improves significantly in tasks such as commonsense reasoning in multiple-choice question answering and deductive reasoning. Specifically, models above this threshold achieve better success rates in chain-of-thought (CoT) prompting for deductive reasoning tasks, especially those requiring longer reasoning chains, such as proof by contradiction and disjunction elimination. To address limitations in sub-threshold models, we demonstrate that fine-tuning with task-specific exemplars substantially enhances reasoning performance, enabling accurate CoT generation even without additional exemplars in the prompt for tasks with shorter reasoning chains. Finally, our analysis of attention maps reveals that models capable of generating correct CoTs exhibit higher token-level attention scores on subsequent correct tokens and the correct parts of speech, providing interpretability insights into reasoning processes. These findings collectively advance understanding of reasoning capabilities in decoder-only transformer-based models. The code is available at: https://github.com/AnnonymousForPapers/CoT_Reasoning_Test.

Adaptive Reasoning and Acting in Medical Language Agents

Oct 13, 2024

Abhishek Dutta, Yen-Che Hsiao

Abstract: This paper presents an innovative large language model (LLM) agent framework for enhancing diagnostic accuracy in simulated clinical environments using the AgentClinic benchmark. The proposed automatic correction enables doctor agents to iteratively refine their reasoning and actions following incorrect diagnoses, fostering improved decision-making over time. Experiments show that the adaptive LLM-based doctor agents achieve correct diagnoses through dynamic interactions with simulated patients. The evaluations highlight the capacity of autonomous agents to adapt and improve in complex medical scenarios. Future enhancements will focus on refining the algorithm and expanding its applicability across a wider range of tasks and different large language models.

Efficient transformer with reinforced position embedding for language models

Oct 07, 2024

Yen-Che Hsiao, Abhishek Dutta

Abstract: In this paper, we propose an efficient transformer architecture that uses reinforced positional embedding to obtain superior performance with half the number of encoder and decoder layers. We demonstrate that three changes improve the training loss, validation loss, and training time of an encoder-decoder Transformer model: concatenating positional encoding with trainable token embeddings, normalizing columns in the token embedding matrix, and using the normalized token embedding matrix as the value of the attention layer. The comparison uses a Portuguese-English translation task with 10 epochs or 12 hours of training across 10 trials. Our method, with roughly a threefold parameter reduction compared to the baseline model, yields a mean training loss of 1.21, a mean validation loss of 1.51, and an average training time of 1352.27 seconds per epoch. The baseline model, which has the same embedding dimension but adds positional encoding to token embeddings, achieves a mean training loss of 1.96, a validation loss of 2.18, and an average training time of 4297.79 seconds per epoch. Additionally, we evaluated our proposed architecture and the baseline across 14 diverse translation datasets from TensorFlow. The results indicate that our method consistently achieves lower or comparable training and validation losses, suggesting enhanced learning efficiency.
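
The difference between concatenating and adding positional encodings, as described in the abstract above, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the sinusoidal encoding, the `(vocab_size, dim)` layout of the embedding matrix, and the choice to normalize each column to unit L2 norm are all assumptions made for illustration. The reuse of the normalized embedding matrix as the attention value is not shown.

```python
import numpy as np

def sinusoidal_positions(seq_len: int, dim: int) -> np.ndarray:
    """Sinusoidal positional encoding of shape (seq_len, dim); dim must be even."""
    positions = np.arange(seq_len)[:, None]
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    enc = np.zeros((seq_len, dim))
    enc[:, 0::2] = np.sin(positions * freqs)
    enc[:, 1::2] = np.cos(positions * freqs)
    return enc

def normalize_columns(emb: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each column of a (vocab_size, dim) matrix to unit L2 norm (assumed convention)."""
    return emb / (np.linalg.norm(emb, axis=0, keepdims=True) + eps)

def embed_concat(token_ids: np.ndarray, token_emb: np.ndarray, pos_dim: int) -> np.ndarray:
    """Concatenate normalized token embeddings with positional encodings."""
    tok = normalize_columns(token_emb)[token_ids]          # (seq_len, tok_dim)
    pos = sinusoidal_positions(len(token_ids), pos_dim)    # (seq_len, pos_dim)
    return np.concatenate([tok, pos], axis=-1)             # (seq_len, tok_dim + pos_dim)

def embed_add(token_ids: np.ndarray, token_emb: np.ndarray) -> np.ndarray:
    """Baseline-style variant: add positional encodings to token embeddings."""
    tok = token_emb[token_ids]
    return tok + sinusoidal_positions(len(token_ids), token_emb.shape[1])
```

With concatenation, the positional signal occupies its own coordinates instead of being mixed into the token coordinates, so the output width is the sum of the two dimensions.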

Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models

Aug 12, 2024

Yen-Che Hsiao, Abhishek Dutta

Abstract: We propose a novel in-context learning algorithm for building autonomous decision-making language agents. The language agent repeatedly attempts to solve the same task, self-correcting each time an attempt fails. Our selected language agent demonstrates the ability to solve tasks in a text-based game environment. Our results show that the gemma-2-9b-it language model, using our proposed method, can successfully complete two of the six tasks that it failed on the first attempt. This highlights the effectiveness of our approach in enhancing the problem-solving capabilities of a single language model through self-correction, paving the way for more advanced autonomous agents. The code is publicly available at https://github.com/YenCheHsiao/AutonomousLLMAgentwithAdaptingPlanning.

Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent

Aug 02, 2024

Yen-Che Hsiao, Abhishek Dutta

Abstract: This paper presents a novel coordinate descent algorithm that combines one-directional line search with gradient information to update parameters under a squared error loss function. Each parameter is updated by either the line search or the gradient method, depending on whether the modulus of the gradient of the loss with respect to that parameter surpasses a predefined threshold. Notably, a larger threshold value enhances algorithmic efficiency. Although the line search method may be slower than gradient descent, it can be parallelized, which reduces computation time. Experiments on a 2-layer Rectified Linear Unit network with synthetic data show how the hyperparameters affect convergence rates and computational efficiency.
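
The per-parameter switching rule in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: it uses a linear least-squares model instead of a 2-layer ReLU network, a simple grid of candidate step sizes as the line search, and it assumes that line search is chosen when the gradient modulus exceeds the threshold (the abstract does not state the direction of the rule). All names and default values are hypothetical.

```python
import numpy as np

def loss(w, X, y):
    """Squared error loss 0.5 * ||Xw - y||^2."""
    r = X @ w - y
    return 0.5 * float(r @ r)

def partial_grad(w, X, y, j):
    """Derivative of the loss with respect to coordinate j only."""
    return float(X[:, j] @ (X @ w - y))

def coordinate_line_search(w, j, g_j, X, y, steps):
    """Try several step sizes along coordinate j in the descent direction; keep the best."""
    best_w, best_loss = w, loss(w, X, y)
    for s in steps:
        cand = w.copy()
        cand[j] -= s * np.sign(g_j)
        cand_loss = loss(cand, X, y)
        if cand_loss < best_loss:
            best_w, best_loss = cand, cand_loss
    return best_w

def hybrid_coordinate_descent(X, y, threshold=1.0, lr=0.01, sweeps=50):
    """Cycle over coordinates, picking line search or a gradient step per coordinate."""
    steps = [2.0 ** -k for k in range(12)]  # assumed candidate step sizes
    w = np.zeros(X.shape[1])
    history = [loss(w, X, y)]
    for _ in range(sweeps):
        for j in range(w.size):
            g_j = partial_grad(w, X, y, j)
            if abs(g_j) > threshold:       # assumed direction of the switching rule
                w = coordinate_line_search(w, j, g_j, X, y, steps)
            else:
                w = w.copy()
                w[j] -= lr * g_j           # plain gradient step on one coordinate
        history.append(loss(w, X, y))
    return w, history
```

In this sketch the candidate evaluations inside `coordinate_line_search` are independent of one another, which is the property that would allow them to run in parallel.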

Derivation of Back-propagation for Graph Convolutional Networks using Matrix Calculus and its Application to Explainable Artificial Intelligence

Aug 02, 2024

Yen-Che Hsiao, Rongting Yue, Abhishek Dutta

Abstract: This paper provides a comprehensive and detailed derivation of the backpropagation algorithm for graph convolutional neural networks using matrix calculus. The derivation is extended to include arbitrary element-wise activation functions and an arbitrary number of layers. The study addresses two fundamental problems, namely node classification and link prediction. To validate our method, we compare it with reverse-mode automatic differentiation. The experimental results demonstrate that the median sum of squared errors of the updated weight matrices, when comparing our method to the approach using reverse-mode automatic differentiation, falls within the range of $10^{-18}$ to $10^{-14}$. These outcomes are obtained from conducting experiments on a five-layer graph convolutional network, applied to a node classification problem on Zachary's karate club social network and a link prediction problem on a drug-drug interaction network. Finally, we show how the derived closed-form solution can facilitate the development of explainable AI and sensitivity analysis.
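
The idea of checking a hand-derived gradient against an independent computation can be sketched on a much smaller case than the paper's. The example below is an assumption-laden illustration: it uses a single graph convolutional layer with a tanh activation and a squared error loss, the symmetric normalization $D^{-1/2}(A+I)D^{-1/2}$, and central finite differences in place of the reverse-mode automatic differentiation used in the paper. Function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops (assumed convention)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def gcn_loss(W, A_norm, X, Y):
    """Loss 0.5 * ||tanh(A_norm X W) - Y||_F^2 for a single layer."""
    H = np.tanh(A_norm @ X @ W)
    return 0.5 * float(np.sum((H - Y) ** 2))

def gcn_grad(W, A_norm, X, Y):
    """Closed-form gradient: (A_norm X)^T [(H - Y) * tanh'(A_norm X W)]."""
    S = A_norm @ X
    H = np.tanh(S @ W)
    delta = (H - Y) * (1.0 - H ** 2)   # element-wise chain rule through tanh
    return S.T @ delta

def finite_difference_grad(W, A_norm, X, Y, eps=1e-6):
    """Central finite differences, used here as the independent reference."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        G[idx] = (gcn_loss(W_plus, A_norm, X, Y) - gcn_loss(W_minus, A_norm, X, Y)) / (2 * eps)
    return G
```

Comparing `gcn_grad` with `finite_difference_grad` on a small random graph gives a quick sanity check of the matrix-calculus expression before extending it to more layers.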

A Comparative Study of Hierarchical Risk Parity Portfolio and Eigen Portfolio on the NIFTY 50 Stocks

Oct 03, 2022

Jaydip Sen, Abhishek Dutta

Abstract: Portfolio optimization has attracted a lot of attention from researchers and financial analysts. Designing an optimum portfolio is a complex task since it not only involves accurate forecasting of future stock returns and risks but also needs to optimize them. This paper presents a systematic approach to portfolio optimization using two approaches, the hierarchical risk parity algorithm and the Eigen portfolio, on seven sectors of the Indian stock market. The portfolios are built by applying the two approaches to historical stock prices from Jan 1, 2016, to Dec 31, 2020. The portfolio performances are evaluated on the test data from Jan 1, 2021, to Nov 1, 2021. The backtesting results of the portfolios indicate that the performance of the HRP portfolio is superior to that of its Eigen counterpart on both training and test data for the majority of the sectors studied.

* This is the accepted version of our paper at the 2nd International Conference on Computational Intelligence and Data Analytics, January 8 - 9, 2021, Hyderabad. The paper is 15 pages long and it contains 21 figures and 7 tables. arXiv admin note: substantial text overlap with arXiv:2202.02728
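
One common construction of an eigen portfolio can be sketched as below. This is an assumption, not necessarily the construction used in the paper: weights are taken from the principal eigenvector of the return correlation matrix, scaled by each asset's inverse volatility, and normalized to sum to one. The function name and the synthetic data in the usage note are hypothetical, and the hierarchical risk parity algorithm is not covered.

```python
import numpy as np

def eigen_portfolio_weights(returns: np.ndarray) -> np.ndarray:
    """Weights from the principal eigenvector of the correlation matrix.

    returns: array of shape (n_days, n_assets) holding periodic returns.
    """
    vol = returns.std(axis=0, ddof=1)            # per-asset volatility
    corr = np.corrcoef(returns, rowvar=False)    # (n_assets, n_assets)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    principal = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
    raw = principal / vol                        # inverse-volatility scaling
    return raw / raw.sum()                       # normalize; also removes sign ambiguity
```

For example, with synthetic returns driven by one shared market factor, `eigen_portfolio_weights(market + noise)` yields positive weights that sum to one. The final normalization assumes the scaled eigenvector components do not sum to a value near zero, which holds when the assets are positively correlated.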

Precise Stock Price Prediction for Optimized Portfolio Design Using an LSTM Model

Mar 02, 2022

Jaydip Sen, Sidra Mehtab, Abhishek Dutta, Saikat Mondal

Abstract: Accurate prediction of future prices of stocks is a difficult task to perform. Even more challenging is to design an optimized portfolio of stocks with the identification of proper weights of allocation to achieve the optimized values of return and risk. We present optimized portfolios based on the seven sectors of the Indian economy. The past prices of the stocks are extracted from the web from January 1, 2016, to December 31, 2020. Optimum portfolios are designed on the selected seven sectors. An LSTM regression model is also designed for predicting future stock prices. Five months after the construction of the portfolios, i.e., on June 1, 2021, the actual and predicted returns and risks of each portfolio are computed. The predicted and the actual returns indicate the very high accuracy of the LSTM model.

* This is the accepted version of our paper in the IEEE 19th OITS International Conference on Information Technology (OCIT 21). The final version is available in the IEEE Xplore. The paper consists of 6 pages and it includes 9 figures and 20 tables. arXiv admin note: substantial text overlap with arXiv:2202.02723, arXiv:2111.04709

Hierarchical Risk Parity and Minimum Variance Portfolio Design on NIFTY 50 Stocks

Feb 06, 2022

Jaydip Sen, Sidra Mehtab, Abhishek Dutta, Saikat Mondal

Abstract: Portfolio design and optimization have always been an area of research that has attracted a lot of attention from researchers in the finance domain. Designing an optimum portfolio is a complex task since it involves accurate forecasting of future stock returns and risks and making a suitable tradeoff between them. This paper proposes a systematic approach to designing portfolios using two algorithms, the critical line algorithm (CLA) and the hierarchical risk parity (HRP) algorithm, on eight sectors of the Indian stock market. While the portfolios are designed using the stock price data from Jan 1, 2016, to Dec 31, 2020, they are tested on the data from Jan 1, 2021, to Aug 26, 2021. The backtesting results of the portfolios indicate that while the performance of the CLA algorithm is superior on the training data, the HRP algorithm has outperformed the CLA algorithm on the test data.

* This is the preprint version of our published paper listed in the IEEE Xplore. The final paper is published in the Proceedings of the IEEE International Conference on Decision Aid Sciences and Applications, pp. 668-675, December 7-8, 2021, Bahrain. The preprint consists of 8 pages and it contains 32 figures and 9 tables.

Machine Learning: Algorithms, Models, and Applications

Jan 06, 2022

Jaydip Sen, Sidra Mehtab, Rajdeep Sen, Abhishek Dutta, Pooja Kherwa, Saheel Ahmed, Pranay Berry, Sahil Khurana, Sonali Singh, David W. W Cadotte (+5 more)

Abstract: Recent times are witnessing rapid development in machine learning algorithms and systems, especially in reinforcement learning, natural language processing, computer and robot vision, image processing, speech, and emotional processing and understanding. In tune with the increasing importance and relevance of machine learning models, algorithms, and their applications, and with the emergence of more innovative use cases of deep learning and artificial intelligence, the current volume presents a few innovative research works and their applications in the real world, such as stock trading, medical and healthcare systems, and software automation. The chapters in the book illustrate how machine learning and deep learning algorithms and models are designed, optimized, and deployed. The volume will be useful for advanced graduate and doctoral students, researchers, faculty members of universities, practicing data scientists and data engineers, professionals, and consultants working on the broad areas of machine learning, deep learning, and artificial intelligence.

* Published by IntechOpen, London, UK in Dec 2021. The book contains 6 chapters spanning 154 pages.
