Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nishat Raihan

TigerLLM -- A Family of Bangla Large Language Models

Mar 14, 2025

Nishat Raihan, Marcos Zampieri

Abstract:The development of Large Language Models (LLMs) remains heavily skewed towards English and a few other high-resource languages. This linguistic disparity is particularly evident for Bangla - the 5th most spoken language. A few initiatives attempted to create open-source Bangla LLMs with performance still behind high-resource languages and limited reproducibility. To address this gap, we introduce TigerLLM - a family of Bangla LLMs. Our results demonstrate that these models surpass all open-source alternatives and also outperform larger proprietary models like GPT3.5 across standard benchmarks, establishing TigerLLM as the new baseline for future Bangla language modeling.

Via

Access Paper or Ask Questions

Code LLMs: A Taxonomy-based Survey

Dec 11, 2024

Nishat Raihan, Christian Newman, Marcos Zampieri

Abstract:Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks and have recently expanded their impact to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL). This taxonomy-based survey provides a comprehensive analysis of LLMs in the NL-PL domain, investigating how these models are utilized in coding tasks and examining their methodologies, architectures, and training processes. We propose a taxonomy-based framework that categorizes relevant concepts, providing a unified classification system to facilitate a deeper understanding of this rapidly evolving field. This survey offers insights into the current state and future directions of LLMs in coding tasks, including their applications and limitations.

Via

Access Paper or Ask Questions

MojoBench: Language Modeling and Benchmarks for Mojo

Oct 23, 2024

Nishat Raihan, Joanna C. S. Santos, Marcos Zampieri

Abstract:The recently introduced Mojo programming language (PL) by Modular, has received significant attention in the scientific community due to its claimed significant speed boost over Python. Despite advancements in code Large Language Models (LLMs) across various PLs, Mojo remains unexplored in this context. To address this gap, we introduce MojoBench, the first framework for Mojo code generation. MojoBench includes HumanEval-Mojo, a benchmark dataset designed for evaluating code LLMs on Mojo, and Mojo-Coder, the first LLM pretrained and finetuned for Mojo code generation, which supports instructions in 5 natural languages (NLs). Our results show that Mojo-Coder achieves a 30-35% performance improvement over leading models like GPT-4o and Claude-3.5-Sonnet. Furthermore, we provide insights into LLM behavior with underrepresented and unseen PLs, offering potential strategies for enhancing model adaptability. MojoBench contributes to our understanding of LLM capabilities and limitations in emerging programming paradigms fostering more robust code generation systems.

Via

Access Paper or Ask Questions

Large Language Models in Computer Science Education: A Systematic Literature Review

Oct 21, 2024

Nishat Raihan, Mohammed Latif Siddiq, Joanna C. S. Santos, Marcos Zampieri

Figure 1 for Large Language Models in Computer Science Education: A Systematic Literature Review

Figure 2 for Large Language Models in Computer Science Education: A Systematic Literature Review

Figure 3 for Large Language Models in Computer Science Education: A Systematic Literature Review

Figure 4 for Large Language Models in Computer Science Education: A Systematic Literature Review

Abstract:Large language models (LLMs) are becoming increasingly better at a wide range of Natural Language Processing tasks (NLP), such as text generation and understanding. Recently, these models have extended their capabilities to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL). Foundational models such as the Generative Pre-trained Transformer (GPT) and LLaMA series have set strong baseline performances in various NL and PL tasks. Additionally, several models have been fine-tuned specifically for code generation, showing significant improvements in code-related applications. Both foundational and fine-tuned models are increasingly used in education, helping students write, debug, and understand code. We present a comprehensive systematic literature review to examine the impact of LLMs in computer science and computer engineering education. We analyze their effectiveness in enhancing the learning experience, supporting personalized education, and aiding educators in curriculum development. We address five research questions to uncover insights into how LLMs contribute to educational outcomes, identify challenges, and suggest directions for future research.

* Accepted at 56th ACM Technical Symposium on Computer Science Education (SIGCSE TS 2025)

Via

Access Paper or Ask Questions

mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation

Oct 19, 2024

Nishat Raihan, Antonios Anastasopoulos, Marcos Zampieri

Figure 1 for mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation

Figure 2 for mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation

Figure 3 for mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation

Figure 4 for mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation

Abstract:Recent advancements in large language models (LLMs) have significantly enhanced code generation from natural language prompts. The HumanEval Benchmark, developed by OpenAI, remains the most widely used code generation benchmark. However, this and other Code LLM benchmarks face critical limitations, particularly in task diversity, test coverage, and linguistic scope. Current evaluations primarily focus on English-to-Python conversion tasks with limited test cases, potentially overestimating model performance. While recent works have addressed test coverage and programming language (PL) diversity, code generation from low-resource language prompts remains largely unexplored. To address this gap, we introduce mHumanEval, an extended benchmark supporting prompts in over 200 natural languages. We employ established machine translation methods to compile the benchmark, coupled with a quality assurance process. Furthermore, we provide expert human translations for 15 diverse natural languages (NLs). We conclude by analyzing the multilingual code generation capabilities of state-of-the-art (SOTA) Code LLMs, offering insights into the current landscape of cross-lingual code generation.

* 30 Pages

Via

Access Paper or Ask Questions

MasonTigers at SemEval-2024 Task 10: Emotion Discovery and Flip Reasoning in Conversation with Ensemble of Transformers and Prompting

Jun 30, 2024

Al Nahian Bin Emran, Amrita Ganguly, Sadiya Sayara Chowdhury Puspo, Nishat Raihan, Dhiman Goswami

Abstract:In this paper, we present MasonTigers' participation in SemEval-2024 Task 10, a shared task aimed at identifying emotions and understanding the rationale behind their flips within monolingual English and Hindi-English code-mixed dialogues. This task comprises three distinct subtasks - emotion recognition in conversation for Hindi-English code-mixed dialogues, emotion flip reasoning for Hindi-English code-mixed dialogues, and emotion flip reasoning for English dialogues. Our team, MasonTigers, contributed to each subtask, focusing on developing methods for accurate emotion recognition and reasoning. By leveraging our approaches, we attained impressive F1-scores of 0.78 for the first task and 0.79 for both the second and third tasks. This performance not only underscores the effectiveness of our methods across different aspects of the task but also secured us the top rank in the first and third subtasks, and the 2nd rank in the second subtask. Through extensive experimentation and analysis, we provide insights into our system's performance and contributions to each subtask.

Via

Access Paper or Ask Questions

EmoMix-3L: A Code-Mixed Dataset for Bangla-English-Hindi Emotion Detection

May 11, 2024

Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri

Abstract:Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech. Several studies have been conducted on building datasets and performing downstream NLP tasks on code-mixed data. Although it is not uncommon to observe code-mixing of three or more languages, most available datasets in this domain contain code-mixed data from only two languages. In this paper, we introduce EmoMix-3L, a novel multi-label emotion detection dataset containing code-mixed data from three different languages. We experiment with several models on EmoMix-3L and we report that MuRIL outperforms other models on this dataset.

* arXiv admin note: substantial text overlap with arXiv:2310.18387, arXiv:2310.18023

Via

Access Paper or Ask Questions

CSEPrompts: A Benchmark of Introductory Computer Science Prompts

Apr 04, 2024

Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Christian Newman, Tharindu Ranasinghe, Marcos Zampieri

Figure 1 for CSEPrompts: A Benchmark of Introductory Computer Science Prompts

Figure 2 for CSEPrompts: A Benchmark of Introductory Computer Science Prompts

Figure 3 for CSEPrompts: A Benchmark of Introductory Computer Science Prompts

Figure 4 for CSEPrompts: A Benchmark of Introductory Computer Science Prompts

Abstract:Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters. Commercial applications (e.g., ChatGPT) have made this technology available to the general public, thus making it possible to use LLMs to produce high-quality texts for academic and professional purposes. Schools and universities are aware of the increasing use of AI-generated content by students and they have been researching the impact of this new technology and its potential misuse. Educational programs in Computer Science (CS) and related fields are particularly affected because LLMs are also capable of generating programming code in various programming languages. To help understand the potential impact of publicly available LLMs in CS education, we introduce CSEPrompts, a framework with hundreds of programming exercise prompts and multiple-choice questions retrieved from introductory CS and programming courses. We also provide experimental results on CSEPrompts to evaluate the performance of several LLMs with respect to generating Python code and answering basic computer science and programming questions.

Via

Access Paper or Ask Questions