Adnan Qidwai

LLM-Aided Customizable Profiling of Code Data Based On Programming Language Concepts

Mar 19, 2025

Pankaj Thorat, Adnan Qidwai, Adrija Dhar, Aishwariya Chakraborty, Anand Eswaran, Hima Patel, Praveen Jayachandran

Abstract: Data profiling is critical in machine learning for generating descriptive statistics, supporting both deeper understanding and downstream tasks like data valuation and curation. This work addresses profiling specifically in the context of code datasets for Large Language Models (code-LLMs), where data quality directly influences tasks such as code generation and summarization. Characterizing code datasets in terms of programming language concepts enables better insights and targeted data curation. Our proposed methodology decomposes code data profiling into two phases: (1) an offline phase where LLMs are leveraged to derive and learn rules for extracting syntactic and semantic concepts across various programming languages, including previously unseen or low-resource languages, and (2) an online deterministic phase applying these derived rules for efficient real-time analysis. This hybrid approach is customizable, extensible to new syntactic and semantic constructs, and scalable to multiple languages. Experimentally, our LLM-aided method achieves a mean accuracy of 90.33% for syntactic extraction rules and semantic classification accuracies averaging 80% and 77% across languages and semantic concepts, respectively.

* 21 pages
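
The two-phase split described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say what form the derived rules take, so the sketch stands in plain regular expressions for them, and the names `RULES` and `profile` are hypothetical. The offline, LLM-driven phase is not implemented; its output is simply hard-coded.

```python
import re
from collections import Counter

# Hypothetical stand-in for the output of the offline phase.
# In the paper an LLM derives the extraction rules; here they are
# hand-written regexes, which is an assumption of this sketch.
RULES = {
    "python": {
        "function_definition": r"^[ \t]*def\s+\w+\s*\(",
        "import": r"^[ \t]*(?:import|from)\s+\w+",
        "loop": r"^[ \t]*(?:for|while)\b",
    },
}


def profile(source: str, language: str) -> Counter:
    """Online phase: apply the stored rules deterministically, no LLM call."""
    counts = Counter()
    for concept, pattern in RULES[language].items():
        # MULTILINE lets ^ anchor at the start of every line of the file.
        counts[concept] = len(re.findall(pattern, source, flags=re.MULTILINE))
    return counts


sample = "import os\n\ndef f(x):\n    for i in range(x):\n        pass\n"
result = profile(sample, "python")
```

Keeping the rules as data rather than code is one way to get the customizability the abstract mentions: supporting a new concept or language means adding an entry to the rule table, while the online pass stays unchanged.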


Unraveling the Truth: Do LLMs really Understand Charts? A Deep Dive into Consistency and Robustness

Jul 15, 2024

Srija Mukhopadhyay, Adnan Qidwai, Aparna Garimella, Pritika Ramu, Vivek Gupta, Dan Roth

Abstract: Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on comprehensive datasets, developed specifically for this study, encompassing diverse question categories and chart formats. We investigate two key aspects: 1) the models' ability to handle varying levels of chart and question complexity, and 2) their robustness across different visual representations of the same underlying data. Our analysis reveals significant performance variations based on question and chart types, highlighting both strengths and weaknesses of current models. Additionally, we identify areas for improvement and propose future research directions to build more robust and reliable CQA systems. This study sheds light on the limitations of current models and paves the way for future advancements in the field.

* 22 pages, 7 Tables, 3 Figures, 25 examples
