Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ali Jannesari

Analyzing Latent Concepts in Code Language Models

Oct 01, 2025

Arushi Sharma, Vedant Pungliya, Christopher J. Quinn, Ali Jannesari

Abstract:Interpreting the internal behavior of large language models trained on code remains a critical challenge, particularly for applications demanding trust, transparency, and semantic robustness. We propose Code Concept Analysis (CoCoA): a global post-hoc interpretability framework that uncovers emergent lexical, syntactic, and semantic structures in a code language model's representation space by clustering contextualized token embeddings into human-interpretable concept groups. We propose a hybrid annotation pipeline that combines static analysis tool-based syntactic alignment with prompt-engineered large language models (LLMs), enabling scalable labeling of latent concepts across abstraction levels. We analyse the distribution of concepts across layers and across three finetuning tasks. Emergent concept clusters can help identify unexpected latent interactions and be used to identify trends and biases within the model's learned representations. We further integrate LCA with local attribution methods to produce concept-grounded explanations, improving the coherence and interpretability of token-level saliency. Empirical evaluations across multiple models and tasks show that LCA discovers concepts that remain stable under semantic-preserving perturbations (average Cluster Sensitivity Index, CSI = 0.288) and evolve predictably with fine-tuning. In a user study, concept-augmented explanations disambiguate token roles. In a user study on the programming-language classification task, concept-augmented explanations disambiguated token roles and improved human-centric explainability by 37 percentage points compared with token-level attributions using Integrated Gradients.

Via

Access Paper or Ask Questions

FedProtoKD: Dual Knowledge Distillation with Adaptive Class-wise Prototype Margin for Heterogeneous Federated Learning

Aug 27, 2025

Md Anwar Hossen, Fatema Siddika, Wensheng Zhang, Anuj Sharma, Ali Jannesari

Abstract:Heterogeneous Federated Learning (HFL) has gained attention for its ability to accommodate diverse models and heterogeneous data across clients. Prototype-based HFL methods emerge as a promising solution to address statistical heterogeneity and privacy challenges, paving the way for new advancements in HFL research. This method focuses on sharing only class-representative prototypes among heterogeneous clients. However, these prototypes are often aggregated on the server using weighted averaging, leading to sub-optimal global knowledge; these cause the shrinking of aggregated prototypes, which negatively affects the model performance in scenarios when models are heterogeneous and data distributions are extremely non-IID. We propose FedProtoKD in a Heterogeneous Federated Learning setting, using an enhanced dual-knowledge distillation mechanism to improve the system performance with clients' logits and prototype feature representation. We aim to resolve the prototype margin-shrinking problem using a contrastive learning-based trainable server prototype by leveraging a class-wise adaptive prototype margin. Furthermore, we assess the importance of public samples using the closeness of the sample's prototype to its class representative prototypes, which enhances learning performance. FedProtoKD achieved average improvements of 1.13% up to 34.13% accuracy across various settings and significantly outperforms existing state-of-the-art HFL methods.

* 12 pages, 6 figures

Via

Access Paper or Ask Questions

Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning

Jul 09, 2025

Ankit Jyothish, Ali Jannesari

Abstract:Graph transformers typically embed every node in a single Euclidean space, blurring heterogeneous topologies. We prepend a lightweight Riemannian mixture-of-experts layer that routes each node to various kinds of manifold, mixture of spherical, flat, hyperbolic - best matching its local structure. These projections provide intrinsic geometric explanations to the latent space. Inserted into a state-of-the-art ensemble graph transformer, this projector lifts accuracy by up to 3% on four node-classification benchmarks. The ensemble makes sure that both euclidean and non-euclidean features are captured. Explicit, geometry-aware projection thus sharpens predictive power while making graph representations more interpretable.

Via

Access Paper or Ask Questions

Enhanced Soups for Graph Neural Networks

Mar 14, 2025

Joseph Zuber, Aishwarya Sarkar, Joseph Jennings, Ali Jannesari

Abstract:Graph Neural Networks (GNN) have demonstrated state-of-the-art performance in numerous scientific and high-performance computing (HPC) applications. Recent work suggests that "souping" (combining) individually trained GNNs into a single model can improve performance without increasing compute and memory costs during inference. However, existing souping algorithms are often slow and memory-intensive, which limits their scalability. We introduce Learned Souping for GNNs, a gradient-descent-based souping strategy that substantially reduces time and memory overhead compared to existing methods. Our approach is evaluated across multiple Open Graph Benchmark (OGB) datasets and GNN architectures, achieving up to 1.2% accuracy improvement and 2.1X speedup. Additionally, we propose Partition Learned Souping, a novel partition-based variant of learned souping that significantly reduces memory usage. On the ogbn-products dataset with GraphSAGE, partition learned souping achieves a 24.5X speedup and a 76% memory reduction without compromising accuracy.

* 10 pages, 4 figures, 3 tables, accepted to GrAPL 2025 (colocated with IPDPS 2025)

Via

Access Paper or Ask Questions

Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency

Mar 10, 2025

Duy Phuong Nguyen, J. Pablo Munoz, Tanya Roosta, Ali Jannesari

Abstract:Federated Learning (FL) enables collaborative learning across distributed clients while preserving data privacy. However, FL faces significant challenges when dealing with heterogeneous data distributions, which can lead to suboptimal global models that fail to generalize across diverse clients. In this work, we propose a novel framework designed to tackle these challenges by introducing a dual-adapter approach. The method utilizes a larger local adapter for client-specific personalization and a smaller global adapter to facilitate efficient knowledge sharing across clients. Additionally, we incorporate a pruning mechanism to reduce communication overhead by selectively removing less impactful parameters from the local adapter. Through extensive experiments on a range of vision and language tasks, our method demonstrates superior performance compared to existing approaches. It achieves higher test accuracy, lower performance variance among clients, and improved worst-case performance, all while significantly reducing communication and computation costs. Overall, the proposed method addresses the critical trade-off between model personalization and generalization, offering a scalable solution for real-world FL applications.

* Accepted at CCGrid 2025

Via

Access Paper or Ask Questions

SuperSAM: Crafting a SAM Supernetwork via Structured Pruning and Unstructured Parameter Prioritization

Jan 15, 2025

Waqwoya Abebe, Sadegh Jafari, Sixing Yu, Akash Dutta, Jan Strube, Nathan R. Tallent, Luanzheng Guo, Pablo Munoz, Ali Jannesari

Figure 1 for SuperSAM: Crafting a SAM Supernetwork via Structured Pruning and Unstructured Parameter Prioritization

Figure 2 for SuperSAM: Crafting a SAM Supernetwork via Structured Pruning and Unstructured Parameter Prioritization

Figure 3 for SuperSAM: Crafting a SAM Supernetwork via Structured Pruning and Unstructured Parameter Prioritization

Figure 4 for SuperSAM: Crafting a SAM Supernetwork via Structured Pruning and Unstructured Parameter Prioritization

Abstract:Neural Architecture Search (NAS) is a powerful approach of automating the design of efficient neural architectures. In contrast to traditional NAS methods, recently proposed one-shot NAS methods prove to be more efficient in performing NAS. One-shot NAS works by generating a singular weight-sharing supernetwork that acts as a search space (container) of subnetworks. Despite its achievements, designing the one-shot search space remains a major challenge. In this work we propose a search space design strategy for Vision Transformer (ViT)-based architectures. In particular, we convert the Segment Anything Model (SAM) into a weight-sharing supernetwork called SuperSAM. Our approach involves automating the search space design via layer-wise structured pruning and parameter prioritization. While the structured pruning applies probabilistic removal of certain transformer layers, parameter prioritization performs weight reordering and slicing of MLP-blocks in the remaining layers. We train supernetworks on several datasets using the sandwich rule. For deployment, we enhance subnetwork discovery by utilizing a program autotuner to identify efficient subnetworks within the search space. The resulting subnetworks are 30-70% smaller in size compared to the original pre-trained SAM ViT-B, yet outperform the pretrained model. Our work introduces a new and effective method for ViT NAS search-space design.

Via

Access Paper or Ask Questions

Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration

Dec 27, 2024

Le Chen, Bin Lei, Dunzhi Zhou, Pei-Hung Lin, Chunhua Liao, Caiwen Ding, Ali Jannesari

Figure 1 for Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration

Figure 2 for Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration

Figure 3 for Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration

Figure 4 for Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration

Abstract:Migrating Fortran code to C++ is a common task for many scientific computing teams, driven by the need to leverage modern programming paradigms, enhance cross-platform compatibility, and improve maintainability. Automating this translation process using large language models (LLMs) has shown promise, but the lack of high-quality, specialized datasets has hindered their effectiveness. In this paper, we address this challenge by introducing a novel multi-turn dialogue dataset, Fortran2CPP, specifically designed for Fortran-to-C++ code migration. Our dataset, significantly larger than existing alternatives, is generated using a unique LLM-driven, dual-agent pipeline incorporating iterative compilation, execution, and code repair to ensure high quality and functional correctness. To demonstrate the effectiveness of our dataset, we fine-tuned several open-weight LLMs on Fortran2CPP and evaluated their performance on two independent benchmarks. Fine-tuning on our dataset led to remarkable gains, with models achieving up to a 3.31x increase in CodeBLEU score and a 92\% improvement in compilation success rate. This highlights the dataset's ability to enhance both the syntactic accuracy and compilability of the translated C++ code. Our dataset and model have been open-sourced and are available on our public GitHub repository\footnote{\url{https://github.com/HPC-Fortran2CPP/Fortran2Cpp}}.

Via

Access Paper or Ask Questions

MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

Oct 30, 2024

Aishwarya Sarkar, Sayan Ghosh, Nathan R. Tallent, Ali Jannesari

Abstract:Graph Neural Networks (GNN) are indispensable in learning from graph-structured data, yet their rising computational costs, especially on massively connected graphs, pose significant challenges in terms of execution performance. To tackle this, distributed-memory solutions such as partitioning the graph to concurrently train multiple replicas of GNNs are in practice. However, approaches requiring a partitioned graph usually suffer from communication overhead and load imbalance, even under optimal partitioning and communication strategies due to irregularities in the neighborhood minibatch sampling. This paper proposes practical trade-offs for improving the sampling and communication overheads for representation learning on distributed graphs (using popular GraphSAGE architecture) by developing a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework, demonstrating about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer for various OGB datasets.

* In Proc. of the IEEE International Conference on Cluster Computing (CLUSTER), 2024

Via

Access Paper or Ask Questions

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Oct 27, 2024

Ali TehraniJamsaz, Arijit Bhattacharjee, Le Chen, Nesreen K. Ahmed, Amir Yazdanbakhsh, Ali Jannesari

Figure 1 for CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Figure 2 for CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Figure 3 for CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Figure 4 for CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Abstract:Recent advancements in Large Language Models (LLMs) have renewed interest in automatic programming language translation. Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extensions remains underexplored due to challenges such as complex parallel semantics. In this paper, we introduce CodeRosetta, an encoder-decoder transformer model designed specifically for translating between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to CUDA and Fortran to C++ translation tasks. It uses a customized learning framework with tailored pretraining and training objectives to effectively capture both code semantics and parallel structural nuances, enabling bidirectional translation. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%. Compared to general closed-source LLMs, our method improves C++ to CUDA translation by 22.08 BLEU and 14.39 CodeBLEU, with 2.75% higher compilation accuracy. Finally, CodeRosetta exhibits proficiency in Fortran to parallel C++ translation, marking it, to our knowledge, as the first encoder-decoder model for this complex task, improving CodeBLEU by at least 4.63 points compared to closed-source and open-code LLMs.

Via

Access Paper or Ask Questions

PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

Jul 16, 2024

Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari

Figure 1 for PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

Figure 2 for PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

Figure 3 for PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

Figure 4 for PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

Abstract:Inference of Large Language Models (LLMs) across computer clusters has become a focal point of research in recent times, with many acceleration techniques taking inspiration from CPU speculative execution. These techniques reduce bottlenecks associated with memory bandwidth, but also increase end-to-end latency per inference run, requiring high speculation acceptance rates to improve performance. Combined with a variable rate of acceptance across tasks, speculative inference techniques can result in reduced performance. Additionally, pipeline-parallel designs require many user requests to maintain maximum utilization. As a remedy, we propose PipeInfer, a pipelined speculative acceleration technique to reduce inter-token latency and improve system utilization for single-request scenarios while also improving tolerance to low speculation acceptance rates and low-bandwidth interconnects. PipeInfer exhibits up to a 2.15$\times$ improvement in generation speed over standard speculative inference. PipeInfer achieves its improvement through Continuous Asynchronous Speculation and Early Inference Cancellation, the former improving latency and generation speed by running single-token inference simultaneously with several speculative runs, while the latter improves speed and latency by skipping the computation of invalidated runs, even in the middle of inference.

* 11 pages, submitted to SC24 conference

Via

Access Paper or Ask Questions