Eldan Cohen

Variational Prefix Tuning for Diverse and Accurate Code Summarization Using Pre-trained Language Models

May 14, 2025

Junda Zhao, Yuliang Song, Eldan Cohen

Abstract: Recent advancements in source code summarization have leveraged transformer-based pre-trained models, including Large Language Models of Code (LLMCs), to automate and improve the generation of code summaries. However, existing methods often focus on generating a single high-quality summary for a given source code, neglecting scenarios where the generated summary might be inadequate and alternative options are needed. In this paper, we introduce Variational Prefix Tuning (VPT), a novel approach that enhances pre-trained models' ability to generate diverse yet accurate sets of summaries, allowing the user to choose the most suitable one for the given source code. Our method integrates a Conditional Variational Autoencoder (CVAE) framework as a modular component into pre-trained models, enabling us to model the distribution of observed target summaries and sample continuous embeddings to be used as prefixes to steer the generation of diverse outputs during decoding. Importantly, we construct our method in a parameter-efficient manner, eliminating the need for expensive model retraining, especially when using LLMCs. Furthermore, we employ a bi-criteria reranking method to select a subset of generated summaries, optimizing both the diversity and the accuracy of the options presented to users. We present extensive experimental evaluations using widely used datasets and current state-of-the-art pre-trained code summarization models to demonstrate the effectiveness of our approach and its adaptability across models.

* Accepted by the Journal of Systems and Software
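The bi-criteria reranking step can be pictured with a small greedy selector. This is a minimal sketch, not the reranker from the paper: it assumes each candidate summary already carries an accuracy score in [0, 1] (a hypothetical proxy), measures diversity as token-level Jaccard distance, and mixes the two with a weight lam in the style of maximal marginal relevance. The names bicriteria_select, jaccard_distance, and lam are illustrative.

```python
from typing import List, Sequence


def jaccard_distance(a: str, b: str) -> float:
    """1 - (shared lowercase tokens / all distinct lowercase tokens)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def bicriteria_select(
    candidates: Sequence[str],
    scores: Sequence[float],
    k: int,
    lam: float = 0.5,
) -> List[int]:
    """Greedily choose up to k candidate indices.

    Each step maximizes lam * score + (1 - lam) * diversity, where diversity is
    the candidate's distance to its nearest already-chosen summary. The scores
    are a hypothetical accuracy proxy; this MMR-style rule is an illustrative
    stand-in, not the paper's reranking method.
    """
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    remaining = list(range(len(candidates)))
    selected: List[int] = []
    while remaining and len(selected) < k:

        def objective(i: int) -> float:
            # Nothing chosen yet: every candidate gets full diversity credit,
            # so the first pick is simply the highest-scoring one.
            diversity = min(
                (jaccard_distance(candidates[i], candidates[j]) for j in selected),
                default=1.0,
            )
            return lam * scores[i] + (1.0 - lam) * diversity

        best = max(remaining, key=objective)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    summaries = [
        "returns the sum of two numbers",
        "returns the sum of two integers",
        "adds a and b and returns the result",
    ]
    accuracy = [0.9, 0.85, 0.7]  # made-up scores for illustration
    print(bicriteria_select(summaries, accuracy, k=2))           # [0, 2]
    print(bicriteria_select(summaries, accuracy, k=2, lam=1.0))  # [0, 1]
```

With lam=0.5 the two near-paraphrases do not both make the cut; with lam=1.0 the selector reduces to plain top-k by score.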

NeurCAM: Interpretable Neural Clustering via Additive Models

Aug 23, 2024

Nakul Upadhya, Eldan Cohen

Abstract: Interpretable clustering algorithms aim to group similar data points while explaining the obtained groups to support knowledge discovery and pattern recognition tasks. While most approaches to interpretable clustering construct clusters using decision trees, the interpretability of trees often deteriorates on complex problems where large trees are required. In this work, we introduce the Neural Clustering Additive Model (NeurCAM), a novel approach to the interpretable clustering problem that leverages neural generalized additive models to provide fuzzy cluster membership with additive explanations of the obtained clusters. To promote sparsity in our model's explanations, we introduce selection gates that explicitly limit the number of features and pairwise interactions leveraged. Additionally, we demonstrate the capacity of our model to perform text clustering that considers the contextual representation of the texts while providing explanations for the obtained clusters based on uni- or bi-word terms. Extensive experiments show that NeurCAM achieves performance comparable to black-box methods on tabular datasets while remaining interpretable. Additionally, our approach significantly outperforms other interpretable clustering approaches when clustering on text data.

* Accepted to ECAI 2024; official code implementation: https://github.com/optimal-uoft/NeurCAM
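To make "fuzzy cluster membership with additive explanations" concrete, here is a minimal sketch in which each cluster's logit is a sum of per-feature shape functions and a softmax turns the logits into memberships. Assumptions: the shape functions are fixed hypothetical callables standing in for trained neural shape networks, the selection gates are hard 0/1 masks rather than learned, and pairwise interaction terms are omitted. additive_memberships and contributions are illustrative names, not the NeurCAM code.

```python
import math
from typing import Callable, Dict, List, Sequence

# A shape function maps one feature value to an additive contribution.
ShapeFn = Callable[[float], float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def additive_memberships(
    x: Sequence[float],
    shape_fns: Sequence[Sequence[ShapeFn]],
    gates: Sequence[int],
) -> List[float]:
    """Fuzzy cluster memberships from gated additive logits.

    shape_fns[c][j] is the (hypothetical) shape function of feature j for
    cluster c; gates[j] == 0 drops feature j from every cluster's logit.
    """
    logits = [
        sum(f(xj) for f, g, xj in zip(fns, gates, x) if g)
        for fns in shape_fns
    ]
    return softmax(logits)


def contributions(
    x: Sequence[float],
    shape_fns: Sequence[Sequence[ShapeFn]],
    gates: Sequence[int],
    cluster: int,
) -> Dict[int, float]:
    """Per-feature terms of one cluster's logit: its additive explanation."""
    return {
        j: f(x[j])
        for j, (f, g) in enumerate(zip(shape_fns[cluster], gates))
        if g
    }


if __name__ == "__main__":
    # Two features, two clusters; the gate switches feature 1 off entirely.
    fns = [
        [lambda v: -v, lambda v: v * v],  # cluster 0 prefers small x0
        [lambda v: v, lambda v: -v * v],  # cluster 1 prefers large x0
    ]
    gates = [1, 0]
    point = [2.0, 5.0]
    print(additive_memberships(point, fns, gates))      # cluster 1 dominates
    print(contributions(point, fns, gates, cluster=1))  # {0: 2.0}
```

Because each logit is a plain sum, the explanation for a cluster is just the list of terms that were added, and a closed gate removes a feature from every explanation at once.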

Optimal Decision Trees For Interpretable Clustering with Constraints

Jan 30, 2023

Pouya Shati, Eldan Cohen, Sheila McIlraith

Abstract: Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has considered exact optimization formulations that can guarantee optimal clustering while satisfying all constraints; however, these approaches lack interpretability. Recently, decision trees have been used to produce inherently interpretable clustering solutions; however, existing approaches do not support clustering constraints and do not provide strong theoretical guarantees on solution quality. In this work, we present a novel SAT-based framework for interpretable clustering that supports clustering constraints and that also provides strong theoretical guarantees on solution quality. We also present new insight into the trade-off between interpretability and satisfaction of such user-provided constraints. Our framework is the first approach for interpretable and constrained clustering. Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.
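The combination of hard constraints and an optimality guarantee can be illustrated at toy scale. The sketch below replaces the SAT solver with exhaustive search over depth-1 trees (one axis-aligned threshold, two clusters), keeps only splits that satisfy every must-link and cannot-link pair, and, as an assumed objective, returns the feasible split with the smallest maximum cluster diameter. Like a complete SAT-based search, enumeration is exact within the tree family it covers, but it does not scale beyond toy sizes. All function names are hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


def stump_labels(X: Sequence[Point], feature: int, threshold: float) -> List[int]:
    """Cluster assignment of a depth-1 tree: left leaf 0, right leaf 1."""
    return [0 if x[feature] <= threshold else 1 for x in X]


def satisfies(labels: List[int], must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> bool:
    """True if every must-link pair shares a leaf and every cannot-link pair does not."""
    return all(labels[i] == labels[j] for i, j in must_link) and all(
        labels[i] != labels[j] for i, j in cannot_link
    )


def max_diameter(X: Sequence[Point], labels: List[int]) -> float:
    """Largest distance between two points that share a cluster."""
    worst = 0.0
    for i, j in itertools.combinations(range(len(X)), 2):
        if labels[i] == labels[j]:
            worst = max(worst, math.dist(X[i], X[j]))
    return worst


def best_constrained_stump(
    X: Sequence[Point], must_link: Sequence[Pair], cannot_link: Sequence[Pair]
) -> Optional[Tuple[float, int, float]]:
    """Return (diameter, feature, threshold) of the best feasible stump, or None."""
    best: Optional[Tuple[float, int, float]] = None
    for feature in range(len(X[0])):
        values = sorted({x[feature] for x in X})
        # Midpoints between consecutive distinct values enumerate every
        # distinct two-leaf split on this feature.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            labels = stump_labels(X, feature, threshold)
            if not satisfies(labels, must_link, cannot_link):
                continue  # constraints are hard: infeasible splits are discarded
            diameter = max_diameter(X, labels)
            if best is None or diameter < best[0]:
                best = (diameter, feature, threshold)
    return best


if __name__ == "__main__":
    X = [(0.0, 0.0), (1.0, 5.0), (10.0, 0.0), (11.0, 5.0)]
    print(best_constrained_stump(X, [], []))        # splits on feature 0 at 5.5
    print(best_constrained_stump(X, [], [(0, 1)]))  # (10.0, 1, 2.5)
```

On these four points the unconstrained optimum splits on feature 0; adding the cannot-link pair (0, 1) rules that split out, and the search settles on feature 1 with a larger diameter.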

Ising-based Consensus Clustering on Specialized Hardware

Mar 04, 2020

Eldan Cohen, Avradip Mandal, Hayato Ushijima-Mwesigwa, Arnab Roy

Abstract: The emergence of specialized optimization hardware such as CMOS annealers and adiabatic quantum computers carries the promise of solving hard combinatorial optimization problems more efficiently in hardware. Recent work has focused on formulating different combinatorial optimization problems as Ising models, the core mathematical abstraction used by a large number of these hardware platforms, and evaluating the performance of these models when solved on specialized hardware. An interesting area of application is data mining, where combinatorial optimization problems underlie many core tasks. In this work, we focus on consensus clustering (clustering aggregation), an important combinatorial problem that has received much attention over the last two decades. We present two Ising models for consensus clustering and evaluate them using the Fujitsu Digital Annealer, a quantum-inspired CMOS annealer. Our empirical evaluation shows that our approach outperforms existing techniques and is a promising direction for future research.

* Accepted to the Symposium on Intelligent Data Analysis (IDA) 2020
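Consensus clustering maps onto an Ising model through co-association weights. The sketch below is a simplified two-cluster formulation, not either of the paper's two models: w_ij is the fraction of input clusterings that put points i and j together, spins s_i in {-1, +1} encode the output cluster, and the pairwise disagreement cost (1 - w_ij when together, w_ij when apart) equals 1/2 + J_ij s_i s_j with J_ij = (1 - 2 w_ij)/2. Brute-force enumeration stands in for the Digital Annealer, so it only handles a handful of points. The function names are hypothetical.

```python
import itertools
from typing import Dict, Sequence, Tuple

Pair = Tuple[int, int]


def coassociation(clusterings: Sequence[Sequence[int]]) -> Dict[Pair, float]:
    """w[(i, j)] = fraction of input clusterings that co-cluster i and j."""
    n, m = len(clusterings[0]), len(clusterings)
    return {
        (i, j): sum(c[i] == c[j] for c in clusterings) / m
        for i, j in itertools.combinations(range(n), 2)
    }


def ising_couplings(w: Dict[Pair, float]) -> Dict[Pair, float]:
    # Cost of pair (i, j): together (s_i*s_j = +1) -> 1 - w_ij; apart -> w_ij.
    # Both cases equal 1/2 + (1 - 2*w_ij)/2 * s_i*s_j; the constant is dropped.
    return {p: (1.0 - 2.0 * wij) / 2.0 for p, wij in w.items()}


def energy(spins: Sequence[int], J: Dict[Pair, float]) -> float:
    """Ising energy H(s) = sum over pairs of J_ij * s_i * s_j (no field terms)."""
    return sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())


def brute_force_ground_state(n: int, J: Dict[Pair, float]) -> Tuple[int, ...]:
    """Exhaustive stand-in for an annealer: lowest-energy spin vector."""
    best_e, best_s = None, None
    for spins in itertools.product((-1, 1), repeat=n):
        e = energy(spins, J)
        if best_e is None or e < best_e:
            best_e, best_s = e, spins
    return best_s


if __name__ == "__main__":
    inputs = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    J = ising_couplings(coassociation(inputs))
    print(brute_force_ground_state(4, J))  # (-1, -1, 1, 1): {0, 1} vs {2, 3}
```

A global spin flip leaves every product s_i s_j unchanged, so each two-cluster partition appears as two ground states with equal energy; the sketch simply keeps the first one it meets.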
