Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Donghao Ren

Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates

Dec 18, 2025

Nikhil Prakash, Donghao Ren, Dominik Moritz, Yannick Assogba

Figure 1 for Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates

Figure 2 for Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates

Figure 3 for Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates

Figure 4 for Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates

Abstract:Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, often referred to as circuits, that are responsible for performing specific tasks. Additionally, it has been shown that model performance improvement through fine-tuning often results from the strengthening of existing circuits in the model. Taken together, these findings suggest the possibility of intervening directly on such circuits to make precise, task-targeted updates. Motivated by these findings, we propose a novel method called Constructive Circuit Amplification which identifies pivotal tokens from model reasoning traces as well as model components responsible for the desired task, and updates only those components. Applied to mathematical reasoning, it improves accuracy by up to +11.4% across multiple models while modifying as little as 1.59% of model components, with minimal impact on other abilities as measured by MMLU, TriviaQA, and TruthfulQA. These results demonstrate that targeted capabilities can be reliably enhanced by selectively updating a sparse set of model components.

* 18 pages, 3 figures

Via

Access Paper or Ask Questions

EncQA: Benchmarking Vision-Language Models on Visual Encodings for Charts

Aug 06, 2025

Kushin Mukherjee, Donghao Ren, Dominik Moritz, Yannick Assogba

Abstract:Multimodal vision-language models (VLMs) continue to achieve ever-improving scores on chart understanding benchmarks. Yet, we find that this progress does not fully capture the breadth of visual reasoning capabilities essential for interpreting charts. We introduce EncQA, a novel benchmark informed by the visualization literature, designed to provide systematic coverage of visual encodings and analytic tasks that are crucial for chart understanding. EncQA provides 2,076 synthetic question-answer pairs, enabling balanced coverage of six visual encoding channels (position, length, area, color quantitative, color nominal, and shape) and eight tasks (find extrema, retrieve value, find anomaly, filter values, compute derived value exact, compute derived value relative, correlate values, and correlate values relative). Our evaluation of 9 state-of-the-art VLMs reveals that performance varies significantly across encodings within the same task, as well as across tasks. Contrary to expectations, we observe that performance does not improve with model size for many task-encoding pairs. Our results suggest that advancing chart understanding requires targeted strategies addressing specific visual reasoning gaps, rather than solely scaling up model or dataset size.

Via

Access Paper or Ask Questions

Embedding Atlas: Low-Friction, Interactive Embedding Visualization

May 09, 2025

Donghao Ren, Fred Hohman, Halden Lin, Dominik Moritz

Figure 1 for Embedding Atlas: Low-Friction, Interactive Embedding Visualization

Figure 2 for Embedding Atlas: Low-Friction, Interactive Embedding Visualization

Figure 3 for Embedding Atlas: Low-Friction, Interactive Embedding Visualization

Figure 4 for Embedding Atlas: Low-Friction, Interactive Embedding Visualization

Abstract:Embedding projections are popular for visualizing large datasets and models. However, people often encounter "friction" when using embedding visualization tools: (1) barriers to adoption, e.g., tedious data wrangling and loading, scalability limits, no integration of results into existing workflows, and (2) limitations in possible analyses, without integration with external tools to additionally show coordinated views of metadata. In this paper, we present Embedding Atlas, a scalable, interactive visualization tool designed to make interacting with large embeddings as easy as possible. Embedding Atlas uses modern web technologies and advanced algorithms -- including density-based clustering, and automated labeling -- to provide a fast and rich data analysis experience at scale. We evaluate Embedding Atlas with a competitive analysis against other popular embedding tools, showing that Embedding Atlas's feature set specifically helps reduce friction, and report a benchmark on its real-time rendering performance with millions of points. Embedding Atlas is available as open source to support future work in embedding-based analysis.

* Website: https://apple.github.io/embedding-atlas/

Via

Access Paper or Ask Questions

A Scalable Approach to Clustering Embedding Projections

Apr 09, 2025

Donghao Ren, Fred Hohman, Dominik Moritz

Abstract:Interactive visualization of embedding projections is a useful technique for understanding data and evaluating machine learning models. Labeling data within these visualizations is critical for interpretation, as labels provide an overview of the projection and guide user navigation. However, most methods for producing labels require clustering the points, which can be computationally expensive as the number of points grows. In this paper, we describe an efficient clustering approach using kernel density estimation in the projected 2D space instead of points. This algorithm can produce high-quality cluster regions from a 2D density map in a few hundred milliseconds, orders of magnitude faster than current approaches. We contribute the design of the algorithm, benchmarks, and applications that demonstrate the utility of the algorithm, including labeling and summarization.

* 4 pages, 4 figures

Via

Access Paper or Ask Questions

Exploring Empty Spaces: Human-in-the-Loop Data Augmentation

Oct 01, 2024

Catherine Yeh, Donghao Ren, Yannick Assogba, Dominik Moritz, Fred Hohman

Figure 1 for Exploring Empty Spaces: Human-in-the-Loop Data Augmentation

Figure 2 for Exploring Empty Spaces: Human-in-the-Loop Data Augmentation

Figure 3 for Exploring Empty Spaces: Human-in-the-Loop Data Augmentation

Figure 4 for Exploring Empty Spaces: Human-in-the-Loop Data Augmentation

Abstract:Data augmentation is crucial to make machine learning models more robust and safe. However, augmenting data can be challenging as it requires generating diverse data points to rigorously evaluate model behavior on edge cases and mitigate potential harms. Creating high-quality augmentations that cover these "unknown unknowns" is a time- and creativity-intensive task. In this work, we introduce Amplio, an interactive tool to help practitioners navigate "unknown unknowns" in unstructured text datasets and improve data diversity by systematically identifying empty data spaces to explore. Amplio includes three human-in-the-loop data augmentation techniques: Augment With Concepts, Augment by Interpolation, and Augment with Large Language Model. In a user study with 18 professional red teamers, we demonstrate the utility of our augmentation methods in helping generate high-quality, diverse, and relevant model safety prompts. We find that Amplio enabled red teamers to augment data quickly and creatively, highlighting the transformative potential of interactive augmentation workflows.

Via

Access Paper or Ask Questions

Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments

Aug 06, 2024

Angie Boggust, Venkatesh Sivaraman, Yannick Assogba, Donghao Ren, Dominik Moritz, Fred Hohman

Abstract:To deploy machine learning models on-device, practitioners use compression algorithms to shrink and speed up models while maintaining their high-quality output. A critical aspect of compression in practice is model comparison, including tracking many compression experiments, identifying subtle changes in model behavior, and negotiating complex accuracy-efficiency trade-offs. However, existing compression tools poorly support comparison, leading to tedious and, sometimes, incomplete analyses spread across disjoint tools. To support real-world comparative workflows, we develop an interactive visual system called Compress and Compare. Within a single interface, Compress and Compare surfaces promising compression strategies by visualizing provenance relationships between compressed models and reveals compression-induced behavior changes by comparing models' predictions, weights, and activations. We demonstrate how Compress and Compare supports common compression analysis tasks through two case studies, debugging failed compression on generative language models and identifying compression artifacts in image classification models. We further evaluate Compress and Compare in a user study with eight compression experts, illustrating its potential to provide structure to compression workflows, help practitioners build intuition about compression, and encourage thorough analysis of compression's effect on model behavior. Through these evaluations, we identify compression-specific challenges that future visual analytics tools should consider and Compress and Compare visualizations that may generalize to broader model comparison tasks.

* Accepted to VIS 2024

Via

Access Paper or Ask Questions

Evaluating Long Range Dependency Handling in Code Generation Models using Multi-Step Key Retrieval

Jul 23, 2024

Yannick Assogba, Donghao Ren

Abstract:As language models support larger and larger context sizes, evaluating their ability to make effective use of that context becomes increasingly important. We analyze the ability of several code generation models to handle long range dependencies using a suite of multi-step key retrieval tasks in context windows up to 8k tokens in length. The tasks progressively increase in difficulty and allow more nuanced evaluation of model capabilities than tests like the popular needle-in-the-haystack test. We find that performance degrades significantly (up to 2x) when a function references another function that is defined later in the prompt. We also observe that models that use sliding window attention mechanisms have difficulty handling references further than the size of a single window. We perform simple prompt modifications using call graph information to improve multi-step retrieval performance up to 3x. Our analysis highlights different facets of long-context performance and is suggestive of prompt construction strategies for code completion tools

* 29 pages, 18 figures

Via

Access Paper or Ask Questions

Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

Oct 06, 2023

Fred Hohman, Mary Beth Kery, Donghao Ren, Dominik Moritz

Figure 1 for Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

Figure 2 for Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

Figure 3 for Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

Figure 4 for Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

Abstract:On-device machine learning (ML) promises to improve the privacy, responsiveness, and proliferation of new, intelligent user experiences by moving ML computation onto everyday personal devices. However, today's large ML models must be drastically compressed to run efficiently on-device, a hurtle that requires deep, yet currently niche expertise. To engage the broader human-centered ML community in on-device ML experiences, we present the results from an interview study with 30 experts at Apple that specialize in producing efficient models. We compile tacit knowledge that experts have developed through practical experience with model compression across different hardware platforms. Our findings offer pragmatic considerations missing from prior work, covering the design process, trade-offs, and technical strategies that go into creating efficient models. Finally, we distill design recommendations for tooling to help ease the difficulty of this work and bring on-device ML into to more widespread practice.

Via

Access Paper or Ask Questions

Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

Oct 24, 2021

Jochen Görtler, Fred Hohman, Dominik Moritz, Kanit Wongsuphasawat, Donghao Ren, Rahul Nair, Marc Kirchner, Kayur Patel

Figure 1 for Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

Figure 2 for Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

Figure 3 for Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

Figure 4 for Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

Abstract:The confusion matrix, a ubiquitous visualization for helping people evaluate machine learning models, is a tabular layout that compares predicted class labels against actual class labels over all data instances. We conduct formative research with machine learning practitioners at a large technology company and find that conventional confusion matrices do not support more complex data-structures found in modern-day applications, such as hierarchical and multi-output labels. To express such variations of confusion matrices, we design an algebra that models confusion matrices as probability distributions. Based on this algebra, we develop Neo, a visual analytics system that enables practitioners to flexibly author and interact with hierarchical and multi-output confusion matrices, visualize derived metrics, renormalize confusions, and share matrix specifications. Finally, we demonstrate Neo's utility with three case studies that help people better understand model performance and reveal hidden confusions.

Via

Access Paper or Ask Questions