Shubham Bharti

CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models

Aug 26, 2024

Shubham Bharti, Shiyun Cheng, Jihyun Rho, Martina Rao, Xiaojin Zhu

Abstract: We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data visualizing charts. Given a chart, a language model needs to not only correctly comprehend the chart (the FACT question) but also judge if the chart will be misleading to a human reader (the MIND question). Both questions have significant societal benefits. We detail the construction of the CHARTOM benchmark including its calibration on human performance.

The Game of Hidden Rules: A New Kind of Benchmark Challenge for Machine Learning

Jul 20, 2022

Eric Pulick, Shubham Bharti, Yiding Chen, Vladimir Menkov, Yonatan Mintz, Paul Kantor, Vicki M. Bier

Abstract: As machine learning (ML) is more tightly woven into society, it is imperative that we better characterize ML's strengths and limitations if we are to employ it responsibly. Existing benchmark environments for ML, such as board and video games, offer well-defined benchmarks for progress, but constituent tasks are often complex, and it is frequently unclear how task characteristics contribute to overall difficulty for the machine learner. Likewise, without a systematic assessment of how task characteristics influence difficulty, it is challenging to draw meaningful connections between performance in different benchmark environments. We introduce a novel benchmark environment that offers an enormous range of ML challenges and enables precise examination of how task elements influence practical difficulty. The tool frames learning tasks as a "board-clearing game," which we call the Game of Hidden Rules (GOHR). The environment comprises an expressive rule language and a captive server environment that can be installed locally. We propose a set of benchmark rule-learning tasks and plan to support a performance leaderboard for researchers interested in attempting to learn our rules. GOHR complements existing environments by allowing fine, controlled modifications to tasks, enabling experimenters to better understand how each facet of a given learning task contributes to its practical difficulty for an arbitrary ML algorithm.

* 9 pages, 5 figures. Additional documentation information available at http://sapir.psych.wisc.edu:7150/w2020/captive.html
