Abstract:While large language models have shown impressive capabilities across a wide range of domains, they still encounter significant challenges in reasoning tasks that require gathering evidence over multiple turns and drawing logical conclusions. These challenges present significant obstacles for LLM chat user interfaces, which rely on multi-turn interactions to facilitate effective collaboration. This limitation leads to real-world issues; for example, service chatbots must gather necessary information from customers over multiple turns to diagnose and resolve problems effectively. Despite the multi-turn nature of many real-world LLM use cases, most existing benchmarks rely on carefully curated single-turn tests, which often blur the line between memorization and genuine reasoning. To address this, we introduce the Wason Inductive Logic Test (WILT), a simple yet challenging multi-turn reasoning benchmark designed to resist memorization. WILT is inspired by the Wason 2-4-6 task, where participants must infer a boolean function involving three variables (e.g., $x < y < z$) by proposing test cases (such as $(2, 4, 6)$). In WILT, each test starts from a clean slate, with only the initial instructions provided, preventing models from relying on pre-learned responses. Over several turns, models must interact with the environment by suggesting test cases to narrow the possible hypotheses and ultimately infer the hidden function based on the outcomes. Our findings reveal that LLMs struggle with this task, exhibiting distinct strengths and weaknesses: some are better at narrowing down the hypothesis space by proposing valuable test cases, while others are more adept at deducing the hidden function from observed cases. Despite these variations, the best-performing model achieves only 28% accuracy, highlighting a significant gap in LLM performance on complex multi-turn reasoning tasks.
Abstract:Rapid non-verbal communication of task-based stimuli is a challenge in human-machine teaming, particularly in closed-loop interactions such as driving. To achieve this, we must understand the representations of information for both the human and machine, and determine a basis for bridging these representations. Techniques of explainable artificial intelligence (XAI) such as layer-wise relevance propagation (LRP) provide visual heatmap explanations for high-dimensional machine learning techniques such as deep neural networks. On the side of human cognition, visual attention is driven by the bottom-up and top-down processing of sensory input related to the current task. Since both XAI and human cognition should focus on task-related stimuli, there may be overlaps between their representations of visual attention, potentially providing a means of nonverbal communication between the human and machine. In this work, we examine the correlations between LRP heatmap explanations of a neural network trained to predict driving behavior and eye gaze heatmaps of human drivers. The analysis is used to determine the feasibility of using such a technique for enhancing driving performance. We find that LRP heatmaps show increasing levels of similarity with eye gaze according to the task specificity of the neural network. We then propose how these findings may assist humans by visually directing attention towards relevant areas. To our knowledge, our work provides the first known analysis of LRP and eye gaze for driving tasks.
Abstract:Neuromorphic computing is a promising solution for reducing the size, weight and power of mobile embedded systems. In this paper, we introduce a realization of such a system by creating the first closed-loop battery-powered communication system between an IBM TrueNorth NS1e and an autonomous Android-Based Robotics platform. Using this system, we constructed a dataset of path following behavior by manually driving the Android-Based robot along steep mountain trails and recording video frames from the camera mounted on the robot along with the corresponding motor commands. We used this dataset to train a deep convolutional neural network implemented on the TrueNorth NS1e. The NS1e, which was mounted on the robot and powered by the robot's battery, resulted in a self-driving robot that could successfully traverse a steep mountain path in real time. To our knowledge, this represents the first time the TrueNorth NS1e neuromorphic chip has been embedded on a mobile platform under closed-loop control.