Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zixuan Wu

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Feb 03, 2025

Carolyn Jane Anderson, Joydeep Biswas, Aleksander Boruch-Gruszecki, Federico Cassano, Molly Q Feldman, Arjun Guha, Francesca Lucchetti, Zixuan Wu

Figure 1 for PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Figure 2 for PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Figure 3 for PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Figure 4 for PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Abstract:Existing benchmarks for frontier models often test specialized, ``PhD-level'' knowledge that is difficult for non-experts to grasp. In contrast, we present a benchmark based on the NPR Sunday Puzzle Challenge that requires only general knowledge. Our benchmark is challenging for both humans and models, however correct solutions are easy to verify, and models' mistakes are easy to spot. Our work reveals capability gaps that are not evident in existing benchmarks: OpenAI o1 significantly outperforms other reasoning models that are on par on benchmarks that test specialized knowledge. Furthermore, our analysis of reasoning outputs uncovers new kinds of failures. DeepSeek R1, for instance, often concedes with ``I give up'' before providing an answer that it knows is wrong. R1 can also be remarkably ``uncertain'' in its output and in rare cases, it does not ``finish thinking,'' which suggests the need for an inference-time technique to ``wrap up'' before the context window limit is reached. We also quantify the effectiveness of reasoning longer with R1 and Gemini Thinking to identify the point beyond which more reasoning is unlikely to improve accuracy on our benchmark.

Via

Access Paper or Ask Questions

Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning

Sep 29, 2024

Zixuan Wu, Zulfiqar Zaidi, Adithya Patil, Qingyu Xiao, Matthew Gombolay

Figure 1 for Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning

Figure 2 for Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning

Figure 3 for Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning

Figure 4 for Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning

Abstract:In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transfer constrained motion of interest back to task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework in the wheelchair tennis navigation problem to guide the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching real-world recorded tennis ball trajectories with a physical robot wheelchair, and achieve a success rate of 68.49% in a real-world, real-time experiment on a full-sized tennis court.

* This manuscript has been submitted to 2025 IEEE International Conference on Robotics & Automation (ICRA)

Via

Access Paper or Ask Questions

Learning Dynamics of a Ball with Differentiable Factor Graph and Roto-Translational Invariant Representations

Sep 24, 2024

Qingyu Xiao, Zixuan Wu, Matthew Gombolay

Figure 1 for Learning Dynamics of a Ball with Differentiable Factor Graph and Roto-Translational Invariant Representations

Figure 2 for Learning Dynamics of a Ball with Differentiable Factor Graph and Roto-Translational Invariant Representations

Figure 3 for Learning Dynamics of a Ball with Differentiable Factor Graph and Roto-Translational Invariant Representations

Figure 4 for Learning Dynamics of a Ball with Differentiable Factor Graph and Roto-Translational Invariant Representations

Abstract:Robots in dynamic environments need fast, accurate models of how objects move in their environments to support agile planning. In sports such as ping pong, analytical models often struggle to accurately predict ball trajectories with spins due to complex aerodynamics, elastic behaviors, and the challenges of modeling sliding and rolling friction. On the other hand, despite the promise of data-driven methods, machine learning struggles to make accurate, consistent predictions without precise input. In this paper, we propose an end-to-end learning framework that can jointly train a dynamics model and a factor graph estimator. Our approach leverages a Gram-Schmidt (GS) process to extract roto-translational invariant representations to improve the model performance, which can further reduce the validation error compared to data augmentation method. Additionally, we propose a network architecture that enhances nonlinearity by using self-multiplicative bypasses in the layer connections. By leveraging these novel methods, our proposed approach predicts the ball's position with an RMSE of 37.2 mm of the paddle radius at the apex after the first bounce, and 71.5 mm after the second bounce.

* ICRA 2025

Via

Access Paper or Ask Questions

GlyphPattern: An Abstract Pattern Recognition for Vision-Language Models

Aug 12, 2024

Zixuan Wu, Yoolim Kim, Carolyn Jane Anderson

Abstract:Vision-Language Models (VLMs) building upon the foundation of powerful large language models have made rapid progress in reasoning across visual and textual data. While VLMs perform well on vision tasks that they are trained on, our results highlight key challenges in abstract pattern recognition. We present GlyphPattern, a 954 item dataset that pairs 318 human-written descriptions of visual patterns from 40 writing systems with three visual presentation styles. GlyphPattern evaluates abstract pattern recognition in VLMs, requiring models to understand and judge natural language descriptions of visual patterns. GlyphPattern patterns are drawn from a large-scale cognitive science investigation of human writing systems; as a result, they are rich in spatial reference and compositionality. Our experiments show that GlyphPattern is challenging for state-of-the-art VLMs (GPT-4o achieves only 55% accuracy), with marginal gains from few-shot prompting. Our detailed error analysis reveals challenges at multiple levels, including visual processing, natural language understanding, and pattern generalization.

Via

Access Paper or Ask Questions

Diffusion-Reinforcement Learning Hierarchical Motion Planning in Adversarial Multi-agent Games

Mar 16, 2024

Zixuan Wu, Sean Ye, Manisha Natarajan, Matthew C. Gombolay

Abstract:Reinforcement Learning- (RL-)based motion planning has recently shown the potential to outperform traditional approaches from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion games (PEG). These pursuit-evasion problems are relevant to various applications, such as search and rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture themselves. We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data while a low-level RL algorithm reasons about evasive versus global path-following behavior. Our approach outperforms baselines by 51.2% by leveraging the diffusion model to guide the RL algorithm for more efficient exploration and improves the explanability and predictability.

* This work has been submitted to the IEEE Robotics and Automation Letters (RA-L) for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Diffusion Based Multi-Agent Adversarial Tracking

Jul 12, 2023

Sean Ye, Manisha Natarajan, Zixuan Wu, Matthew Gombolay

Abstract:Target tracking plays a crucial role in real-world scenarios, particularly in drug-trafficking interdiction, where the knowledge of an adversarial target's location is often limited. Improving autonomous tracking systems will enable unmanned aerial, surface, and underwater vehicles to better assist in interdicting smugglers that use manned surface, semi-submersible, and aerial vessels. As unmanned drones proliferate, accurate autonomous target estimation is even more crucial for security and safety. This paper presents Constrained Agent-based Diffusion for Enhanced Multi-Agent Tracking (CADENCE), an approach aimed at generating comprehensive predictions of adversary locations by leveraging past sparse state information. To assess the effectiveness of this approach, we evaluate predictions on single-target and multi-target pursuit environments, employing Monte-Carlo sampling of the diffusion model to estimate the probability associated with each generated trajectory. We propose a novel cross-attention based diffusion model that utilizes constraint-based sampling to generate multimodal track hypotheses. Our single-target model surpasses the performance of all baseline methods on Average Displacement Error (ADE) for predictions across all time horizons.

Via

Access Paper or Ask Questions

Learning Models of Adversarial Agent Behavior under Partial Observability

Jul 05, 2023

Sean Ye, Manisha Natarajan, Zixuan Wu, Rohan Paleja, Letian Chen, Matthew C. Gombolay

Abstract:The need for opponent modeling and tracking arises in several real-world scenarios, such as professional sports, video game design, and drug-trafficking interdiction. In this work, we present Graph based Adversarial Modeling with Mutal Information (GrAMMI) for modeling the behavior of an adversarial opponent agent. GrAMMI is a novel graph neural network (GNN) based approach that uses mutual information maximization as an auxiliary objective to predict the current and future states of an adversarial opponent with partial observability. To evaluate GrAMMI, we design two large-scale, pursuit-evasion domains inspired by real-world scenarios, where a team of heterogeneous agents is tasked with tracking and interdicting a single adversarial agent, and the adversarial agent must evade detection while achieving its own objectives. With the mutual information formulation, GrAMMI outperforms all baselines in both domains and achieves 31.68% higher log-likelihood on average for future adversarial state predictions across both domains.

* 8 pages, 3 figures, 2 tables

Via

Access Paper or Ask Questions

Adversarial Search and Track with Multiagent Reinforcement Learning in Sparsely Observable Environment

Jun 20, 2023

Zixuan Wu, Sean Ye, Manisha Natarajan, Letian Chen, Rohan Paleja, Matthew C. Gombolay

Abstract:We study a search and tracking (S&T) problem for a team of dynamic search agents to capture an adversarial evasive agent with only sparse temporal and spatial knowledge of its location in this paper. The domain is challenging for traditional Reinforcement Learning (RL) approaches as the large space leads to sparse observations of the adversary and in turn sparse rewards for the search agents. Additionally, the opponent's behavior is reactionary to the search agents, which causes a data distribution shift for RL during training as search agents improve their policies. We propose a differentiable Multi-Agent RL (MARL) architecture that utilizes a novel filtering module to supplement estimated adversary location information and enables the effective learning of a team policy. Our algorithm learns how to balance information from prior knowledge and a motion model to remain resilient to the data distribution shift and outperforms all baseline methods with a 46% increase of detection rate.

* Submitted to IEEE/RSJ International Conference on Intelligent Robots (IROS) 2023

Via

Access Paper or Ask Questions

Trajectory Servoing: Image-Based Trajectory Tracking Using SLAM

Mar 06, 2021

Shiyu Feng, Zixuan Wu, Yipu Zhao, Patricio A. Vela

Figure 1 for Trajectory Servoing: Image-Based Trajectory Tracking Using SLAM

Figure 2 for Trajectory Servoing: Image-Based Trajectory Tracking Using SLAM

Figure 3 for Trajectory Servoing: Image-Based Trajectory Tracking Using SLAM

Figure 4 for Trajectory Servoing: Image-Based Trajectory Tracking Using SLAM

Abstract:This paper describes an image based visual servoing (IBVS) system for a nonholonomic robot to achieve good trajectory following without real-time robot pose information and without a known visual map of the environment. We call it trajectory servoing. The critical component is a feature-based, indirect SLAM method to provide a pool of available features with estimated depth, so that they may be propagated forward in time to generate image feature trajectories for visual servoing. Short and long distance experiments show the benefits of trajectory servoing for navigating unknown areas without absolute positioning. Trajectory servoing is shown to be more accurate than pose-based feedback when both rely on the same underlying SLAM system.

Via

Access Paper or Ask Questions

Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives

Mar 09, 2020

Zhishuai Guo, Zixuan Wu, Yan Yan, Xiaoyu Wang, Tianbao Yang

Figure 1 for Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives

Figure 2 for Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives

Figure 3 for Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives

Abstract:Stochastic gradient descent (SGD) has been widely studied in the literature from different angles, and is commonly employed for solving many big data machine learning problems. However, the averaging technique, which combines all iterative solutions into a single solution, is still under-explored. While some increasingly weighted averaging schemes have been considered in the literature, existing works are mostly restricted to strongly convex objective functions and the convergence of optimization error. It remains unclear how these averaging schemes affect the convergence of {\it both optimization error and generalization error} (two equally important components of testing error) for {\bf non-strongly convex objectives, including non-convex problems}. In this paper, we {\it fill the gap} by comprehensively analyzing the increasingly weighted averaging on convex, strongly convex and non-convex objective functions in terms of both optimization error and generalization error. In particular, we analyze a family of increasingly weighted averaging, where the weight for the solution at iteration $t$ is proportional to $t^{\alpha}$ ($\alpha > 0$). We show how $\alpha$ affects the optimization error and the generalization error, and exhibit the trade-off caused by $\alpha$. Experiments have demonstrated this trade-off and the effectiveness of polynomially increased weighted averaging compared with other averaging schemes for a wide range of problems including deep learning.

Via

Access Paper or Ask Questions