Yaohui Chen

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

May 22, 2025

Haoning Wu, Xiao Huang, Yaohui Chen, Ya Zhang, Yanfeng Wang, Weidi Xie

Abstract:Multimodal large language models (MLLMs) have achieved impressive success in question-answering tasks, yet their capabilities for spatial understanding are less explored. This work investigates a critical question: do existing MLLMs possess 3D spatial perception and understanding abilities? Concretely, we make the following contributions in this paper: (i) we introduce VGBench, a benchmark specifically designed to assess MLLMs for visual geometry perception, e.g., camera pose and motion estimation; (ii) we propose SpatialScore, the most comprehensive and diverse multimodal spatial understanding benchmark to date, integrating VGBench with relevant data from the other 11 existing datasets. This benchmark comprises 28K samples across various spatial understanding tasks, modalities, and QA formats, along with a carefully curated challenging subset, SpatialScore-Hard; (iii) we develop SpatialAgent, a novel multi-agent system incorporating 9 specialized tools for spatial understanding, supporting both Plan-Execute and ReAct reasoning paradigms; (iv) we conduct extensive evaluations to reveal persistent challenges in spatial reasoning while demonstrating the effectiveness of SpatialAgent. We believe SpatialScore will offer valuable insights and serve as a rigorous benchmark for the next evolution of MLLMs.

* Technical Report; Project Page: https://haoningwu3639.github.io/SpatialScore

TSGaussian: Semantic and Depth-Guided Target-Specific Gaussian Splatting from Sparse Views

Dec 13, 2024

Liang Zhao, Zehan Bao, Yi Xie, Hong Chen, Yaohui Chen, Weifu Li

Abstract:Recent advances in Gaussian Splatting have enabled both panoptic and interactive segmentation of 3D scenes. However, existing methodologies often overlook the critical need for reconstructing specified targets with complex structures from sparse views. To address this issue, we introduce TSGaussian, a novel framework that combines semantic constraints with depth priors to avoid geometry degradation in challenging novel view synthesis tasks. Our approach concentrates computational resources on designated targets while minimizing those allocated to the background. Bounding boxes from YOLOv9 serve as prompts for the Segment Anything Model to generate 2D mask predictions, ensuring semantic accuracy and cost efficiency. TSGaussian effectively clusters 3D Gaussians by introducing a compact identity encoding for each Gaussian ellipsoid and incorporating 3D spatial consistency regularization. Leveraging these modules, we propose a pruning strategy to effectively reduce redundancy in 3D Gaussians. Extensive experiments demonstrate that TSGaussian outperforms state-of-the-art methods on three standard datasets and a new challenging dataset we collected, achieving superior results in novel view synthesis of specific objects. Code is available at: https://github.com/leon2000-ai/TSGaussian.

CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

Apr 19, 2024

Manish Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil(+3 more)

Abstract:Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present CyberSecEval 2, a novel benchmark to quantify LLM security risks and capabilities. We introduce two new areas for testing: prompt injection and code interpreter abuse. We evaluated multiple state-of-the-art (SOTA) LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama. Our results show that conditioning away risk of attack remains an unsolved problem; for example, all tested models showed between 26% and 41% successful prompt injection tests. We further introduce the safety-utility tradeoff: conditioning an LLM to reject unsafe prompts can cause the LLM to falsely reject answering benign prompts, which lowers utility. We propose quantifying this tradeoff using False Refusal Rate (FRR). As an illustration, we introduce a novel test set to quantify FRR for cyberattack helpfulness risk. We find many LLMs able to successfully comply with "borderline" benign requests while still rejecting most unsafe requests. Finally, we quantify the utility of LLMs for automating a core cybersecurity task, that of exploiting software vulnerabilities. This is important because the offensive capabilities of LLMs are of intense interest; we quantify this by creating novel test sets for four representative problems. We find that models with coding capabilities perform better than those without, but that further work is needed for LLMs to become proficient at exploit generation. Our code is open source and can be used to evaluate other LLMs.
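
The False Refusal Rate mentioned in the abstract can be illustrated with a minimal sketch. It assumes FRR is the fraction of benign prompts that a model refuses; the abstract does not state the formula, and the function name and boolean input format are hypothetical.

```python
from typing import Sequence


def false_refusal_rate(refused: Sequence[bool]) -> float:
    """Fraction of benign prompts that the model refused.

    refused[i] is True when the model declined benign prompt i.
    Assumed definition: the abstract does not spell out the formula.
    """
    if not refused:
        raise ValueError("need at least one benign prompt")
    return sum(refused) / len(refused)


# Hypothetical refusal judgments for five borderline-benign prompts.
judgments = [False, True, False, False, True]
frr = false_refusal_rate(judgments)
print(f"FRR = {frr:.2f}")
```

A lower value means fewer benign requests are wrongly rejected; it would be read alongside the rate at which unsafe requests are rejected.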

Compact 3D Gaussian Splatting For Dense Visual SLAM

Mar 17, 2024

Tianchen Deng, Yaohui Chen, Leyan Zhang, Jianfei Yang, Shenghai Yuan, Danwei Wang, Weidong Chen

Abstract:Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs, and slow training speed. To address this limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. Then we observe that the covariance matrices (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress 3D Gaussian geometric attributes, i.e., the parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining the state-of-the-art (SOTA) quality of the scene representation.
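
The geometry codebook idea can be sketched as vector quantization: store a small set of shared geometry vectors plus one index per Gaussian. The sketch below is an illustration only. It assumes plain k-means as the quantizer and a hypothetical 7-value geometry vector (3 scales and 4 rotation components); the abstract does not say how the authors build their codebook.

```python
import random
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest to v."""
    return min(range(len(codebook)), key=lambda j: sq_dist(v, codebook[j]))


def build_codebook(params: List[Vector], k: int, iters: int = 10,
                   seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Quantize per-Gaussian geometry vectors with plain k-means."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(params, k)]
    for _ in range(iters):
        idx = [nearest(v, codebook) for v in params]
        for j in range(k):
            members = [v for v, i in zip(params, idx) if i == j]
            if members:  # an empty code keeps its previous value
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the updated codebook.
    return codebook, [nearest(v, codebook) for v in params]


# Hypothetical data: 200 Gaussians whose geometry clusters around 4 prototypes.
rng = random.Random(1)
prototypes = [[rng.gauss(0, 1) for _ in range(7)] for _ in range(4)]
geometry = [[p + rng.gauss(0, 0.01) for p in rng.choice(prototypes)]
            for _ in range(200)]

codebook, idx = build_codebook(geometry, k=4)
print(f"{len(geometry) * 7} floats -> {len(codebook) * 7} floats + {len(idx)} indices")
```

Each Gaussian's geometry is then looked up as `codebook[idx[i]]`, so storage shrinks when many ellipsoids share near-identical covariance parameters.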

A Lobster-inspired Hybrid Actuator With Rigid and Soft Components

Mar 01, 2020

Yaohui Chen, Sing Le, Qiao Chu Tan, Oscar Lau, Chaoyang Song

Abstract:Soft actuators have drawn significant attention from researchers with an inherently compliant design to address the safety issues in physical human-robot interactions. However, they are also vulnerable and pose new challenges in the design, fabrication, and analysis due to their inherent material softness. In this paper, a novel hybrid actuator design is presented with bio-inspirations from the lobster, or crustaceans in a broader perspective. We enclose a soft chamber with rectangular cross-section using a series of articulated rigid shells to produce bending under pneumatic input. By mimicking the shell pattern of lobsters' abdomen, foldable rigid shells are designed to provide the soft actuator with full protection throughout the motion range. The articulation of the rigid shells predefines the actuator's bending motions. As a result, the proposed design enables one to analyze this hybrid actuator with simplified quasi-static models and rigid-body kinematics, which are further validated by mechanical tests. This paper demonstrates that the proposed hybrid actuator design is capable of bridging the major design drawbacks of the entirely rigid and soft robots while preserving their engineering merits in performance.

* 9 pages, 7 figures, accepted for ASME DETC 2017

A Reconfigurable Hybrid Actuator with Rigid and Soft Components

Mar 01, 2020

Yaohui Chen, Sing Le, Qiao Chu Tan, Oscar Lau, Fang Wan, Chaoyang Song

Abstract:Classical rigid-bodied robotic systems, with proven success in theoretical development and industrial applications, have recently been challenged by the emergence of soft robotics due to a growing need in physical human-robot interactions (pHRI), such as wearable devices, medical robots, and personal robots. In this paper, we present the design and fabrication of a robust, hybrid bending actuator built from both rigid and soft components inspired by crustaceans, where its bending radius and axis can be mechanically programmed through the selective activation of the rigid exterior joints, actuated by the soft actuators inside. The hybrid actuator was experimentally measured in terms of bending and force tests to demonstrate the utility of this design. Finally, a case study was presented to demonstrate its capacity to adapt to the geometry of specific objects, anticipating its potential application in situations where compliance is the priority.

* 6 pages, 9 figures, accepted for IEEE ICRA 2017

A Lobster-inspired Robotic Glove for Hand Rehabilitation

Mar 01, 2020

Yaohui Chen, Sing Le, Qiao Chu Tan, Oscar Lau, Fang Wan, Chaoyang Song

Abstract:This paper presents preliminary results of the design, development, and evaluation of a hand rehabilitation glove fabricated using lobster-inspired hybrid design with rigid and soft components for actuation. Inspired by the bending abdomen of lobsters, hybrid actuators are built with serially jointed rigid shells actuated by pressurized soft chambers inside to generate bending motions. Such bio-inspiration absorbs features from the classical rigid-bodied robotics with precisely-defined motion generation, as well as the emerging soft robotics with light-weight, physically safe, and adaptive actuation. The fabrication procedure is described, followed by experiments to mechanically characterize these actuators. Finally, an open-palm glove design integrated with these hybrid actuators is presented for a qualitative case study. A hand rehabilitation system is developed by learning patterns of the sEMG signals from the user's forearm to train the assistive glove for hand rehabilitation exercises.

* 6 pages, 8 figures, accepted for IEEE ICRA 2017

MEUZZ: Smart Seed Scheduling for Hybrid Fuzzing

Feb 20, 2020

Yaohui Chen, Mansour Ahmadi, Reza Mirzazade farkhani, Boyu Wang, Long Lu

Abstract:Seed scheduling is a prominent factor in determining the yields of hybrid fuzzing. Existing hybrid fuzzers schedule seeds based on fixed heuristics that aim to predict input utilities. However, such heuristics are not generalizable as there exists no one-size-fits-all rule applicable to different programs. They may work well on the programs from which they were derived, but not others. To overcome this problem, we design a Machine learning-Enhanced hybrid fUZZing system (MEUZZ), which employs supervised machine learning for adaptive and generalizable seed scheduling. MEUZZ determines which new seeds are expected to produce better fuzzing yields based on the knowledge learned from past seed scheduling decisions made on the same or similar programs. MEUZZ's learning is based on a series of features extracted via code reachability and dynamic analysis, which incurs negligible runtime overhead (in microseconds). Moreover, MEUZZ automatically infers the data labels by evaluating the fuzzing performance of each selected seed. As a result, MEUZZ is generally applicable to, and performs well on, various kinds of programs. Our evaluation shows MEUZZ significantly outperforms the state-of-the-art grey-box and hybrid fuzzers, achieving 27.1% more code coverage than QSYM. The learned models are reusable and transferable, which boosts fuzzing performance by 7.1% on average and improves 68% of the 56 cross-program fuzzing campaigns. MEUZZ discovered 47 deeply hidden and previously unknown bugs--with 21 confirmed and fixed by the developers--when fuzzing 8 well-tested programs with the same configurations as used in previous work.
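
Learned seed scheduling of the kind described above can be sketched as a small supervised ranking problem: fit a model on features and observed yields from past scheduling decisions, then order the pending seeds by predicted yield. This is a sketch under stated assumptions, not MEUZZ's implementation. The two features, the linear model trained by stochastic gradient descent, and all names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear prediction; the last entry of w is the bias term."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_linear(X: List[List[float]], y: List[float],
                 lr: float = 0.1, epochs: int = 1000) -> List[float]:
    """Fit weights by stochastic gradient descent on squared error."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = predict(w, x) - target
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w


def schedule(w: Sequence[float], seeds: Dict[str, List[float]]) -> List[str]:
    """Return seed names ordered by predicted utility, best first."""
    return sorted(seeds, key=lambda name: predict(w, seeds[name]), reverse=True)


# Hypothetical past decisions. Features: [reachable uncovered branches, seed size],
# both scaled to [0, 1]. Label: coverage gain observed after fuzzing that seed.
past_features = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.3], [0.1, 0.2]]
past_gain = [0.8, 0.1, 0.5, 0.05]

w = train_linear(past_features, past_gain)
queue = {"seed_a": [0.15, 0.9], "seed_b": [0.85, 0.2], "seed_c": [0.5, 0.5]}
print(schedule(w, queue))
```

Because the labels come from measured fuzzing outcomes rather than hand-written rules, the same training loop can be rerun per program or reused across similar programs.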
