Abstract:Hallucinations in large vision-language models (LVLMs) are a significant challenge, i.e., generating objects that are not presented in the visual input, which impairs their reliability. Recent studies often attribute hallucinations to a lack of understanding of visual input, yet ignore a more fundamental issue: the model's inability to effectively extract or decouple visual features. In this paper, we revisit the hallucinations in LVLMs from an architectural perspective, investigating whether the primary cause lies in the visual encoder (feature extraction) or the modal alignment module (feature decoupling). Motivated by our findings on the preliminary investigation, we propose a novel tuning strategy, PATCH, to mitigate hallucinations in LVLMs. This plug-and-play method can be integrated into various LVLMs, utilizing adaptive virtual tokens to extract object features from bounding boxes, thereby addressing hallucinations caused by insufficient decoupling of visual features. PATCH achieves state-of-the-art performance on multiple multi-modal hallucination datasets. We hope this approach provides researchers with deeper insights into the underlying causes of hallucinations in LVLMs, fostering further advancements and innovation in this field.
Abstract:Motor babbling and goal babbling has been used for sensorimotor learning of highly redundant systems in soft robotics. Recent works in goal babbling has demonstrated successful learning of inverse kinematics (IK) on such systems, and suggests that babbling in the goal space better resolves motor redundancy by learning as few sensorimotor mapping as possible. However, for musculoskeletal robot systems, motor redundancy can be of useful information to explain muscle activation patterns, thus the term motor abundance. In this work, we introduce some simple heuristics to empirically define the unknown goal space, and learn the inverse kinematics of a 10 DoF musculoskeletal robot arm using directed goal babbling. We then further propose local online motor babbling using Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which bootstraps on the collected samples in goal babbling for initialization, such that motor abundance can be queried for any static goal within the defined goal space. The result shows that our motor babbling approach can efficiently explore motor abundance, and gives useful insights in terms of muscle stiffness and synergy.
Abstract:In quadruped gait learning, policy search methods that scale high dimensional continuous action spaces are commonly used. In most approaches, it is necessary to introduce prior knowledge on the gaits to limit the highly non-convex search space of the policies. In this work, we propose a new approach to encode the symmetry properties of the desired gaits, on the initial covariance of the Gaussian search distribution, allowing for strategic exploration. Using episode-based likelihood ratio policy gradient and relative entropy policy search, we learned the gaits walk and trot on a simulated quadruped. Comparing these gaits to random gaits learned by initialized diagonal covariance matrix, we show that the performance can be significantly enhanced.