Multiscale Medical Robotics Center, Hong Kong, China
Abstract:Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading to excessive dependence on linguistic tokens while neglecting vision information. In this paper, we propose NoiseBoost, a broadly applicable and simple method for alleviating hallucinations for MLLMs through the integration of noise feature perturbations. Noise perturbation acts as a regularizer, facilitating a balanced distribution of attention weights among visual and linguistic tokens. Despite its simplicity, NoiseBoost consistently enhances the performance of MLLMs across common training strategies, including supervised fine-tuning and reinforcement learning. Further, NoiseBoost pioneerly enables semi-supervised learning for MLLMs, unleashing the power of unlabeled data. Comprehensive experiments demonstrate that NoiseBoost improves dense caption accuracy by 8.1% with human evaluation and achieves comparable results with 50% of the data by mining unlabeled data. Code and models are available at https://kaiwu5.github.io/noiseboost.
Abstract:Given a graph, the minimum dominating set (MinDS) problem is to identify a smallest set $D$ of vertices such that every vertex not in $D$ is adjacent to at least one vertex in $D$. The MinDS problem is a classic $\mathcal{NP}$-hard problem and has been extensively studied because of its many disparate applications in network analysis. To solve this problem efficiently, many heuristic approaches have been proposed to obtain a good solution within an acceptable time limit. However, existing MinDS heuristic algorithms are always limited by various tie-breaking cases when selecting vertices, which slows down the effectiveness of the algorithms. In this paper, we design an efficient local search algorithm for the MinDS problem, named DmDS -- a dual-mode local search framework that probabilistically chooses between two distinct vertex-swapping schemes. We further address limitations of other algorithms by introducing vertex selection criterion based on the frequency of vertices added to solutions to address tie-breaking cases, and a new strategy to improve the quality of the initial solution via a greedy-based strategy integrated with perturbation. We evaluate DmDS against the state-of-the-art algorithms on seven datasets, consisting of 346 instances (or families) with up to tens of millions of vertices. Experimental results show that DmDS obtains the best performance in accuracy for almost all instances and finds much better solutions than state-of-the-art MinDS algorithms on a broad range of large real-world graphs.
Abstract:Projected Inverse Dynamics Control (PIDC) is commonly used in robots subject to contact, especially in quadrupedal systems. Many methods based on such dynamics have been developed for quadrupedal locomotion tasks, and only a few works studied simple interactions between the robot and environment, such as pressing an E-stop button. To facilitate the interaction requiring exact force control for safety, we propose a novel interaction force control scheme for underactuated quadrupedal systems relying on projection techniques and Quadratic Programming (QP). This algorithm allows the robot to apply a desired interaction force to the environment without using force sensors while satisfying physical constraints and inducing minimal base motion. Unlike previous projection-based methods, the QP design uses two selection matrices in its hierarchical structure, facilitating the decoupling between force and motion control. The proposed algorithm is verified with a quadrupedal robot in a high-fidelity simulator. Compared to the QP designs without the strategy of using two selection matrices and the PIDC method for contact force control, our method provided more accurate contact force tracking performance with minimal base movement, paving the way to approach the exact interaction force control for underactuated quadrupedal systems.
Abstract:Falling cat problem is well-known where cats show their super aerial reorientation capability and can land safely. For their robotic counterparts, a similar falling quadruped robot problem, has not been fully addressed, although achieving safe landing as the cats has been increasingly investigated. Unlike imposing the burden on landing control, we approach to safe landing of falling quadruped robots by effective flight phase control. Different from existing work like swinging legs and attaching reaction wheels or simple tails, we propose to deploy a 3-DoF morphable inertial tail on a medium-size quadruped robot. In the flight phase, the tail with its maximum length can self-right the body orientation in 3D effectively; before touch-down, the tail length can be retracted to about 1/4 of its maximum for impressing the tail's side-effect on landing. To enable aerial reorientation for safe landing in the quadruped robots, we design a control architecture, which has been verified in a high-fidelity physics simulation environment with different initial conditions. Experimental results on a customized flight-phase test platform with comparable inertial properties are provided and show the tail's effectiveness on 3D body reorientation and its fast retractability before touch-down. An initial falling quadruped robot experiment is shown, where the robot Unitree A1 with the 3-DoF tail can land safely subject to non-negligible initial body angles.
Abstract:Self-recognition or self-awareness is a capacity attributed typically only to humans and few other species. The definitions of these concepts vary and little is known about the mechanisms behind them. However, there is a Turing test-like benchmark: the mirror self-recognition, which consists in covertly putting a mark on the face of the tested subject, placing her in front of a mirror, and observing the reactions. In this work, first, we provide a mechanistic decomposition, or process model, of what components are required to pass this test. Based on these, we provide suggestions for empirical research. In particular, in our view, the way the infants or animals reach for the mark should be studied in detail. Second, we develop a model to enable the humanoid robot Nao to pass the test. The core of our technical contribution is learning the appearance representation and visual novelty detection by means of learning the generative model of the face with deep auto-encoders and exploiting the prediction error. The mark is identified as a salient region on the face and reaching action is triggered, relying on a previously learned mapping to arm joint angles. The architecture is tested on two robots with a completely different face.