Abstract:Artificial intelligence (AI) has rapidly developed through advancements in computational power and the growth of massive datasets. However, this progress has also heightened challenges in interpreting the "black-box" nature of AI models. To address these concerns, eXplainable AI (XAI) has emerged with a focus on transparency and interpretability to enhance human understanding and trust in AI decision-making processes. In the context of multimodal data fusion and complex reasoning scenarios, the proposal of Multimodal eXplainable AI (MXAI) integrates multiple modalities for prediction and explanation tasks. Meanwhile, the advent of Large Language Models (LLMs) has led to remarkable breakthroughs in natural language processing, yet their complexity has further exacerbated the issue of MXAI. To gain key insights into the development of MXAI methods and provide crucial guidance for building more transparent, fair, and trustworthy AI systems, we review the MXAI methods from a historical perspective and categorize them across four eras: traditional machine learning, deep learning, discriminative foundation models, and generative LLMs. We also review evaluation metrics and datasets used in MXAI research, concluding with a discussion of future challenges and directions. A project related to this review has been created at https://github.com/ShilinSun/mxai_review.
Abstract:The precise and safe control of heavy material handling machines presents numerous challenges due to the hard-to-model hydraulically actuated joints and the need for collision-free trajectory planning with a free-swinging end-effector tool. In this work, we propose an RL-based controller that commands the cabin joint and the arm simultaneously. It is trained in a simulation combining data-driven modeling techniques with first-principles modeling. On the one hand, we employ a neural network model to capture the highly nonlinear dynamics of the upper carriage turn hydraulic motor, incorporating explicit pressure prediction to handle delays better. On the other hand, we model the arm as velocity-controllable and the free-swinging end-effector tool as a damped pendulum using first principles. This combined model enhances our simulation environment, enabling the training of RL controllers that can be directly transferred to the real machine. Designed to reach steady-state Cartesian targets, the RL controller learns to leverage the hydraulic dynamics to improve accuracy, maintain high speeds, and minimize end-effector tool oscillations. Our controller, tested on a mid-size prototype material handler, is more accurate than an inexperienced operator and causes fewer tool oscillations. It demonstrates competitive performance even compared to an experienced professional driver.
Abstract:Automation of hydraulic material handling machinery is currently limited to semi-static pick-and-place cycles. Dynamic throwing motions which utilize the passive joints, can greatly improve time efficiency as well as increase the dumping workspace. In this work, we use Reinforcement Learning (RL) to design dynamic controllers for material handlers with underactuated arms as commonly used in logistics. The controllers are tested both in simulation and in real-world experiments on a 12-ton test platform. The method is able to exploit the passive joints of the gripper to perform dynamic throwing motions. With the proposed controllers, the machine is able to throw individual objects to targets outside the static reachability zone with good accuracy for its practical applications. The work demonstrates the possibility of using RL to perform highly dynamic tasks with heavy machinery, suggesting a potential for improving the efficiency and precision of autonomous material handling tasks.
Abstract:Autonomous navigation across unstructured terrains, including forests and construction areas, faces unique challenges due to intricate obstacles and the element of the unknown. Lacking pre-existing maps, these scenarios necessitate a motion planning approach that combines agility with efficiency. Critically, it must also incorporate the robot's kinematic constraints to navigate more effectively through complex environments. This work introduces a novel planning method for center-articulated vehicles (CAV), leveraging motion primitives within a receding horizon planning framework using onboard sensing. The approach commences with the offline creation of motion primitives, generated through forward simulations that reflect the distinct kinematic model of center-articulated vehicles. These primitives undergo evaluation through a heuristic-based scoring function, facilitating the selection of the most suitable path for real-time navigation. To augment this planning process, we develop a pose-stabilizing controller, tailored to the kinematic specifications of center-articulated vehicles. During experiments, our method demonstrates a $67\%$ improvement in SPL (Success Rate weighted by Path Length) performance over existing strategies. Furthermore, its efficacy was validated through real-world experiments conducted with a tree harvester vehicle - SAHA.
Abstract:Cross-modal retrieval (CMR) aims to establish interaction between different modalities, among which supervised CMR is emerging due to its flexibility in learning semantic category discrimination. Despite the remarkable performance of previous supervised CMR methods, much of their success can be attributed to the well-annotated data. However, even for unimodal data, precise annotation is expensive and time-consuming, and it becomes more challenging with the multimodal scenario. In practice, massive multimodal data are collected from the Internet with coarse annotation, which inevitably introduces noisy labels. Training with such misleading labels would bring two key challenges -- enforcing the multimodal samples to \emph{align incorrect semantics} and \emph{widen the heterogeneous gap}, resulting in poor retrieval performance. To tackle these challenges, this work proposes UOT-RCL, a Unified framework based on Optimal Transport (OT) for Robust Cross-modal Retrieval. First, we propose a semantic alignment based on partial OT to progressively correct the noisy labels, where a novel cross-modal consistent cost function is designed to blend different modalities and provide precise transport cost. Second, to narrow the discrepancy in multi-modal data, an OT-based relation alignment is proposed to infer the semantic-level cross-modal matching. Both of these two components leverage the inherent correlation among multi-modal data to facilitate effective cost function. The experiments on three widely-used cross-modal retrieval datasets demonstrate that our UOT-RCL surpasses the state-of-the-art approaches and significantly improves the robustness against noisy labels.
Abstract:The mechanical simplicity, hover capabilities, and high agility of quadrotors lead to a fast adaption in the industry for inspection, exploration, and urban aerial mobility. On the other hand, the unstable and underactuated dynamics of quadrotors render them highly susceptible to system faults, especially rotor failures. In this work, we propose a fault-tolerant controller using the nonlinear model predictive control (NMPC) to stabilize and control a quadrotor subjected to the complete failure of a single rotor. Differently from existing works that either rely on linear assumptions or resort to cascaded structures neglecting input constraints in the outer-loop, our method leverages full nonlinear dynamics of the damaged quadrotor and considers the thrust constraint of each rotor. Hence, this method can seamlessly transition from nominal to rotor failure flights, and effectively perform upset recovery from extreme initial conditions. Extensive simulations and real-world experiments are conducted for validation, which demonstrates that the proposed NMPC method can effectively recover the damaged quadrotor even if the failure occurs during aggressive maneuvers, such as flipping and tracking agile trajectories.