Abstract:Graph Neural Networks (GNNs) have garnered significant interest among researchers due to their impressive performance in graph learning tasks. However, like other deep neural networks, GNNs are also vulnerable to adversarial attacks. In existing adversarial attack methods for GNNs, the metric between the attacked graph and the original graph is usually the attack budget or a measure of global graph properties. However, we have found that it is possible to generate attack graphs that disrupt the primary semantics even within these constraints. To address this problem, we propose a Adversarial Attacks on High-level Semantics in Graph Neural Networks (AHSG), which is a graph structure attack model that ensures the retention of primary semantics. The latent representations of each node can extract rich semantic information by applying convolutional operations on graph data. These representations contain both task-relevant primary semantic information and task-irrelevant secondary semantic information. The latent representations of same-class nodes with the same primary semantics can fulfill the objective of modifying secondary semantics while preserving the primary semantics. Finally, the latent representations with attack effects is mapped to an attack graph using Projected Gradient Descent (PGD) algorithm. By attacking graph deep learning models with some advanced defense strategies, we validate that AHSG has superior attack effectiveness compared to other attack methods. Additionally, we employ Contextual Stochastic Block Models (CSBMs) as a proxy for the primary semantics to detect the attacked graph, confirming that AHSG almost does not disrupt the original primary semantics of the graph.
Abstract:Background: Large language models (LLMs) have seen extraordinary advances with applications in clinical decision support. However, high-quality evidence is urgently needed on the potential and limitation of LLMs in providing accurate clinical decisions based on real-world medical data. Objective: To evaluate quantitatively whether universal state-of-the-art LLMs (ChatGPT and GPT-4) can predict the incidence risk of myocardial infarction (MI) with logical inference, and to further make comparison between various models to assess the performance of LLMs comprehensively. Methods: In this retrospective cohort study, 482,310 participants recruited from 2006 to 2010 were initially included in UK Biobank database and later on resampled into a final cohort of 690 participants. For each participant, tabular data of the risk factors of MI were transformed into standardized textual descriptions for ChatGPT recognition. Responses were generated by asking ChatGPT to select a score ranging from 0 to 10 representing the risk. Chain of Thought (CoT) questioning was used to evaluate whether LLMs make prediction logically. The predictive performance of ChatGPT was compared with published medical indices, traditional machine learning models and other large language models. Conclusions: Current LLMs are not ready to be applied in clinical medicine fields. Future medical LLMs are suggested to be expert in medical domain knowledge to understand both natural languages and quantified medical data, and further make logical inferences.
Abstract:Due to the performance degradation of graph neural networks (GNNs) under distribution shifts, the work on out-of-distribution (OOD) generalization on graphs has received widespread attention. A novel perspective involves distinguishing potential confounding biases from different environments through environmental identification, enabling the model to escape environmentally-sensitive correlations and maintain stable performance under distribution shifts. However, in graph data, confounding factors not only affect the generation process of node features but also influence the complex interaction between nodes. We observe that neglecting either aspect of them will lead to a decrease in performance. In this paper, we propose IENE, an OOD generalization method on graphs based on node-level environmental identification and extrapolation techniques. It strengthens the model's ability to extract invariance from two granularities simultaneously, leading to improved generalization. Specifically, to identify invariance in features, we utilize the disentangled information bottleneck framework to achieve mutual promotion between node-level environmental estimation and invariant feature learning. Furthermore, we extrapolate topological environments through graph augmentation techniques to identify structural invariance. We implement the conceptual method with specific algorithms and provide theoretical analysis and proofs for our approach. Extensive experimental evaluations on two synthetic and four real-world OOD datasets validate the superiority of IENE, which outperforms existing techniques and provides a flexible framework for enhancing the generalization of GNNs.
Abstract:Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from substantial amounts of external observation-only data, and only the latter depends on a specific robot embodiment. To this end, we propose Video-Language Critic, a reward model that can be trained on readily available cross-embodiment data using contrastive learning and a temporal ranking objective, and use it to score behavior traces from a separate reinforcement learning actor. When trained on Open X-Embodiment data, our reward model enables 2x more sample-efficient policy training on Meta-World tasks than a sparse reward only, despite a significant domain gap. Using in-domain data but in a challenging task generalization setting on Meta-World, we further demonstrate more sample-efficient training than is possible with prior language-conditioned reward models that are either trained with binary classification, use static images, or do not leverage the temporal information present in video data.
Abstract:This paper presents a SYCL implementation of Multi-Layer Perceptrons (MLPs), which targets and is optimized for the Intel Data Center GPU Max 1550. To increase the performance, our implementation minimizes the slow global memory accesses by maximizing the data reuse within the general register file and the shared local memory by fusing the operations in each layer of the MLP. We show with a simple roofline model that this results in a significant increase in the arithmetic intensity, leading to improved performance, especially for inference. We compare our approach to a similar CUDA implementation for MLPs and show that our implementation on the Intel Data Center GPU outperforms the CUDA implementation on Nvidia's H100 GPU by a factor up to 2.84 in inference and 1.75 in training. The paper also showcases the efficiency of our SYCL implementation in three significant areas: Image Compression, Neural Radiance Fields, and Physics-Informed Machine Learning. In all cases, our implementation outperforms the off-the-shelf Intel Extension for PyTorch (IPEX) implementation on the same Intel GPU by up to a factor of 30 and the CUDA PyTorch version on Nvidia's H100 GPU by up to a factor 19. The code can be found at https://github.com/intel/tiny-dpcpp-nn.
Abstract:Semantic communications (SemCom) have emerged as a new paradigm for supporting sixth-generation applications, where semantic features of data are transmitted using artificial intelligence algorithms to attain high communication efficiencies. Most existing SemCom techniques utilize deep neural networks (DNNs) to implement analog source-channel mappings, which are incompatible with existing digital communication architectures. To address this issue, this paper proposes a novel framework of digital deep joint source-channel coding (D$^2$-JSCC) targeting image transmission in SemCom. The framework features digital source and channel codings that are jointly optimized to reduce the end-to-end (E2E) distortion. First, deep source coding with an adaptive density model is designed to encode semantic features according to their distributions. Second, digital channel coding is employed to protect encoded features against channel distortion. To facilitate their joint design, the E2E distortion is characterized as a function of the source and channel rates via the analysis of the Bayesian model and Lipschitz assumption on the DNNs. Then to minimize the E2E distortion, a two-step algorithm is proposed to control the source-channel rates for a given channel signal-to-noise ratio. Simulation results reveal that the proposed framework outperforms classic deep JSCC and mitigates the cliff and leveling-off effects, which commonly exist for separation-based approaches.
Abstract:Humans can produce complex whole-body motions when interacting with their surroundings, by planning, executing and combining individual limb movements. We investigated this fundamental aspect of motor control in the setting of autonomous robotic operations. We approach this problem by hierarchical generative modelling equipped with multi-level planning-for autonomous task completion-that mimics the deep temporal architecture of human motor control. Here, temporal depth refers to the nested time scales at which successive levels of a forward or generative model unfold, for example, delivering an object requires a global plan to contextualise the fast coordination of multiple local movements of limbs. This separation of temporal scales also motivates robotics and control. Specifically, to achieve versatile sensorimotor control, it is advantageous to hierarchically structure the planning and low-level motor control of individual limbs. We use numerical and physical simulation to conduct experiments and to establish the efficacy of this formulation. Using a hierarchical generative model, we show how a humanoid robot can autonomously complete a complex task that necessitates a holistic use of locomotion, manipulation, and grasping. Specifically, we demonstrate the ability of a humanoid robot that can retrieve and transport a box, open and walk through a door to reach the destination, approach and kick a football, while showing robust performance in presence of body damage and ground irregularities. Our findings demonstrated the effectiveness of using human-inspired motor control algorithms, and our method provides a viable hierarchical architecture for the autonomous completion of challenging goal-directed tasks.
Abstract:Implementing fully automatic unmanned surface vehicles (USVs) monitoring water quality is challenging since effectively collecting environmental data while keeping the platform stable and environmental-friendly is hard to approach. To address this problem, we construct a USV that can automatically navigate an efficient path to sample water quality parameters in order to monitor the aquatic environment. The detection device needs to be stable enough to resist a hostile environment or climates while enormous volumes will disturb the aquaculture environment. Meanwhile, planning an efficient path for information collecting needs to deal with the contradiction between the restriction of energy and the amount of information in the coverage region. To tackle with mentioned challenges, we provide a USV platform that can perfectly balance mobility, stability, and portability attributed to its special round-shape structure and redundancy motion design. For informative planning, we combined the TSP and CPP algorithms to construct an optimistic plan for collecting more data within a certain range and limiting energy restrictions.We designed a fish existence prediction scenario to verify the novel system in both simulation experiments and field experiments. The novel aquaculture environment monitoring system significantly reduces the burden of manual operation in the fishery inspection field. Additionally, the simplicity of the sensor setup and the minimal cost of the platform enables its other possible applications in aquatic exploration and commercial utilization.
Abstract:This paper presents a multi-contact motion adaptation framework that enables teleoperation of high degree-of-freedom (DoF) robots, such as quadrupeds and humanoids, for loco-manipulation tasks in multi-contact settings. Our proposed algorithms optimize whole-body configurations and formulate the retargeting of multi-contact motions as sequential quadratic programming, which is robust and stable near the edges of feasibility constraints. Our framework allows real-time operation of the robot and reduces cognitive load for the operator because infeasible commands are automatically adapted into physically stable and viable motions on the robot. The results in simulations with full dynamics demonstrated the effectiveness of teleoperating different legged robots interactively and generating rich multi-contact movements. We evaluated the computational efficiency of the proposed algorithms, and further validated and analyzed multi-contact loco-manipulation tasks on humanoid and quadruped robots by reaching, active pushing and various traversal on uneven terrains.
Abstract:Legged robots have achieved remarkable performance in blind walking using either model-based control or data-driven deep reinforcement learning. To proactively navigate and traverse various terrains, active use of visual perception becomes indispensable, and this work aims to exploit the use of sparse visual observations to achieve perceptual locomotion over a range of commonly seen bumps, ramps, and stairs in human-centred environments. We first formulate the selection of minimal visual input that can represent the uneven surfaces of interest, and propose a learning framework that integrates such exteroceptive and proprioceptive data. We specifically select state observations and design a training curriculum to learn feedback control policies more effectively over a range of different terrains. Using an extensive benchmark, we validate the learned policy in tasks that require omnidirectional walking over flat ground and forward locomotion over terrains with obstacles, showing a high success rate of traversal. Particularly, the robot performs autonomous perceptual locomotion with minimal visual perception using depth measurements, which are easily available from a Lidar or RGB-D sensor, and successfully demonstrates robust ascent and descent over high stairs of 20 cm step height, i.e., 50% of its leg length.