Abstract:Pre-trained vision-language models (VLMs) have showcased remarkable performance in image and natural language understanding, such as image captioning and response generation. As the practical applications of vision-language models become increasingly widespread, their potential safety and robustness issues raise concerns that adversaries may evade the system and cause these models to generate toxic content through malicious attacks. Therefore, evaluating the robustness of open-source VLMs against adversarial attacks has garnered growing attention, with transfer-based attacks as a representative black-box attacking strategy. However, most existing transfer-based attacks neglect the importance of the semantic correlations between vision and text modalities, leading to sub-optimal adversarial example generation and attack performance. To address this issue, we present Chain of Attack (CoA), which iteratively enhances the generation of adversarial examples based on the multi-modal semantic update using a series of intermediate attacking steps, achieving superior adversarial transferability and efficiency. A unified attack success rate computing method is further proposed for automatic evasion evaluation. Extensive experiments conducted under the most realistic and high-stakes scenario, demonstrate that our attacking strategy can effectively mislead models to generate targeted responses using only black-box attacks without any knowledge of the victim models. The comprehensive robustness evaluation in our paper provides insight into the vulnerabilities of VLMs and offers a reference for the safety considerations of future model developments.
Abstract:Accurate prediction of metro Origin-Destination (OD) flow is essential for the development of intelligent transportation systems and effective urban traffic management. Existing approaches typically either predict passenger outflow of departure stations or inflow of destination stations. However, we argue that travelers generally have clearly defined departure and arrival stations, making these OD pairs inherently interconnected. Consequently, considering OD pairs as a unified entity more accurately reflects actual metro travel patterns and allows for analyzing potential spatio-temporal correlations between different OD pairs. To address these challenges, we propose a novel and effective urban metro OD flow prediction method (UMOD), comprising three core modules: a data embedding module, a temporal relation module, and a spatial relation module. The data embedding module projects raw OD pair inputs into hidden space representations, which are subsequently processed by the temporal and spatial relation modules to capture both inter-pair and intra-pair spatio-temporal dependencies. Experimental results on two real-world urban metro OD flow datasets demonstrate that adopting the OD pairs perspective is critical for accurate metro OD flow prediction. Our method outperforms existing approaches, delivering superior predictive performance.
Abstract:Recently Whisper has approached human-level robustness and accuracy in English automatic speech recognition (ASR), while in minor language and mixed language speech recognition, there remains a compelling need for further improvement. In this work, we present the impressive results of Whisper-MCE, our finetuned Whisper model, which was trained using our self-collected dataset, Mixed Cantonese and English audio dataset (MCE). Meanwhile, considering word error rate (WER) poses challenges when it comes to evaluating its effectiveness in minor language and mixed-language contexts, we present a novel rating mechanism. By comparing our model to the baseline whisper-large-v2 model, we demonstrate its superior ability to accurately capture the content of the original audio, achieve higher recognition accuracy, and exhibit faster recognition speed. Notably, our model outperforms other existing models in the specific task of recognizing mixed language.
Abstract:Urban metro flow prediction is of great value for metro operation scheduling, passenger flow management and personal travel planning. However, it faces two main challenges. First, different metro stations, e.g. transfer stations and non-transfer stations, have unique traffic patterns. Second, it is challenging to model complex spatio-temporal dynamic relation of metro stations. To address these challenges, we develop a spatio-temporal dynamic graph relational learning model (STDGRL) to predict urban metro station flow. First, we propose a spatio-temporal node embedding representation module to capture the traffic patterns of different stations. Second, we employ a dynamic graph relationship learning module to learn dynamic spatial relationships between metro stations without a predefined graph adjacency matrix. Finally, we provide a transformer-based long-term relationship prediction module for long-term metro flow prediction. Extensive experiments are conducted based on metro data in Beijing, Shanghai, Chongqing and Hangzhou. Experimental results show the advantages of our method beyond 11 baselines for urban metro flow prediction.
Abstract:Weather Forecasting is an attractive challengeable task due to its influence on human life and complexity in atmospheric motion. Supported by massive historical observed time series data, the task is suitable for data-driven approaches, especially deep neural networks. Recently, the Graph Neural Networks (GNNs) based methods have achieved excellent performance for spatio-temporal forecasting. However, the canonical GNNs-based methods only individually model the local graph of meteorological variables per station or the global graph of whole stations, lacking information interaction between meteorological variables in different stations. In this paper, we propose a novel Hierarchical Spatio-Temporal Graph Neural Network (HiSTGNN) to model cross-regional spatio-temporal correlations among meteorological variables in multiple stations. An adaptive graph learning layer and spatial graph convolution are employed to construct self-learning graph and study hidden dependency among nodes of variable-level and station-level graph. For capturing temporal pattern, the dilated inception as the backbone of gate temporal convolution is designed to model long and various meteorological trends. Moreover, a dynamic interaction learning is proposed to build bidirectional information passing in hierarchical graph. Experimental results on three real-world meteorological datasets demonstrate the superior performance of HiSTGNN beyond 7 baselines and it reduces the errors by 4.2% to 11.6% especially compared to state-of-the-art weather forecasting method.
Abstract:Multi-dimensional data exploration is a classic research topic in visualization. Most existing approaches are designed for identifying record patterns in dimensional space or subspace. In this paper, we propose a visual analytics approach to exploring subset patterns. The core of the approach is a subset embedding network (SEN) that represents a group of subsets as uniformly-formatted embeddings. We implement the SEN as multiple subnets with separate loss functions. The design enables to handle arbitrary subsets and capture the similarity of subsets on single features, thus achieving accurate pattern exploration, which in most cases is searching for subsets having similar values on few features. Moreover, each subnet is a fully-connected neural network with one hidden layer. The simple structure brings high training efficiency. We integrate the SEN into a visualization system that achieves a 3-step workflow. Specifically, analysts (1) partition the given dataset into subsets, (2) select portions in a projected latent space created using the SEN, and (3) determine the existence of patterns within selected subsets. Generally, the system combines visualizations, interactions, automatic methods, and quantitative measures to balance the exploration flexibility and operation efficiency, and improve the interpretability and faithfulness of the identified patterns. Case studies and quantitative experiments on multiple open datasets demonstrate the general applicability and effectiveness of our approach.
Abstract:Neuron reconstruction is essential to generate exquisite neuron connectivity map for understanding brain function. Despite the significant amount of effect that has been made on automatic reconstruction methods, manual tracing by well-trained human annotators is still necessary. To ensure the quality of reconstructed neurons and provide guidance for annotators to improve their efficiency, we propose a deep learning based quality control method for neuron reconstruction in this paper. By formulating the quality control problem into a binary classification task regarding each single point, the proposed approach overcomes the technical difficulties resulting from the large image size and complex neuron morphology. Not only it provides the evaluation of reconstruction quality, but also can locate exactly where the wrong tracing begins. This work presents one of the first comprehensive studies for whole-brain scale quality control of neuron reconstructions. Experiments on five-fold cross validation with a large dataset demonstrate that the proposed approach can detect 74.7% errors with only 1.4% false alerts.
Abstract:Urban spatial-temporal flows prediction is of great importance to traffic management, land use, public safety, etc. Urban flows are affected by several complex and dynamic factors, such as patterns of human activities, weather, events and holidays. Datasets evaluated the flows come from various sources in different domains, e.g. mobile phone data, taxi trajectories data, metro/bus swiping data, bike-sharing data and so on. To summarize these methodologies of urban flows prediction, in this paper, we first introduce four main factors affecting urban flows. Second, in order to further analysis urban flows, a preparation process of multi-sources spatial-temporal data related with urban flows is partitioned into three groups. Third, we choose the spatial-temporal dynamic data as a case study for the urban flows prediction task. Fourth, we analyze and compare some well-known and state-of-the-art flows prediction methods in detail, classifying them into five categories: statistics-based, traditional machine learning-based, deep learning-based, reinforcement learning-based and transfer learning-based methods. Finally, we give open challenges of urban flows prediction and an outlook in the future of this field. This paper will facilitate researchers find suitable methods and open datasets for addressing urban spatial-temporal flows forecast problems.