Abstract:Time-varying data with irregular structures can be described by finite time-vertex graph signals (FTVGS), which represent potential temporal and spatial relationships among multiple sources. While sampling and corresponding reconstruction of FTVGS with known spectral support are well investigated, methods for the case of unknown spectral support remain underdeveloped. Existing random sampling schemes may acquire samples from any vertex at any time, which is uncommon in practical applications where sampling typically involves only a subset of vertices and time instants. In light of this requirement, this paper proposes a subset random sampling scheme for FTVGS. We first randomly select some rows and columns of the FTVGS to form a submatrix, and then randomly sample within the submatrix. Theoretically, we prove sufficient conditions to ensure that the original FTVGS is reconstructed with high probability. We also validate the feasibility of reconstructing the original FTVGS through experiments.
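The two-stage selection described in this abstract can be sketched as follows. This is a minimal illustration only: the function name `subset_random_sampling_mask`, its parameters, and the choice of uniform sampling without replacement at both stages are assumptions, since the abstract does not state the sampling distributions used in the paper.

```python
import numpy as np

def subset_random_sampling_mask(num_vertices, num_times, num_rows, num_cols,
                                num_samples, seed=None):
    """Build a boolean sampling mask for a (vertices x time) signal matrix.

    Stage 1: pick a random subset of rows (vertices) and columns (time instants).
    Stage 2: pick random entries inside the resulting submatrix.
    Uniform sampling without replacement is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    rows = rng.choice(num_vertices, size=num_rows, replace=False)
    cols = rng.choice(num_times, size=num_cols, replace=False)

    # Draw distinct positions inside the num_rows x num_cols submatrix.
    flat = rng.choice(num_rows * num_cols, size=num_samples, replace=False)
    r_idx, c_idx = np.unravel_index(flat, (num_rows, num_cols))

    # Map submatrix positions back to positions in the full matrix.
    mask = np.zeros((num_vertices, num_times), dtype=bool)
    mask[rows[r_idx], cols[c_idx]] = True
    return mask, rows, cols

# Hypothetical sizes: 20 vertices, 30 time instants, an 8 x 10 submatrix, 25 samples.
mask, rows, cols = subset_random_sampling_mask(20, 30, 8, 10, 25, seed=0)
```

Every sampled entry lies in one of the selected rows and one of the selected columns, so vertices and time instants outside the subset are never observed.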
Abstract:As AI closely interacts with human society, it is crucial to ensure that its decision-making is safe, altruistic, and aligned with human ethical and moral values. However, existing research on embedding ethical and moral considerations into AI remains insufficient, and previous external constraints based on principles and rules are inadequate to provide AI with long-term stability and generalization capabilities. In contrast, the intrinsic altruistic motivation based on empathy is more willing, spontaneous, and robust. Therefore, this paper is dedicated to autonomously driving intelligent agents to acquire moral behaviors through human-like affective empathy mechanisms. We draw inspiration from the neural mechanism of the human brain's moral intuitive decision-making, and simulate the mirror neuron system to construct a brain-inspired affective empathy-driven altruistic decision-making model. Here, empathy directly impacts dopamine release to form intrinsic altruistic motivation. Based on the principle of moral utilitarianism, we design a moral reward function that integrates intrinsic empathy and extrinsic self-task goals. A comprehensive experimental scenario incorporating empathetic processes, personal objectives, and altruistic goals is developed. The proposed model enables the agent to make consistent moral decisions (prioritizing altruism) by balancing self-interest with the well-being of others. We further introduce inhibitory neurons to regulate different levels of empathy and verify the positive correlation between empathy levels and altruistic preferences, yielding conclusions consistent with findings from psychological behavioral experiments. This work provides a feasible solution for the development of ethical AI by leveraging intrinsic human-like empathy mechanisms, and contributes to the harmonious coexistence between humans and AI.
Abstract:Utilizing Self-Supervised Learning (SSL) models for Speech Emotion Recognition (SER) has proven effective, yet limited research has explored cross-lingual scenarios. This study presents a comparative analysis between human performance and SSL models, beginning with a layer-wise analysis and an exploration of parameter-efficient fine-tuning strategies in monolingual, cross-lingual, and transfer learning contexts. We further compare the SER ability of models and humans at both utterance- and segment-levels. Additionally, we investigate the impact of dialect on cross-lingual SER through human evaluation. Our findings reveal that models, with appropriate knowledge transfer, can adapt to the target language and achieve performance comparable to native speakers. We also demonstrate the significant effect of dialect on SER for individuals without prior linguistic and paralinguistic background. Moreover, both humans and models exhibit distinct behaviors across different emotions. These results offer new insights into the cross-lingual SER capabilities of SSL models, underscoring both their similarities to and differences from human emotion perception.
Abstract:To accelerate the training of graph convolutional networks (GCNs) on real-world large-scale sparse graphs, downsampling methods are commonly employed as a preprocessing step. However, the effects of graph sparsity and topological structure on the transferability of downsampling methods have not been rigorously analyzed or theoretically guaranteed, particularly when the topological structure is affected by graph sparsity. In this paper, we introduce a novel downsampling method based on a sparse random graph model and derive an expected upper bound for the transfer error. Our findings show that smaller original graph sizes, higher expected average degrees, and increased sampling rates contribute to reducing this upper bound. Experimental results validate the theoretical predictions. By incorporating both sparsity and topological similarity into the model, this study establishes an upper bound on the transfer error for downsampling in the training of large-scale sparse graphs and provides insight into the influence of topological structure on transfer performance.
Abstract:In this paper, we propose a large-scale sparse graph downsampling method based on a sparse random graph model, which allows for the adjustment of different sparsity levels. We combine sparsity and topological similarity: the sparse graph model reduces the node connection probability as the graph size increases, while the downsampling method preserves a specific topological connection pattern during this change. Based on the downsampling method, we derive a theoretical transferability bound for downsampling in sparse graph convolutional networks (GCNs), which shows that higher sampling rates, larger expected average degrees, and smaller initial graph sizes lead to better downsampling transferability.
Abstract:The question "Can machines think?" and the Turing Test, which assesses whether machines could achieve human-level intelligence, are among the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenges the idea of a "thinking machine" supported by current AIs, since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information processing; it does not truly understand, is not subjectively aware of itself, and does not perceive the world with the self as human intelligence does. In this paper, we introduce a Brain-inspired and Self-based Artificial Intelligence (BriSe AI) paradigm. This BriSe AI paradigm is dedicated to coordinating various cognitive functions and learning strategies in a self-organized manner to build human-level AI models and robotic applications. Specifically, BriSe AI emphasizes the crucial role of the Self in shaping future AI, rooted in a practical hierarchical Self framework, including Perception and Learning, Bodily Self, Autonomous Self, Social Self, and Conceptual Self. The hierarchical framework of the Self highlights self-based environment perception, self-bodily modeling, autonomous interaction with the environment, social interaction and collaboration with others, and even more abstract understanding of the Self. Furthermore, the positive mutual promotion and support among multiple levels of Self, as well as between Self and learning, enhance BriSe AI's conscious understanding of information and flexible adaptation to complex environments, serving as a driving force propelling BriSe AI towards real Artificial General Intelligence.
Abstract:The theory of sampling and recovery of bandlimited graph signals has been extensively studied. However, in many cases, the observation of a signal is quite coarse. For example, users only provide simple comments such as "like" or "dislike" for a product on an e-commerce platform. This is a particular scenario where only the sign information of a graph signal can be measured. In this paper, we are interested in how to sample based on sign information in an online manner, by which the direction of the original graph signal can be estimated. The online signed sampling problem of a graph signal can be formulated as a Markov decision process in a finite horizon. Unfortunately, it is intractable for large graphs. We propose a low-complexity greedy signed sampling algorithm (GSS) as well as a stopping criterion. Meanwhile, we prove that the objective function is adaptive monotonic and adaptive submodular, so that the performance is close enough to the global optimum with a lower bound. Finally, we demonstrate the effectiveness of the GSS algorithm on both synthetic and real-world data.
Abstract:Object tracking based on the fusion of visible and thermal images, known as RGB-T tracking, has gained increasing attention from researchers in recent years. How to achieve a more comprehensive fusion of information from the two modalities with fewer computational costs has been a problem that researchers have been exploring. Recently, with the rise of prompt learning in computer vision, we can better transfer knowledge from visual large models to downstream tasks. Considering the strong complementarity between visible and thermal modalities, we propose a tracking architecture based on mutual prompt learning between the two modalities. We also design a lightweight prompter that incorporates attention mechanisms in two dimensions to transfer information from one modality to the other with lower computational costs, embedding it into each layer of the backbone. Extensive experiments have demonstrated that our proposed tracking architecture is effective and efficient, achieving state-of-the-art performance while maintaining high running speeds.
Abstract:Few studies have examined the effects of tonal coarticulation and prosodic positions on the low rising tone in Xiamen Dialect. This study addresses this issue. To do so, a new method, the Tonal Contour Analysis in Tonal Triangle, is proposed to measure the subtle curvature of the tonal contour. Findings are as follows: (1) The low rising tone in Xiamen Dialect has a tendency towards the falling-rising tone, which is significantly affected by the tonal coarticulation and prosodic positions. (2) The low rising tone presents as a falling-rising tone when preceded by a tone with a high offset, and as a low rising tone when preceded by a tone with a low offset. (3) The curvature of the low rising tone is greatest in the sentence-initial position, and is positively correlated with its own duration.
Abstract:When sampling multiple signals, the correlation between the signals can be exploited to reduce the overall number of samples. In this paper, we study the sampling theory of multiple correlated signals, using correlation to sample them at the lowest sampling rate. Based on the correlation between signal sources, we model multiple continuous-time signals as continuous time-vertex graph signals. The graph signals are projected onto orthogonal bases via the graph Fourier transform to remove spatial correlation and reduce dimensions. When the bandwidths of the original signals and the reduced-dimension signals are given, we derive the minimum sampling rate required for recovery of the original signals, and propose a feasible sampling scheme.
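The projection step in this abstract can be illustrated with a small discrete sketch. Several assumptions are made here that are not in the abstract: the time axis is discretized into 50 samples (the paper treats continuous-time signals), the graph is a hypothetical 4-node path, the orthogonal basis is taken from the eigenvectors of the combinatorial Laplacian, and the helper names `graph_fourier_basis` and `gft` are illustrative.

```python
import numpy as np

def graph_fourier_basis(adjacency):
    """Return eigenvalues and orthonormal eigenvectors of the combinatorial Laplacian."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # symmetric matrix, ascending eigenvalues
    return eigvals, eigvecs

def gft(eigvecs, signals):
    """Project vertex-domain signals (one row per vertex) onto the graph Fourier basis."""
    return eigvecs.T @ signals

# Hypothetical 4-node path graph.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
eigvals, U = graph_fourier_basis(adjacency)

# Four spatially correlated signals built from only the two lowest graph frequencies.
rng = np.random.default_rng(0)
coeffs = np.zeros((4, 50))
coeffs[:2] = rng.standard_normal((2, 50))
signals = U @ coeffs

# After the transform, only two rows are nonzero: four signals reduce to two.
spectral = gft(U, signals)
reconstructed = U @ spectral
```

In this toy case the four vertex signals are fully described by two spectral-domain signals, which is the kind of dimension reduction the abstract exploits before choosing a sampling rate.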