Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xingyu Gao

MSDformer: Multi-scale Discrete Transformer For Time Series Generation

May 20, 2025

Zhicheng Chen, Shibo Feng, Xi Xiao, Zhong Zhang, Qing Li, Xingyu Gao, Peilin Zhao

Abstract:Discrete Token Modeling (DTM), which employs vector quantization techniques, has demonstrated remarkable success in modeling non-natural language modalities, particularly in time series generation. While our prior work SDformer established the first DTM-based framework to achieve state-of-the-art performance in this domain, two critical limitations persist in existing DTM approaches: 1) their inability to capture multi-scale temporal patterns inherent to complex time series data, and 2) the absence of theoretical foundations to guide model optimization. To address these challenges, we proposes a novel multi-scale DTM-based time series generation method, called Multi-Scale Discrete Transformer (MSDformer). MSDformer employs a multi-scale time series tokenizer to learn discrete token representations at multiple scales, which jointly characterize the complex nature of time series data. Subsequently, MSDformer applies a multi-scale autoregressive token modeling technique to capture the multi-scale patterns of time series within the discrete latent space. Theoretically, we validate the effectiveness of the DTM method and the rationality of MSDformer through the rate-distortion theorem. Comprehensive experiments demonstrate that MSDformer significantly outperforms state-of-the-art methods. Both theoretical analysis and experimental results demonstrate that incorporating multi-scale information and modeling multi-scale patterns can substantially enhance the quality of generated time series in DTM-based approaches. The code will be released upon acceptance.

Via

Access Paper or Ask Questions

TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting

Mar 07, 2025

Shibo Feng, Wanjin Feng, Xingyu Gao, Peilin Zhao, Zhiqi Shen

Abstract:Spiking Neural Networks (SNNs) offer a promising, biologically inspired approach for processing spatiotemporal data, particularly for time series forecasting. However, conventional neuron models like the Leaky Integrate-and-Fire (LIF) struggle to capture long-term dependencies and effectively process multi-scale temporal dynamics. To overcome these limitations, we introduce the Temporal Segment Leaky Integrate-and-Fire (TS-LIF) model, featuring a novel dual-compartment architecture. The dendritic and somatic compartments specialize in capturing distinct frequency components, providing functional heterogeneity that enhances the neuron's ability to process both low- and high-frequency information. Furthermore, the newly introduced direct somatic current injection reduces information loss during intra-neuronal transmission, while dendritic spike generation improves multi-scale information extraction. We provide a theoretical stability analysis of the TS-LIF model and explain how each compartment contributes to distinct frequency response characteristics. Experimental results show that TS-LIF outperforms traditional SNNs in time series forecasting, demonstrating better accuracy and robustness, even with missing data. TS-LIF advances the application of SNNs in time-series forecasting, providing a biologically inspired approach that captures complex temporal dynamics and offers potential for practical implementation in diverse forecasting scenarios. The source code is available at https://github.com/kkking-kk/TS-LIF.

Via

Access Paper or Ask Questions

Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models

Jul 18, 2024

Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han

Abstract:With the rapid advancements of large-scale text-to-image diffusion models, various practical applications have emerged, bringing significant convenience to society. However, model developers may misuse the unauthorized data to train diffusion models. These data are at risk of being memorized by the models, thus potentially violating citizens' privacy rights. Therefore, in order to judge whether a specific image is utilized as a member of a model's training set, Membership Inference Attack (MIA) is proposed to serve as a tool for privacy protection. Current MIA methods predominantly utilize pixel-wise comparisons as distinguishing clues, considering the pixel-level memorization characteristic of diffusion models. However, it is practically impossible for text-to-image models to memorize all the pixel-level information in massive training sets. Therefore, we move to the more advanced structure-level memorization. Observations on the diffusion process show that the structures of members are better preserved compared to those of nonmembers, indicating that diffusion models possess the capability to remember the structures of member images from training sets. Drawing on these insights, we propose a simple yet effective MIA method tailored for text-to-image diffusion models. Extensive experimental results validate the efficacy of our approach. Compared to current pixel-level baselines, our approach not only achieves state-of-the-art performance but also demonstrates remarkable robustness against various distortions.

Via

Access Paper or Ask Questions

Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA

Feb 10, 2024

Shaojie Tang, Penpen Miao, Xingyu Gao, Yu Zhong, Dantong Zhu, Haixing Wen, Zhihui Xu, Qiuyue Wei, Hongping Yao, Xin Huang(+3 more)

Abstract:A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point clouds of the LV epicardial contours (LVECs). Secondly, according to the characteristics of cardiac anatomy, the special points of anterior and posterior interventricular grooves (APIGs) were manually marked in both SPECT and CTA image volumes. Thirdly, we developed an in-house program for coarsely registering the special points of APIGs to ensure a correct cardiac orientation alignment between SPECT and CTA images. Fourthly, we employed ICP, SICP or CPD algorithm to achieve a fine registration for the point clouds (together with the special points of APIGs) of the LV epicardial surfaces (LVERs) in SPECT and CTA images. Finally, the image fusion between SPECT and CTA was realized after the fine registration. The experimental results showed that the cardiac orientation was aligned well and the mean distance error of the optimal registration method (CPD with affine transform) was consistently less than 3 mm. The proposed method could effectively fuse the structures from cardiac CTA and SPECT functional images, and demonstrated a potential in assisting in accurate diagnosis of cardiac diseases by combining complementary advantages of the two imaging modalities.

Via

Access Paper or Ask Questions

Contrastive Multi-Level Graph Neural Networks for Session-based Recommendation

Nov 06, 2023

Fuyun Wang, Xingyu Gao, Zhenyu Chen, Lei Lyu

Abstract:Session-based recommendation (SBR) aims to predict the next item at a certain time point based on anonymous user behavior sequences. Existing methods typically model session representation based on simple item transition information. However, since session-based data consists of limited users' short-term interactions, modeling session representation by capturing fixed item transition information from a single dimension suffers from data sparsity. In this paper, we propose a novel contrastive multi-level graph neural networks (CM-GNN) to better exploit complex and high-order item transition information. Specifically, CM-GNN applies local-level graph convolutional network (L-GCN) and global-level network (G-GCN) on the current session and all the sessions respectively, to effectively capture pairwise relations over all the sessions by aggregation strategy. Meanwhile, CM-GNN applies hyper-level graph convolutional network (H-GCN) to capture high-order information among all the item transitions. CM-GNN further introduces an attention-based fusion module to learn pairwise relation-based session representation by fusing the item representations generated by L-GCN and G-GCN. CM-GNN averages the item representations obtained by H-GCN to obtain high-order relation-based session representation. Moreover, to convert the high-order item transition information into the pairwise relation-based session representation, CM-GNN maximizes the mutual information between the representations derived from the fusion module and the average pool layer by contrastive learning paradigm. We conduct extensive experiments on multiple widely used benchmark datasets to validate the efficacy of the proposed method. The encouraging results demonstrate that our proposed method outperforms the state-of-the-art SBR techniques.

* IEEE Transactions on Multimedia, February 2023

Via

Access Paper or Ask Questions

Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

Jul 03, 2023

Haixing Dai, Mengxuan Hu, Qing Li, Lu Zhang, Lin Zhao, Dajiang Zhu, Ibai Diez, Jorge Sepulcre, Fan Zhang, Xingyu Gao(+5 more)

Figure 1 for Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

Figure 2 for Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

Figure 3 for Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

Figure 4 for Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

Abstract:Alzheimer's disease (AD) is a neurodegenerative disorder that is beginning with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-beta accumulation and AD pathophysiology remains unclear, and causal inference approaches are needed to uncover how amyloid-beta levels can impact AD development. In this paper, we propose a graph varying coefficient neural network (GVCNet) for estimating the individual treatment effect with continuous treatment levels using a graph convolutional neural network. We highlight the potential of causal inference approaches, including GVCNet, for measuring the regional causal connections between amyloid-beta accumulation and AD pathophysiology, which may serve as a robust tool for early diagnosis and tailored care.

Via

Access Paper or Ask Questions

Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

Jul 15, 2022

Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He

Figure 1 for Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

Figure 2 for Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

Figure 3 for Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

Figure 4 for Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

Abstract:Vision multi-layer perceptrons (MLPs) have shown promising performance in computer vision tasks, and become the main competitor of CNNs and vision Transformers. They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers. However, the heavily parameterized token-mixing layers naturally lack mechanisms to capture local information and multi-granular non-local relations, thus their discriminative power is restrained. To tackle this issue, we propose a new positional spacial gating unit (PoSGU). It exploits the attention formulations used in the classical relative positional encoding (RPE), to efficiently encode the cross-token relations for token mixing. It can successfully reduce the current quadratic parameter complexity $O(N^2)$ of vision MLPs to $O(N)$ and $O(1)$. We experiment with two RPE mechanisms, and further propose a group-wise extension to improve their expressive power with the accomplishment of multi-granular contexts. These then serve as the key building blocks of a new type of vision MLP, referred to as PosMLP. We evaluate the effectiveness of the proposed approach by conducting thorough experiments, demonstrating an improved or comparable performance with reduced parameter complexity. For instance, for a model trained on ImageNet1K, we achieve a performance improvement from 72.14\% to 74.02\% and a learnable parameter reduction from $19.4M$ to $18.2M$. Code could be found at \href{https://github.com/Zhicaiwww/PosMLP}{https://github.com/Zhicaiwww/PosMLP}.

Via

Access Paper or Ask Questions

GTac: A Biomimetic Tactile Sensor with Skin-like Heterogeneous Force Feedback for Robots

Jan 28, 2022

Zeyu Lu, Xingyu Gao, Haoyong Yu

Figure 1 for GTac: A Biomimetic Tactile Sensor with Skin-like Heterogeneous Force Feedback for Robots

Figure 2 for GTac: A Biomimetic Tactile Sensor with Skin-like Heterogeneous Force Feedback for Robots

Figure 3 for GTac: A Biomimetic Tactile Sensor with Skin-like Heterogeneous Force Feedback for Robots

Figure 4 for GTac: A Biomimetic Tactile Sensor with Skin-like Heterogeneous Force Feedback for Robots

Abstract:The tactile sensing capabilities of human hands are essential in performing daily activities. Simultaneously perceiving normal and shear forces via the mechanoreceptors integrated into the hands enables humans to achieve daily tasks like grasping delicate objects. In this paper, we design and fabricate a novel biomimetic tactile sensor with skin-like heterogeneity that perceives normal and shear contact forces simultaneously. It mimics the multilayers of mechanoreceptors by combining an extrinsic layer (piezoresistive sensors) and an intrinsic layer (a Hall sensor) so that it can perform estimation of contact force directions, locations, and joint-level torque. By integrating our sensors, a robotic gripper can obtain contact force feedback at fingertips; accordingly, robots can perform challenging tasks, such as tweezers usage, and egg grasping. This insightful sensor design can be customized and applied in different areas of robots and provide them with heterogeneous force sensing, potentially supporting robotics in acquiring skin-like tactile feedback.

Via

Access Paper or Ask Questions

On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Jan 07, 2022

Jiancan Wu, Xiang Wang, Xingyu Gao, Jiawei Chen, Hongcheng Fu, Tianyu Qiu, Xiangnan He

Figure 1 for On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Figure 2 for On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Figure 3 for On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Figure 4 for On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Abstract:Learning objectives of recommender models remain largely unexplored. Most methods routinely adopt either pointwise or pairwise loss to train the model parameters, while rarely pay attention to softmax loss due to the high computational cost. Sampled softmax loss emerges as an efficient substitute for softmax loss. Its special case, InfoNCE loss, has been widely used in self-supervised learning and exhibited remarkable performance for contrastive learning. Nonetheless, limited studies use sampled softmax loss as the learning objective to train the recommender. Worse still, none of them explore its properties and answer "Does sampled softmax loss suit for item recommendation?" and "What are the conceptual advantages of sampled softmax loss, as compared with the prevalent losses?", to the best of our knowledge. In this work, we aim to better understand sampled softmax loss for item recommendation. Specifically, we first theoretically reveal three model-agnostic advantages: (1) mitigating popularity bias, which is beneficial to long-tail recommendation; (2) mining hard negative samples, which offers informative gradients to optimize model parameters; and (3) maximizing the ranking metric, which facilitates top-K performance. Moreover, we probe the model-specific characteristics on the top of various recommenders. Experimental results suggest that sampled softmax loss is more friendly to history and graph-based recommenders (e.g., SVD++ and LightGCN), but performs poorly for ID-based models (e.g., MF). We ascribe this to its shortcoming in learning representation magnitude, making the combination with the models that are also incapable of adjusting representation magnitude learn poor representations. In contrast, the history- and graph-based models, which naturally adjust representation magnitude according to node degree, are able to compensate for the shortcoming of sampled softmax loss.

* 10 Pages, 1 figure, 5 tables

Via

Access Paper or Ask Questions

Parsing-based View-aware Embedding Network for Vehicle Re-Identification

Apr 10, 2020

Dechao Meng, Liang Li, Xuejing Liu, Yadong Li, Shijie Yang, Zhengjun Zha, Xingyu Gao, Shuhui Wang, Qingming Huang

Figure 1 for Parsing-based View-aware Embedding Network for Vehicle Re-Identification

Figure 2 for Parsing-based View-aware Embedding Network for Vehicle Re-Identification

Figure 3 for Parsing-based View-aware Embedding Network for Vehicle Re-Identification

Figure 4 for Parsing-based View-aware Embedding Network for Vehicle Re-Identification

Abstract:Vehicle Re-Identification is to find images of the same vehicle from various views in the cross-camera scenario. The main challenges of this task are the large intra-instance distance caused by different views and the subtle inter-instance discrepancy caused by similar vehicles. In this paper, we propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID. First, we introduce a parsing network to parse a vehicle into four different views, and then align the features by mask average pooling. Such alignment provides a fine-grained representation of the vehicle. Second, in order to enhance the view-aware features, we design a common-visible attention to focus on the common visible views, which not only shortens the distance among intra-instances, but also enlarges the discrepancy of inter-instances. The PVEN helps capture the stable discriminative information of vehicle under different views. The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.

* 10 pages, 6 figures

Via

Access Paper or Ask Questions