Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qian Sun

Shandong University of Science and Technology

Spiking Vision Transformer with Saccadic Attention

Feb 18, 2025

Shuai Wang, Malu Zhang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Yu Liang, Yimeng Shan, Qian Sun, Enqi Zhang, Yang Yang

Abstract:The combination of Spiking Neural Networks (SNNs) and Vision Transformers (ViTs) holds potential for achieving both energy efficiency and high performance, particularly suitable for edge vision applications. However, a significant performance gap still exists between SNN-based ViTs and their ANN counterparts. Here, we first analyze why SNN-based ViTs suffer from limited performance and identify a mismatch between the vanilla self-attention mechanism and spatio-temporal spike trains. This mismatch results in degraded spatial relevance and limited temporal interactions. To address these issues, we draw inspiration from biological saccadic attention mechanisms and introduce an innovative Saccadic Spike Self-Attention (SSSA) method. Specifically, in the spatial domain, SSSA employs a novel spike distribution-based method to effectively assess the relevance between Query and Key pairs in SNN-based ViTs. Temporally, SSSA employs a saccadic interaction module that dynamically focuses on selected visual areas at each timestep and significantly enhances whole scene understanding through temporal interactions. Building on the SSSA mechanism, we develop a SNN-based Vision Transformer (SNN-ViT). Extensive experiments across various visual tasks demonstrate that SNN-ViT achieves state-of-the-art performance with linear computational complexity. The effectiveness and efficiency of the SNN-ViT highlight its potential for power-critical edge vision applications.

* Published as a conference paper at ICLR 2025

Via

Access Paper or Ask Questions

On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models

Nov 12, 2024

Qian Sun, Hanpeng Wu, Xi Sheryl Zhang

Figure 1 for On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models

Figure 2 for On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models

Figure 3 for On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models

Figure 4 for On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models

Abstract:The pretraining and fine-tuning approach has become the leading technique for various NLP applications. However, recent studies reveal that fine-tuning data, due to their sensitive nature, domain-specific characteristics, and identifiability, pose significant privacy concerns. To help develop more privacy-resilient fine-tuning models, we introduce a novel active privacy auditing framework, dubbed Parsing, designed to identify and quantify privacy leakage risks during the supervised fine-tuning (SFT) of language models (LMs). The framework leverages improved white-box membership inference attacks (MIAs) as the core technology, utilizing novel learning objectives and a two-stage pipeline to monitor the privacy of the LMs' fine-tuning process, maximizing the exposure of privacy risks. Additionally, we have improved the effectiveness of MIAs on large LMs including GPT-2, Llama2, and certain variants of them. Our research aims to provide the SFT community of LMs with a reliable, ready-to-use privacy auditing tool, and to offer valuable insights into safeguarding privacy during the fine-tuning process. Experimental results confirm the framework's efficiency across various models and tasks, emphasizing notable privacy concerns in the fine-tuning process. Project code available for https://anonymous.4open.science/r/PARSING-4817/.

Via

Access Paper or Ask Questions

Horizon-wise Learning Paradigm Promotes Gene Splicing Identification

Jun 15, 2024

Qi-Jie Li, Qian Sun, Shao-Qun Zhang

Abstract:Identifying gene splicing is a core and significant task confronted in modern collaboration between artificial intelligence and bioinformatics. Past decades have witnessed great efforts on this concern, such as the bio-plausible splicing pattern AT-CG and the famous SpliceAI. In this paper, we propose a novel framework for the task of gene splicing identification, named Horizon-wise Gene Splicing Identification (H-GSI). The proposed H-GSI follows the horizon-wise identification paradigm and comprises four components: the pre-processing procedure transforming string data into tensors, the sliding window technique handling long sequences, the SeqLab model, and the predictor. In contrast to existing studies that process gene information with a truncated fixed-length sequence, H-GSI employs a horizon-wise identification paradigm in which all positions in a sequence are predicted with only one forward computation, improving accuracy and efficiency. The experiments conducted on the real-world Human dataset show that our proposed H-GSI outperforms SpliceAI and achieves the best accuracy of 97.20\%. The source code is available from this link.

Via

Access Paper or Ask Questions

Semi-Weakly Supervised Object Kinematic Motion Prediction

Apr 03, 2023

Gengxin Liu, Qian Sun, Haibin Huang, Chongyang Ma, Yulan Guo, Li Yi, Hui Huang, Ruizhen Hu

Abstract:Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters. Due to the large variations in both topological structure and geometric details of 3D objects, this remains a challenging task and the lack of large scale labeled data also constrain the performance of deep learning based approaches. In this paper, we tackle the task of object kinematic motion prediction problem in a semi-weakly supervised manner. Our key observations are two-fold. First, although 3D dataset with fully annotated motion labels is limited, there are existing datasets and methods for object part semantic segmentation at large scale. Second, semantic part segmentation and mobile part segmentation is not always consistent but it is possible to detect the mobile parts from the underlying 3D structure. Towards this end, we propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile parts parameters, which are further refined based on geometric alignment. This network can be first trained on PartNet-Mobility dataset with fully labeled mobility information and then applied on PartNet dataset with fine-grained and hierarchical part-level segmentation. The network predictions yield a large scale of 3D objects with pseudo labeled mobility information and can further be used for weakly-supervised learning with pre-existing segmentation. Our experiments show there are significant performance boosts with the augmented data for previous method designed for kinematic motion prediction on 3D partial scans.

* CVPR 2023

Via

Access Paper or Ask Questions

Search By Image: Deeply Exploring Beneficial Features for Beauty Product Retrieval

Mar 24, 2023

Mingqiang Wei, Qian Sun, Haoran Xie, Dong Liang, Fu Lee Wang

Abstract:Searching by image is popular yet still challenging due to the extensive interference arose from i) data variations (e.g., background, pose, visual angle, brightness) of real-world captured images and ii) similar images in the query dataset. This paper studies a practically meaningful problem of beauty product retrieval (BPR) by neural networks. We broadly extract different types of image features, and raise an intriguing question that whether these features are beneficial to i) suppress data variations of real-world captured images, and ii) distinguish one image from others which look very similar but are intrinsically different beauty products in the dataset, therefore leading to an enhanced capability of BPR. To answer it, we present a novel variable-attention neural network to understand the combination of multiple features (termed VM-Net) of beauty product images. Considering that there are few publicly released training datasets for BPR, we establish a new dataset with more than one million images classified into more than 20K categories to improve both the generalization and anti-interference abilities of VM-Net and other methods. We verify the performance of VM-Net and its competitors on the benchmark dataset Perfect-500K, where VM-Net shows clear improvements over the competitors in terms of MAP@7. The source code and dataset will be released upon publication.

Via

Access Paper or Ask Questions

OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution

Mar 02, 2023

Gaochao Song, Luo Zhang, Ran Su, Jianfeng Shi, Ying He, Qian Sun

Figure 1 for OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution

Figure 2 for OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution

Figure 3 for OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution

Figure 4 for OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution

Abstract:Implicit neural representation (INR) is a popular approach for arbitrary-scale image super-resolution (SR), as a key component of INR, position encoding improves its representation ability. Motivated by position encoding, we propose orthogonal position encoding (OPE) - an extension of position encoding - and an OPE-Upscale module to replace the INR-based upsampling module for arbitrary-scale image super-resolution. Same as INR, our OPE-Upscale Module takes 2D coordinates and latent code as inputs; however it does not require training parameters. This parameter-free feature allows the OPE-Upscale Module to directly perform linear combination operations to reconstruct an image in a continuous manner, achieving an arbitrary-scale image reconstruction. As a concise SR framework, our method has high computing efficiency and consumes less memory comparing to the state-of-the-art (SOTA), which has been confirmed by extensive experiments and evaluations. In addition, our method has comparable results with SOTA in arbitrary scale image super-resolution. Last but not the least, we show that OPE corresponds to a set of orthogonal basis, justifying our design principle.

* Accepted by CVPR 2023. 11 pages

Via

Access Paper or Ask Questions

Semantic Segmentation by Improved Generative Adversarial Networks

Apr 20, 2021

ZengShun Zhaoa, Yulong Wang, Ke Liu, Haoran Yang, Qian Sun, Heng Qiao

Figure 1 for Semantic Segmentation by Improved Generative Adversarial Networks

Figure 2 for Semantic Segmentation by Improved Generative Adversarial Networks

Figure 3 for Semantic Segmentation by Improved Generative Adversarial Networks

Figure 4 for Semantic Segmentation by Improved Generative Adversarial Networks

Abstract:While most existing segmentation methods usually combined the powerful feature extraction capabilities of CNNs with Conditional Random Fields (CRFs) post-processing, the result always limited by the fault of CRFs . Due to the notoriously slow calculation speeds and poor efficiency of CRFs, in recent years, CRFs post-processing has been gradually eliminated. In this paper, an improved Generative Adversarial Networks (GANs) for image semantic segmentation task (semantic segmentation by GANs, Seg-GAN) is proposed to facilitate further segmentation research. In addition, we introduce Convolutional CRFs (ConvCRFs) as an effective improvement solution for the image semantic segmentation task. Towards the goal of differentiating the segmentation results from the ground truth distribution and improving the details of the output images, the proposed discriminator network is specially designed in a full convolutional manner combined with cascaded ConvCRFs. Besides, the adversarial loss aggressively encourages the output image to be close to the distribution of the ground truth. Our method not only learns an end-to-end mapping from input image to corresponding output image, but also learns a loss function to train this mapping. The experiments show that our method achieves better performance than state-of-the-art methods.

* arXiv admin note: text overlap with arXiv:1802.07934 by other authors

Via

Access Paper or Ask Questions

Superpixel Segmentation with Fully Convolutional Networks

Mar 29, 2020

Fengting Yang, Qian Sun, Hailin Jin, Zihan Zhou

Figure 1 for Superpixel Segmentation with Fully Convolutional Networks

Figure 2 for Superpixel Segmentation with Fully Convolutional Networks

Figure 3 for Superpixel Segmentation with Fully Convolutional Networks

Figure 4 for Superpixel Segmentation with Fully Convolutional Networks

Abstract:In computer vision, superpixels have been widely used as an effective way to reduce the number of image primitives for subsequent processing. But only a few attempts have been made to incorporate them into deep neural networks. One main reason is that the standard convolution operation is defined on regular grids and becomes inefficient when applied to superpixels. Inspired by an initialization strategy commonly adopted by traditional superpixel algorithms, we present a novel method that employs a simple fully convolutional network to predict superpixels on a regular image grid. Experimental results on benchmark datasets show that our method achieves state-of-the-art superpixel segmentation performance while running at about 50fps. Based on the predicted superpixels, we further develop a downsampling/upsampling scheme for deep networks with the goal of generating high-resolution outputs for dense prediction tasks. Specifically, we modify a popular network architecture for stereo matching to simultaneously predict superpixels and disparities. We show that improved disparity estimation accuracy can be obtained on public datasets.

* 16 pages, 15 figures, to be published in CVPR'20

Via

Access Paper or Ask Questions

Stochastic Coordinate Coding and Its Application for Drosophila Gene Expression Pattern Annotation

Dec 09, 2014

Binbin Lin, Qingyang Li, Qian Sun, Ming-Jun Lai, Ian Davidson, Wei Fan, Jieping Ye

Figure 1 for Stochastic Coordinate Coding and Its Application for Drosophila Gene Expression Pattern Annotation

Figure 2 for Stochastic Coordinate Coding and Its Application for Drosophila Gene Expression Pattern Annotation

Figure 3 for Stochastic Coordinate Coding and Its Application for Drosophila Gene Expression Pattern Annotation

Figure 4 for Stochastic Coordinate Coding and Its Application for Drosophila Gene Expression Pattern Annotation

Abstract:\textit{Drosophila melanogaster} has been established as a model organism for investigating the fundamental principles of developmental gene interactions. The gene expression patterns of \textit{Drosophila melanogaster} can be documented as digital images, which are annotated with anatomical ontology terms to facilitate pattern discovery and comparison. The automated annotation of gene expression pattern images has received increasing attention due to the recent expansion of the image database. The effectiveness of gene expression pattern annotation relies on the quality of feature representation. Previous studies have demonstrated that sparse coding is effective for extracting features from gene expression images. However, solving sparse coding remains a computationally challenging problem, especially when dealing with large-scale data sets and learning large size dictionaries. In this paper, we propose a novel algorithm to solve the sparse coding problem, called Stochastic Coordinate Coding (SCC). The proposed algorithm alternatively updates the sparse codes via just a few steps of coordinate descent and updates the dictionary via second order stochastic gradient descent. The computational cost is further reduced by focusing on the non-zero components of the sparse codes and the corresponding columns of the dictionary only in the updating procedure. Thus, the proposed algorithm significantly improves the efficiency and the scalability, making sparse coding applicable for large-scale data sets and large dictionary sizes. Our experiments on Drosophila gene expression data sets demonstrate the efficiency and the effectiveness of the proposed algorithm.

Via

Access Paper or Ask Questions