Boyu Yang

DualAttWaveNet: Multiscale Attention Networks for Satellite Interference Detection

Apr 24, 2025

Chunyu Yang, Boyu Yang, Kun Qiu, Zhe Chen, Yue Gao

Abstract: The escalating overlap between non-geostationary orbit (NGSO) and geostationary orbit (GSO) satellite frequency allocations necessitates accurate interference detection methods that address two pivotal technical gaps: computationally efficient signal analysis for real-time operation, and robust anomaly discrimination under varying interference patterns. Existing deep learning approaches employ encoder-decoder anomaly detectors that threshold input-output discrepancies for robustness. While the transformer-based TrID model achieves state-of-the-art performance (AUC: 0.8318, F1: 0.8321), its multi-head attention incurs prohibitive computation time, and its decoupled training of time-frequency models overlooks cross-domain dependencies. To overcome these problems, we propose DualAttWaveNet. A bidirectional attention fusion layer dynamically correlates time-domain samples using parameter-efficient cross-attention routing. A wavelet-regularized reconstruction loss enforces multi-scale consistency. We train the model on a public dataset consisting of 48 hours of satellite signals. Experiments show that, compared to TrID, DualAttWaveNet improves AUC by 12% and reduces inference time by 50%, to 540 ms per batch, while maintaining the F1-score.
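To make the idea of a wavelet-regularized reconstruction loss concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the choice of the Haar wavelet, the weight `lam`, the number of `levels`, and the helper names (`haar_step`, `wavelet_regularized_loss`, `is_interference`) are all assumptions made for illustration. The last function shows the generic encoder-decoder detection rule the abstract mentions, namely thresholding the input-output discrepancy.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def wavelet_regularized_loss(x, x_hat, levels=2, lam=0.5):
    """Time-domain MSE plus lam times the detail-coefficient MSE at each scale.

    Assumes len(x) == len(x_hat) and that the length is divisible by 2**levels.
    The Haar basis and the weighting are illustrative assumptions.
    """
    loss = mse(x, x_hat)
    ax, ah = list(x), list(x_hat)
    for _ in range(levels):
        ax, dx = haar_step(ax)   # coarser view of the input
        ah, dh = haar_step(ah)   # coarser view of the reconstruction
        loss += lam * mse(dx, dh)
    return loss

def is_interference(x, x_hat, threshold):
    """Flag a window as anomalous when its reconstruction error exceeds a threshold."""
    return mse(x, x_hat) > threshold
```

For example, `wavelet_regularized_loss([1, 2, 3, 4], [1, 2, 3, 6])` penalizes the mismatch in the time domain and again at both Haar scales, so an error that distorts coarse structure costs more than the plain MSE alone would suggest.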

Explicit and Implicit Representations in AI-based 3D Reconstruction for Radiology: A systematic literature review

Apr 15, 2025

Yuezhe Yang, Boyu Yang, Yaqian Wang, Yang He, Xingbo Dong, Zhe Jin

Abstract: The demand for high-quality medical imaging in clinical practice and assisted diagnosis has made 3D reconstruction in radiological imaging a key research focus. Artificial intelligence (AI) has emerged as a promising approach to enhancing reconstruction accuracy while reducing acquisition and processing time, thereby minimizing patient radiation exposure and discomfort and ultimately benefiting clinical diagnosis. This review explores state-of-the-art AI-based 3D reconstruction algorithms in radiological imaging, categorizing them into explicit and implicit approaches based on their underlying principles. Explicit methods include point-based, volume-based, and Gaussian representations, while implicit methods encompass implicit prior embedding and neural radiance fields. Additionally, we examine commonly used evaluation metrics and benchmark datasets. Finally, we discuss the current state of development, key challenges, and future research directions in this evolving field. Our project is available at: https://github.com/Bean-Young/AI4Med.

* 43 pages, 5 figures, submitted to Medical Image Analysis

ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding

Jun 17, 2024

Tianren Ma, Lingxi Xie, Yunjie Tian, Boyu Yang, Yuan Zhang, David Doermann, Qixiang Ye

Abstract: An essential topic for multimodal large language models (MLLMs) is aligning vision and language concepts at a finer level. In particular, we devote efforts to encoding visual referential information for tasks such as referring and grounding. Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location, bringing extra burdens in training MLLMs to communicate between language and vision. This study presents ClawMachine, offering a new methodology that notates an entity directly using the visual tokens. It allows us to unify the prompt and answer of visual referential tasks without additional syntax. Upon a joint vision-language vocabulary, ClawMachine unifies visual referring and grounding into an auto-regressive format and learns with a decoder-only architecture. Experiments validate that our model achieves competitive performance across visual referring and grounding tasks with a reduced demand for training data. Additionally, ClawMachine demonstrates a native ability to integrate multi-source information for complex visual reasoning, which prior MLLMs can hardly perform without specific adaptations.

* Project page: https://github.com/martian422/ClawMachine

Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation

Jun 05, 2023

Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu

Abstract: In real-world scenarios, the application of reinforcement learning is significantly challenged by complex non-stationarity. Most existing methods attempt to model the changes of the environment explicitly, often requiring impractical prior knowledge. In this paper, we propose a new perspective, positing that non-stationarity can propagate and accumulate through complex causal relationships during state transitions, thereby compounding its sophistication and affecting policy learning. We believe that this challenge can be more effectively addressed by tracing the causal origin of non-stationarity. To this end, we introduce the Causal-Origin REPresentation (COREP) algorithm. COREP primarily employs a guided updating mechanism to learn a stable graph representation for states, termed the causal-origin representation. By leveraging this representation, the learned policy exhibits impressive resilience to non-stationarity. We supplement our approach with a theoretical analysis grounded in the causal interpretation for non-stationary reinforcement learning, advocating for the validity of the causal-origin representation. Experimental results further demonstrate the superior performance of COREP over existing methods in tackling non-stationarity.

Learnable Distribution Calibration for Few-Shot Class-Incremental Learning

Oct 01, 2022

Binghao Liu, Boyu Yang, Lingxi Xie, Ren Wang, Qi Tian, Qixiang Ye

Abstract: Few-shot class-incremental learning (FSCIL) faces the challenges of memorizing old class distributions and estimating new class distributions given few training samples. In this study, we propose a learnable distribution calibration (LDC) approach, which aims to solve these two challenges systematically within a unified framework. LDC is built upon a parameterized calibration unit (PCU), which initializes biased distributions for all classes based on classifier vectors (memory-free) and a single covariance matrix. The covariance matrix is shared by all classes, so that the memory costs are fixed. During base training, PCU is endowed with the ability to calibrate biased distributions by recurrently updating sampled features under the supervision of real distributions. During incremental learning, PCU recovers distributions for old classes to avoid "forgetting", and estimates distributions and augments samples for new classes to alleviate the "over-fitting" caused by the biased distributions of few-shot samples. LDC is theoretically plausible, as it can be formulated as a variational inference procedure. It improves FSCIL's flexibility, as the training procedure requires no class similarity prior. Experiments on the CUB200, CIFAR100, and mini-ImageNet datasets show that LDC outperforms the state of the art by 4.64%, 1.98%, and 3.97%, respectively. LDC's effectiveness is also validated in few-shot learning scenarios.
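The memory-free initialization described above (a per-class classifier vector as the mean, plus one covariance matrix shared by all classes) can be illustrated with a short sketch. This covers only the sampling of "biased" features, not the learned calibration performed by the PCU, and it is not the authors' code: the function names, the Cholesky-based sampler, and the toy numbers in the usage note are assumptions for illustration.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular L with L @ L^T == cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def sample_class_features(class_vector, shared_chol, n, rng):
    """Draw n features from N(class_vector, shared covariance).

    class_vector plays the role of the classifier vector for one class;
    shared_chol is the Cholesky factor of the single shared covariance.
    """
    d = len(class_vector)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append([
            class_vector[i] + sum(shared_chol[i][k] * z[k] for k in range(i + 1))
            for i in range(d)
        ])
    return samples
```

A possible usage, with made-up values: `sample_class_features([1.0, -1.0], cholesky([[4.0, 2.0], [2.0, 3.0]]), 5, random.Random(0))` returns five 2-dimensional features. Because only the class vectors and one covariance matrix are stored, the memory cost does not grow with the number of stored samples, which is the property the abstract highlights.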

Learnable Expansion-and-Compression Network for Few-shot Class-Incremental Learning

Apr 06, 2021

Boyu Yang, Mingbao Lin, Binghao Liu, Mengying Fu, Chang Liu, Rongrong Ji, Qixiang Ye

Abstract: Few-shot class-incremental learning (FSCIL), which aims to continuously expand a model's representation capacity under limited supervision, is an important yet challenging problem. On the one hand, when fitting new tasks (novel classes), features trained on old tasks (old classes) could significantly drift, causing catastrophic forgetting. On the other hand, training a large number of model parameters with few-shot novel-class examples leads to model over-fitting. In this paper, we propose a learnable expansion-and-compression network (LEC-Net), which aims to solve the catastrophic forgetting and model over-fitting problems simultaneously in a unified framework. By tentatively expanding network nodes, LEC-Net enlarges the representation capacity of features, alleviating feature drift of the old network from the perspective of model regularization. By compressing the expanded network nodes, LEC-Net pursues a minimal increase in model parameters, alleviating over-fitting of the expanded network from the perspective of compact representation. Experiments on the CUB/CIFAR-100 datasets show that LEC-Net improves the baseline by 5-7% while outperforming the state of the art by 5-6%. LEC-Net also demonstrates the potential to be a general incremental learning approach with dynamic model expansion capability.

Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection

Mar 10, 2021

Bohao Li, Boyu Yang, Chang Liu, Feng Liu, Rongrong Ji, Qixiang Ye

Abstract: Few-shot object detection has made substantial progress by representing novel class objects using the feature representation learned upon a set of base class objects. However, an implicit contradiction between novel class classification and representation is unfortunately ignored. On the one hand, to achieve accurate novel class classification, the distributions of any two base classes must be far away from each other (max-margin). On the other hand, to precisely represent novel classes, the distributions of base classes should be close to each other to reduce the intra-class distance of novel classes (min-margin). In this paper, we propose a class margin equilibrium (CME) approach, with the aim of optimizing both feature space partition and novel class reconstruction in a systematic way. CME first converts the few-shot detection problem to a few-shot classification problem by using a fully connected layer to decouple localization features. CME then reserves adequate margin space for novel classes by introducing a simple yet effective class margin loss during feature learning. Finally, CME pursues margin equilibrium by disturbing the features of novel class instances in an adversarial min-max fashion. Experiments on the Pascal VOC and MS-COCO datasets show that CME significantly improves upon two baseline detectors (by up to 3-5% on average), achieving state-of-the-art performance. Code is available at https://github.com/Bohao-Lee/CME.

* This paper has been modified by the author due to errors

Prototype Mixture Models for Few-shot Semantic Segmentation

Sep 01, 2020

Boyu Yang, Chang Liu, Bohao Li, Jianbin Jiao, Qixiang Ye

Abstract: Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose. Using a single prototype acquired directly from the support image to segment the query image causes semantic ambiguity. In this paper, we propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation. Estimated by an Expectation-Maximization algorithm, PMMs incorporate rich channel-wise and spatial semantics from limited support images. Utilized as representations as well as classifiers, PMMs fully leverage the semantics to activate objects in the query image while depressing background regions in a duplex manner. Extensive experiments on the Pascal VOC and MS-COCO datasets show that PMMs significantly improve upon the state of the art. In particular, PMMs improve 5-shot segmentation performance on MS-COCO by up to 5.82% with only a moderate cost in model size and inference speed.
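As a rough illustration of estimating several prototypes with Expectation-Maximization, the sketch below alternates soft assignment of feature vectors to prototypes with a weighted re-estimation of each prototype. The abstract does not specify the mixture kernel or the initialization, so the cosine-similarity kernel with concentration `kappa`, the unit-normalization step, and the function names are assumptions, not the paper's exact formulation.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def em_prototypes(features, init_prototypes, kappa=20.0, iters=10):
    """Fit K prototypes to feature vectors with a soft, cosine-based EM loop.

    Returns (prototypes, responsibilities), where responsibilities[n][k] is the
    soft assignment of feature n to prototype k.
    """
    feats = [normalize(f) for f in features]
    protos = [normalize(p) for p in init_prototypes]
    resp = []
    for _ in range(iters):
        # E-step: softmax over kappa-scaled cosine similarities.
        resp = []
        for f in feats:
            logits = [kappa * sum(a * b for a, b in zip(f, p)) for p in protos]
            m = max(logits)
            w = [math.exp(l - m) for l in logits]
            z = sum(w)
            resp.append([x / z for x in w])
        # M-step: each prototype becomes the normalized responsibility-weighted mean.
        new_protos = []
        for k in range(len(protos)):
            acc = [0.0] * len(feats[0])
            for f, r in zip(feats, resp):
                for i, x in enumerate(f):
                    acc[i] += r[k] * x
            new_protos.append(normalize(acc))
        protos = new_protos
    return protos, resp
```

In a segmentation setting the `features` would be the foreground feature vectors of the support image, and the resulting prototypes would stand in for different object parts, which is how multiple prototypes can reduce the ambiguity of a single averaged one.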
