Luxin Zhang

WK-Pnet: FM-Based Positioning via Wavelet Packet Decomposition and Knowledge Distillation

Apr 10, 2025

Shilian Zheng, Quan Lin, Peihan Qi, Luxin Zhang, Xinjiang Qiu, Zhijin Zhao, Xiaoniu Yang

Abstract: Accurate and efficient positioning in complex environments is critical for applications where traditional satellite-based systems face limitations, such as indoors or in urban canyons. This paper introduces WK-Pnet, an FM-based indoor positioning framework that combines wavelet packet decomposition (WPD) and knowledge distillation. WK-Pnet leverages WPD to extract rich time-frequency features from FM signals, which are then processed by a deep learning model for precise position estimation. To address computational demands, we employ knowledge distillation, transferring insights from a high-capacity teacher model to a streamlined student model, achieving substantial reductions in complexity without sacrificing accuracy. Experimental results across diverse environments validate WK-Pnet's superior positioning accuracy and lower computational requirements, making it a viable solution for positioning in real-time, resource-constrained applications.
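As a rough illustration of what a wavelet packet decomposition does, the sketch below builds a full packet tree with the Haar wavelet in plain Python. The choice of the Haar wavelet, the real-valued synthetic input, the depth of three levels, and the function names are all assumptions made for this example; the abstract does not say which wavelet, depth, or input representation WK-Pnet uses.

```python
import math

def haar_step(x):
    """Split an even-length sequence into Haar approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def wavelet_packet(x, levels):
    """Return the 2**levels leaf subbands of a full Haar wavelet packet tree.

    Unlike the plain wavelet transform, both the approximation and the
    detail branch are split again at every level. Leaves are returned in
    natural (tree) order, not sorted by frequency.
    """
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

# Hypothetical test signal: two real sinusoids (not FM data).
signal = [math.sin(0.3 * n) + 0.5 * math.sin(2.5 * n) for n in range(64)]
subbands = wavelet_packet(signal, levels=3)

# Per-subband energy is one simple time-frequency feature.
energies = [sum(c * c for c in band) for band in subbands]
```

Because the Haar transform is orthonormal, the subband energies sum to the energy of the input, which makes them a convenient sanity check before any features are passed to a model.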

DS-Pnet: FM-Based Positioning via Downsampling

Apr 10, 2025

Shilian Zheng, Xinjiang Qiu, Luxin Zhang, Quan Lin, Zhijin Zhao, Xiaoniu Yang

Abstract: In this paper, we present DS-Pnet, a novel framework for FM signal-based positioning that addresses the challenges of high computational complexity and limited deployment in resource-constrained environments. Two downsampling methods, IQ signal downsampling and time-frequency representation downsampling, are proposed to reduce data dimensionality while preserving critical positioning features. By integrating with the lightweight MobileViT-XS neural network, the framework achieves high positioning accuracy with significantly reduced computational demands. Experiments on real-world FM signal datasets demonstrate that DS-Pnet achieves superior performance in both indoor and outdoor scenarios, with space and time complexity reductions of approximately 87% and 99.5%, respectively, compared to an existing method, FM-Pnet. Despite the high compression, DS-Pnet maintains robust positioning accuracy, offering an optimal balance between efficiency and precision.
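To make the idea of IQ signal downsampling concrete, here is a minimal sketch that averages non-overlapping blocks of complex samples, which amounts to a crude boxcar low-pass filter followed by decimation. The function name `decimate_iq`, the factor of 4, and the synthetic tone are assumptions for illustration only; the abstract does not specify how DS-Pnet downsamples, and the reduction computed here is unrelated to the figures reported in the paper.

```python
import cmath

def decimate_iq(iq, factor):
    """Average non-overlapping blocks of `factor` complex IQ samples.

    Trailing samples that do not fill a whole block are dropped.
    """
    usable = len(iq) - len(iq) % factor
    return [sum(iq[i:i + factor]) / factor for i in range(0, usable, factor)]

# Hypothetical input: a slow complex tone at 0.01 cycles per sample.
iq = [cmath.exp(2j * cmath.pi * 0.01 * n) for n in range(1024)]
reduced = decimate_iq(iq, factor=4)

# Fraction of samples removed by the downsampling step.
reduction = 1 - len(reduced) / len(iq)
```

A production pipeline would normally use a properly designed anti-aliasing filter rather than a block average; the sketch only shows how the input length shrinks while a slowly varying component is retained.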

Deep Learning-Based Wideband Spectrum Sensing with Dual-Representation Inputs and Subband Shuffling Augmentation

Apr 10, 2025

Shilian Zheng, Zhihao Ye, Luxin Zhang, Keqiang Yue, Zhijin Zhao

Abstract: The widespread adoption of mobile communication technology has led to a severe shortage of spectrum resources, driving the development of cognitive radio technologies aimed at improving spectrum utilization, with spectrum sensing being the key enabler. This paper presents a novel deep learning-based wideband spectrum sensing framework that leverages multi-taper power spectral inputs to achieve high-precision and sample-efficient sensing. To enhance sensing accuracy, we incorporate a feature fusion strategy that combines multiple power spectrum representations. To tackle the challenge of limited sample sizes, we propose two data augmentation techniques designed to expand the training set and improve the network's detection probability. Comprehensive simulation results demonstrate that our method outperforms existing approaches, particularly in low signal-to-noise ratio conditions, achieving higher detection probabilities and lower false alarm rates. The method also exhibits strong robustness across various scenarios, highlighting its significant potential for practical applications in wireless communication systems.

MoCha: Towards Movie-Grade Talking Character Synthesis

Mar 30, 2025

Cong Wei, Bo Sun, Haoyu Ma, Ji Hou, Felix Juefei-Xu, Zecheng He, Xiaoliang Dai, Luxin Zhang, Kunpeng Li, Tingbo Hou(+3 more)

Abstract: Recent advancements in video generation have achieved impressive motion realism, yet they often overlook character-driven storytelling, a crucial task for automated film and animation generation. We introduce Talking Characters, a more realistic task to generate talking character animations directly from speech and text. Unlike talking-head generation, Talking Characters aims to generate the full portrait of one or more characters beyond the facial region. In this paper, we propose MoCha, the first of its kind to generate talking characters. To ensure precise synchronization between video and speech, we propose a speech-video window attention mechanism that effectively aligns speech and video tokens. To address the scarcity of large-scale speech-labeled video datasets, we introduce a joint training strategy that leverages both speech-labeled and text-labeled video data, significantly improving generalization across diverse character actions. We also design structured prompt templates with character tags, enabling, for the first time, multi-character conversation with turn-based dialogue, allowing AI-generated characters to engage in context-aware conversations with cinematic coherence. Extensive qualitative and quantitative evaluations, including human preference studies and benchmark comparisons, demonstrate that MoCha sets a new standard for AI-generated cinematic storytelling, achieving superior realism, expressiveness, controllability, and generalization.

* https://congwei1230.github.io/MoCha/

Query-Efficient Adversarial Attack Against Vertical Federated Graph Learning

Nov 05, 2024

Jinyin Chen, Wenbo Mu, Luxin Zhang, Guohan Huang, Haibin Zheng, Yao Cheng

Abstract: Graph neural networks (GNNs) have attracted wide attention due to their capability for representation learning on graph-structured data. However, distributed data silos limit the performance of GNNs. Vertical federated learning (VFL), an emerging technique for processing distributed data, makes it possible for GNNs to handle distributed graph-structured data. Despite the rapid development of vertical federated graph learning (VFGL), the robustness of VFGL against adversarial attacks has not yet been explored. Although numerous adversarial attacks against centralized GNNs have been proposed, their attack performance is challenged in the VFGL scenario. To the best of our knowledge, this is the first work to explore adversarial attacks against VFGL. A query-efficient hybrid adversarial attack framework, denoted NA2 (short for Neuron-based Adversarial Attack), is proposed to significantly improve centralized adversarial attacks against VFGL. Specifically, a malicious client manipulates its local training data to improve its contribution in a stealthy fashion. Then a shadow model is established based on the manipulated data to simulate the behavior of the server model in VFGL. As a result, the shadow model can improve the attack success rate of various centralized attacks with only a few queries. Extensive experiments on five real-world benchmarks demonstrate that NA2 improves the performance of centralized adversarial attacks against VFGL, achieving state-of-the-art performance even under a potential adaptive defense where the defender knows the attack method. Additionally, we provide interpretable experiments on the effectiveness of NA2 via sensitive neuron identification and t-SNE visualization.

Movie Gen: A Cast of Media Foundation Models

Oct 17, 2024

Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang(+78 more)

Abstract: We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image. Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization, video editing, video-to-audio generation, and text-to-audio generation. Our largest video generation model is a 30B parameter transformer trained with a maximum context length of 73K video tokens, corresponding to a generated video of 16 seconds at 16 frames-per-second. We show multiple technical innovations and simplifications on the architecture, latent spaces, training objectives and recipes, data curation, evaluation protocols, parallelization techniques, and inference optimizations that allow us to reap the benefits of scaling pre-training data, model size, and training compute for training large scale media generation models. We hope this paper helps the research community to accelerate progress and innovation in media generation models. All videos from this paper are available at https://go.fb.me/MovieGenResearchVideos.

Animated Stickers: Bringing Stickers to Life with Video Diffusion

Feb 08, 2024

David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard(+8 more)

Abstract: We introduce animated stickers, a video diffusion model which generates an animation conditioned on a text prompt and static sticker image. Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion. Due to the domain gap, i.e., differences in visual and motion style, a model which performed well on generating natural videos can no longer generate vivid videos when applied to stickers. To bridge this gap, we employ a two-stage finetuning pipeline: first with weakly in-domain data, followed by a human-in-the-loop (HITL) strategy which we term ensemble-of-teachers. It distills the best qualities of multiple teachers into a smaller student model. We show that this strategy allows us to specifically target improvements to motion quality while maintaining the style from the static image. With inference optimizations, our model is able to generate an eight-frame video with high-quality, interesting, and relevant motion in under one second.

AVID: Any-Length Video Inpainting with Diffusion Model

Dec 06, 2023

Zhixing Zhang, Bichen Wu, Xiaoyan Wang, Yaqiao Luo, Luxin Zhang, Yinan Zhao, Peter Vajda, Dimitris Metaxas, Licheng Yu

Abstract: Recent advances in diffusion models have successfully enabled text-guided image inpainting. While it seems straightforward to extend such editing capability to the video domain, there have been fewer works on text-guided video inpainting. Given a video, a masked region at its initial frame, and an editing prompt, the task requires a model to infill each frame following the editing guidance while keeping the out-of-mask region intact. There are three main challenges in text-guided video inpainting: (i) temporal consistency of the edited video, (ii) supporting different inpainting types at different structural fidelity levels, and (iii) dealing with variable video length. To address these challenges, we introduce Any-Length Video Inpainting with Diffusion Model, dubbed AVID. At its core, our model is equipped with effective motion modules and adjustable structure guidance for fixed-length video inpainting. Building on top of that, we propose a novel Temporal MultiDiffusion sampling pipeline with a middle-frame attention guidance mechanism, facilitating the generation of videos with any desired duration. Our comprehensive experiments show that our model can robustly handle various inpainting types across different video durations with high quality. More visualization results are made publicly available at https://zhang-zx.github.io/AVID/.

* Project website: https://zhang-zx.github.io/AVID/

Augmenting Radio Signals with Wavelet Transform for Deep Learning-Based Modulation Recognition

Nov 07, 2023

Tao Chen, Shilian Zheng, Kunfeng Qiu, Luxin Zhang, Qi Xuan, Xiaoniu Yang

Abstract: The use of deep learning for radio modulation recognition has become prevalent in recent years. This approach automatically extracts high-dimensional features from large datasets, facilitating the accurate classification of modulation schemes. However, in real-world scenarios, it may not be feasible to gather sufficient training data in advance. Data augmentation is a method used to increase the diversity and quantity of the training dataset and to reduce data sparsity and imbalance. In this paper, we propose data augmentation methods that replace the detail coefficients obtained from a discrete wavelet transform and then reconstruct the signal to generate new samples and expand the training set. Different generation methods are used to produce the replacement sequences. Simulation results indicate that our proposed methods significantly outperform the other augmentation methods.
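The replace-and-reconstruct idea can be sketched with a single-level Haar transform in plain Python. This is an illustration under stated assumptions, not the authors' method: the Haar wavelet, the single decomposition level, the Gaussian replacement sequence with matched standard deviation, and the function names are all hypothetical choices, since the abstract only says that "different generation methods" are used.

```python
import math
import random

def haar_dwt(x):
    """Single-level Haar DWT of an even-length real sequence."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def augment(x, rng):
    """Create a new sample by swapping in synthetic detail coefficients.

    Assumption: the replacement sequence is zero-mean Gaussian noise whose
    standard deviation matches that of the original detail coefficients.
    """
    approx, detail = haar_dwt(x)
    mean = sum(detail) / len(detail)
    std = math.sqrt(sum((d - mean) ** 2 for d in detail) / len(detail))
    new_detail = [rng.gauss(0.0, std) for _ in detail]
    return haar_idwt(approx, new_detail)

rng = random.Random(0)  # fixed seed so the example is repeatable
signal = [math.cos(0.2 * n) for n in range(32)]  # hypothetical real-valued sample
new_sample = augment(signal, rng)
```

The augmented sample keeps the approximation coefficients of the original, so its coarse structure is unchanged while the fine detail varies from one generated sample to the next.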

Cloth Region Segmentation for Robust Grasp Selection

Aug 13, 2020

Jianing Qian, Thomas Weng, Luxin Zhang, Brian Okorn, David Held

Abstract: Cloth detection and manipulation is a common task in domestic and industrial settings, yet such tasks remain a challenge for robots due to cloth deformability. Furthermore, in many cloth-related tasks like laundry folding and bed making, it is crucial to manipulate specific regions like edges and corners, as opposed to folds. In this work, we focus on the problem of segmenting and grasping these key regions. Our approach trains a network to segment the edges and corners of a cloth from a depth image, distinguishing such regions from wrinkles or folds. We also provide a novel algorithm for estimating the grasp location, direction, and directional uncertainty from the segmentation. We demonstrate our method on a real robot system and show that it outperforms baseline methods on grasping success. Video and other supplementary materials are available at: https://sites.google.com/view/cloth-segmentation.

* Accepted at IROS 2020. The first two authors contributed equally and are listed in alphabetical order
