Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yibo Zhang

MMWiLoc: A Multi-Sensor Dataset and Robust Device-Free Localization Method Using Commercial Off-The-Shelf Millimeter Wave Wi-Fi Devices

Jun 13, 2025

Wenbo Ding, Yang Li, Dongsheng Wang, Bin Zhao, Yunrong Zhu, Yibo Zhang, Yumeng Miao

Abstract:Device-free Wi-Fi sensing has numerous benefits in practical settings, as it eliminates the requirement for dedicated sensing devices and can be accomplished using current low-cost Wi-Fi devices. With the development of Wi-Fi standards, millimeter wave Wi-Fi devices with 60GHz operating frequency and up to 4GHz bandwidth have become commercially available. Although millimeter wave Wi-Fi presents great promise for Device-Free Wi-Fi sensing with increased bandwidth and beam-forming ability, there still lacks a method for localization using millimeter wave Wi-Fi. Here, we present two major contributions: First, we provide a comprehensive multi-sensor dataset that synchronously captures human movement data from millimeter wave Wi-Fi, 2.4GHz Wi-Fi, and millimeter wave radar sensors. This dataset enables direct performance comparisons across different sensing modalities and facilitates reproducible researches in indoor localization. Second, we introduce MMWiLoc, a novel localization method that achieves centimeter-level precision with low computational cost. MMWiLoc incorporates two components: beam pattern calibration using Expectation Maximization and target localization through Multi-Scale Compression Sensing. The system processes beam Signal-to-Noise Ratio (beamSNR) information from the beam-forming process to determine target Angle of Arrival (AoA), which is then fused across devices for localization. Our extensive evaluation demonstrates that MMWiLoc achieves centimeter-level precision, outperforming 2.4GHz Wi-Fi systems while maintaining competitive performance with high-precision radar systems. The dataset and examples processing code will be released after this paper is accepted at https://github.com/wowoyoho/MMWiLoc.

* 8 pages, 8 figures

Via

Access Paper or Ask Questions

Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection

Jun 05, 2025

Ziyi Zhou, Xiaoming Zhang, Litian Zhang, Yibo Zhang, Zhenyu Guan, Chaozhuo Li, Philip S. Yu

Abstract:The widespread dissemination of fake news on social media has significantly impacted society, resulting in serious consequences. Conventional deep learning methodologies employing small language models (SLMs) suffer from extensive supervised training requirements and difficulties adapting to evolving news environments due to data scarcity and distribution shifts. Large language models (LLMs), despite robust zero-shot capabilities, fall short in accurately detecting fake news owing to outdated knowledge and the absence of suitable demonstrations. In this paper, we propose a novel Continuous Collaborative Emergent Fake News Detection (C$^2$EFND) framework to address these challenges. The C$^2$EFND framework strategically leverages both LLMs' generalization power and SLMs' classification expertise via a multi-round collaborative learning framework. We further introduce a lifelong knowledge editing module based on a Mixture-of-Experts architecture to incrementally update LLMs and a replay-based continue learning method to ensure SLMs retain prior knowledge without retraining entirely. Extensive experiments on Pheme and Twitter16 datasets demonstrate that C$^2$EFND significantly outperforms existed methods, effectively improving detection accuracy and adaptability in continuous emergent fake news scenarios.

Via

Access Paper or Ask Questions

DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction

Mar 07, 2025

Miaowei Wang, Yibo Zhang, Rui Ma, Weiwei Xu, Changqing Zou, Daniel Morris

Figure 1 for DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction

Figure 2 for DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction

Figure 3 for DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction

Figure 4 for DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction

Abstract:We present DecoupledGaussian, a novel system that decouples static objects from their contacted surfaces captured in-the-wild videos, a key prerequisite for realistic Newtonian-based physical simulations. Unlike prior methods focused on synthetic data or elastic jittering along the contact surface, which prevent objects from fully detaching or moving independently, DecoupledGaussian allows for significant positional changes without being constrained by the initial contacted surface. Recognizing the limitations of current 2D inpainting tools for restoring 3D locations, our approach proposes joint Poisson fields to repair and expand the Gaussians of both objects and contacted scenes after separation. This is complemented by a multi-carve strategy to refine the object's geometry. Our system enables realistic simulations of decoupling motions, collisions, and fractures driven by user-specified impulses, supporting complex interactions within and across multiple scenes. We validate DecoupledGaussian through a comprehensive user study and quantitative benchmarks. This system enhances digital interaction with objects and scenes in real-world environments, benefiting industries such as VR, robotics, and autonomous driving. Our project page is at: https://wangmiaowei.github.io/DecoupledGaussian.github.io/.

* CVPR2025 Accepted

Via

Access Paper or Ask Questions

KM-UNet KAN Mamba UNet for medical image segmentation

Jan 05, 2025

Yibo Zhang

Figure 1 for KM-UNet KAN Mamba UNet for medical image segmentation

Figure 2 for KM-UNet KAN Mamba UNet for medical image segmentation

Figure 3 for KM-UNet KAN Mamba UNet for medical image segmentation

Figure 4 for KM-UNet KAN Mamba UNet for medical image segmentation

Abstract:Medical image segmentation is a critical task in medical imaging analysis. Traditional CNN-based methods struggle with modeling long-range dependencies, while Transformer-based models, despite their success, suffer from quadratic computational complexity. To address these limitations, we propose KM-UNet, a novel U-shaped network architecture that combines the strengths of Kolmogorov-Arnold Networks (KANs) and state-space models (SSMs). KM-UNet leverages the Kolmogorov-Arnold representation theorem for efficient feature representation and SSMs for scalable long-range modeling, achieving a balance between accuracy and computational efficiency. We evaluate KM-UNet on five benchmark datasets: ISIC17, ISIC18, CVC, BUSI, and GLAS. Experimental results demonstrate that KM-UNet achieves competitive performance compared to state-of-the-art methods in medical image segmentation tasks. To the best of our knowledge, KM-UNet is the first medical image segmentation framework integrating KANs and SSMs. This work provides a valuable baseline and new insights for the development of more efficient and interpretable medical image segmentation systems. The code is open source at https://github.com/2760613195/KM_UNet Keywords:KAN,Manba, state-space models,UNet, Medical image segmentation, Deep learning

Via

Access Paper or Ask Questions

PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

Oct 09, 2024

Zekun Wang, Feiyu Duan, Yibo Zhang, Wangchunshu Zhou, Ke Xu, Wenhao Huang, Jie Fu

Abstract:Large Language Models (LLMs) demonstrate impressive capabilities across various domains, including role-playing, creative writing, mathematical reasoning, and coding. Despite these advancements, LLMs still encounter challenges with length control, frequently failing to adhere to specific length constraints due to their token-level operations and insufficient training on data with strict length limitations. We identify this issue as stemming from a lack of positional awareness and propose novel approaches--PositionID Prompting and PositionID Fine-Tuning--to address it. These methods enhance the model's ability to continuously monitor and manage text length during generation. Additionally, we introduce PositionID CP Prompting to enable LLMs to perform copy and paste operations accurately. Furthermore, we develop two benchmarks for evaluating length control and copy-paste abilities. Our experiments demonstrate that our methods significantly improve the model's adherence to length constraints and copy-paste accuracy without compromising response quality.

* 39 pages. CP-Bench and LenCtrl-Bench are available in https://huggingface.co/datasets/ZenMoore/CP-Bench and https://huggingface.co/datasets/ZenMoore/LenCtrl-Bench

Via

Access Paper or Ask Questions

MIO: A Foundation Model on Multimodal Tokens

Sep 26, 2024

Zekun Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jiashuo Wang, Ning Shi, Siyu Li, Yizhi Li(+7 more)

Figure 1 for MIO: A Foundation Model on Multimodal Tokens

Figure 2 for MIO: A Foundation Model on Multimodal Tokens

Figure 3 for MIO: A Foundation Model on Multimodal Tokens

Figure 4 for MIO: A Foundation Model on Multimodal Tokens

Abstract:In this paper, we introduce MIO, a novel foundation model built on multimodal tokens, capable of understanding and generating speech, text, images, and videos in an end-to-end, autoregressive manner. While the emergence of large language models (LLMs) and multimodal large language models (MM-LLMs) propels advancements in artificial general intelligence through their versatile capabilities, they still lack true any-to-any understanding and generation. Recently, the release of GPT-4o has showcased the remarkable potential of any-to-any LLMs for complex real-world tasks, enabling omnidirectional input and output across images, speech, and text. However, it is closed-source and does not support the generation of multimodal interleaved sequences. To address this gap, we present MIO, which is trained on a mixture of discrete tokens across four modalities using causal multimodal modeling. MIO undergoes a four-stage training process: (1) alignment pre-training, (2) interleaved pre-training, (3) speech-enhanced pre-training, and (4) comprehensive supervised fine-tuning on diverse textual, visual, and speech tasks. Our experimental results indicate that MIO exhibits competitive, and in some cases superior, performance compared to previous dual-modal baselines, any-to-any model baselines, and even modality-specific baselines. Moreover, MIO demonstrates advanced capabilities inherent to its any-to-any feature, such as interleaved video-text generation, chain-of-visual-thought reasoning, visual guideline generation, instructional image editing, etc.

* Technical Report. Codes and models will be available soon

Via

Access Paper or Ask Questions

RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization

Aug 21, 2024

Jinhu Qi, Shuai Yan, Yibo Zhang, Wentao Zhang, Rong Jin, Yuwei Hu, Ke Wang

Figure 1 for RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization

Figure 2 for RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization

Figure 3 for RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization

Abstract:With the development of the modern social economy, tourism has become an important way to meet people's spiritual needs, bringing development opportunities to the tourism industry. However, existing large language models (LLMs) face challenges in personalized recommendation capabilities and the generation of content that can sometimes produce hallucinations. This study proposes an optimization scheme for Tibet tourism LLMs based on retrieval-augmented generation (RAG) technology. By constructing a database of tourist viewpoints and processing the data using vectorization techniques, we have significantly improved retrieval accuracy. The application of RAG technology effectively addresses the hallucination problem in content generation. The optimized model shows significant improvements in fluency, accuracy, and relevance of content generation. This research demonstrates the potential of RAG technology in the standardization of cultural tourism information and data analysis, providing theoretical and technical support for the development of intelligent cultural tourism service systems.

* Accepted by AIPR 2024

Via

Access Paper or Ask Questions

SARA: Singular-Value Based Adaptive Low-Rank Adaption

Aug 06, 2024

Jihao Gu, Shuai Chen, Zelin Wang, Yibo Zhang, Ping Gong

Abstract:With the increasing number of parameters in large pre-trained models, LoRA as a parameter-efficient fine-tuning(PEFT) method is widely used for not adding inference overhead. The LoRA method assumes that weight changes during fine-tuning can be approximated by low-rank matrices. However, the rank values need to be manually verified to match different downstream tasks, and they cannot accommodate the varying importance of different layers in the model. In this work, we first analyze the relationship between the performance of different layers and their ranks using SVD. Based on this, we design the Singular-Value Based Adaptive Low-Rank Adaption(SARA), which adaptively finds the rank during initialization by performing SVD on the pre-trained weights. Additionally, we explore the Mixture-of-SARA(Mo-SARA), which significantly reduces the number of parameters by fine-tuning only multiple parallel sets of singular values controlled by a router. Extensive experiments on various complex tasks demonstrate the simplicity and parameter efficiency of our methods. They can effectively and adaptively find the most suitable rank for each layer of each model.

Via

Access Paper or Ask Questions

Research on Tibetan Tourism Viewpoints information generation system based on LLM

Jul 18, 2024

Jinhu Qi, Shuai Yan, Wentao Zhang, Yibo Zhang, Zirui Liu, Ke Wang

Figure 1 for Research on Tibetan Tourism Viewpoints information generation system based on LLM

Figure 2 for Research on Tibetan Tourism Viewpoints information generation system based on LLM

Figure 3 for Research on Tibetan Tourism Viewpoints information generation system based on LLM

Figure 4 for Research on Tibetan Tourism Viewpoints information generation system based on LLM

Abstract:Tibet, ensconced within China's territorial expanse, is distinguished by its labyrinthine and heterogeneous topography, a testament to its profound historical heritage, and the cradle of a unique religious ethos. The very essence of these attributes, however, has impeded the advancement of Tibet's tourism service infrastructure, rendering existing smart tourism services inadequate for the region's visitors. This study delves into the ramifications of informational disparities at tourist sites on Tibetan tourism and addresses the challenge of establishing the Large Language Model (LLM) evaluation criteria. It introduces an innovative approach, the DualGen Bridge AI system, employing supervised fine-tuning techniques to bolster model functionality and enhance optimization processes. Furthermore, it pioneers a multi-structured generative results assessment framework. Empirical validation confirms the efficacy of this framework. The study also explores the application of the supervised fine-tuning method within the proprietary DualGen Bridge AI, aimed at refining the generation of tourist site information. The study's findings offer valuable insights for optimizing system performance and provide support and inspiration for the application of LLM technology in Tibet's tourism services and beyond, potentially revolutionizing the smart tourism industry with advanced, tailored information generation capabilities.

* ICWOC 2024

Via

Access Paper or Ask Questions

A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

Jun 13, 2024

Chenyang Shi, Shasha Guo, Boyi Wei, Hanxiao Liu, Yibo Zhang, Ningfang Song, Jing Jin

Figure 1 for A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

Figure 2 for A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

Figure 3 for A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

Figure 4 for A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

Abstract:Event cameras are renowned for their high efficiency due to outputting a sparse, asynchronous stream of events. However, they are plagued by noisy events, especially in low light conditions. Denoising is an essential task for event cameras, but evaluating denoising performance is challenging. Label-dependent denoising metrics involve artificially adding noise to clean sequences, complicating evaluations. Moreover, the majority of these metrics are monotonic, which can inflate scores by removing substantial noise and valid events. To overcome these limitations, we propose the first label-free and non-monotonic evaluation metric, the area of the continuous contrast curve (AOCC), which utilizes the area enclosed by event frame contrast curves across different time intervals. This metric is inspired by how events capture the edge contours of scenes or objects with high temporal resolution. An effective denoising method removes noise without eliminating these edge-contour events, thus preserving the contrast of event frames. Consequently, contrast across various time ranges serves as a metric to assess denoising effectiveness. As the time interval lengthens, the curve will initially rise and then fall. The proposed metric is validated through both theoretical and experimental evidence.

Via

Access Paper or Ask Questions