Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xun Zhang

Data-Efficient Brushstroke Generation with Diffusion Models for Oil Painting

Mar 01, 2026

Dantong Qin, Alessandro Bozzon, Xian Yang, Xun Zhang, Yike Guo, Pan Wang

Abstract:Many creative multimedia systems are built upon visual primitives such as strokes or textures, which are difficult to collect at scale and fundamentally different from natural image data. This data scarcity makes it challenging for modern generative models to learn expressive and controllable primitives, limiting their use in process-aware content creation. In this work, we study the problem of learning human-like brushstroke generation from a small set of hand-drawn samples (n=470) and propose StrokeDiff, a diffusion-based framework with Smooth Regularization (SmR). SmR injects stochastic visual priors during training, providing a simple mechanism to stabilize diffusion models under sparse supervision without altering the inference process. We further show how the learned primitives can be made controllable through a Bézier-based conditioning module and integrated into a complete stroke-based painting pipeline, including prediction, generation, ordering, and compositing. This demonstrates how data-efficient primitive modeling can support expressive and structured multimedia content creation. Experiments indicate that the proposed approach produces diverse and structurally coherent brushstrokes and enables paintings with richer texture and layering, validated by both automatic metrics and human evaluation.

Via

Access Paper or Ask Questions

AgentRob: From Virtual Forum Agents to Hijacked Physical Robots

Feb 14, 2026

Wenrui Liu, Yaxuan Wang, Xun Zhang, Yanshu Wang, Jiashen Wei, Yifan Xiang, Yuhang Wang, Mingshen Ye, Elsie Dai, Zhiqi Liu(+6 more)

Abstract:Large Language Model (LLM)-powered autonomous agents have demonstrated significant capabilities in virtual environments, yet their integration with the physical world remains narrowly confined to direct control interfaces. We present AgentRob, a framework that bridges online community forums, LLM-powered agents, and physical robots through the Model Context Protocol (MCP). AgentRob enables a novel paradigm where autonomous agents participate in online forums--reading posts, extracting natural language commands, dispatching physical robot actions, and reporting results back to the community. The system comprises three layers: a Forum Layer providing asynchronous, persistent, multi-agent interaction; an Agent Layer with forum agents that poll for @mention-targeted commands; and a Robot Layer with VLM-driven controllers and Unitree Go2/G1 hardware that translate commands into robot primitives via iterative tool calling. The framework supports multiple concurrent agents with distinct identities and physical embodiments coexisting in the same forum, establishing the feasibility of forum-mediated multi-agent robot orchestration.

* 10 pages, 2 figures

Via

Access Paper or Ask Questions

Internalizing LLM Reasoning via Discovery and Replay of Latent Actions

Feb 04, 2026

Zhenning Shi, Yijia Zhu, Junhan Shi, Xun Zhang, Lei Wang, Congcong Miao

Abstract:The internalization of chain-of-thought processes into hidden states has emerged as a highly efficient paradigm for scaling test-time compute. However, existing activation steering methods rely on static control vectors that fail to adapt to the non-stationary evolution of complex reasoning tasks. To address this limitation, we propose STIR (Self-Distilled Tools for Internal Reasoning), a framework that reformulates reasoning enhancement as a dynamic latent trajectory control problem. STIR introduces a synergistic three-stage pipeline: (1) differential intrinsic action induction harvests latent reasoning successes to crystallize steering primitives; (2) sparse control basis construction curates a compact, geometrically diverse tool library; and (3) value-modulated trajectory intervention dynamically injects context-specific impulses via anchor-based gating. Extensive experiments on six arithmetic and logical benchmarks across four representative models demonstrate that STIR improves average accuracy by 1.9% to 7.5% while reducing average token consumption by up to 35% compared to vanilla decoding. These findings demonstrate that the benefits of explicit chain-of-thought can be realized through dynamic latent trajectory control, internalizing the reasoning process to bypass the explicit generation while achieving superior fidelity. Our code is available at https://github.com/sznnzs/LLM-Latent-Action.

Via

Access Paper or Ask Questions

Q-DiT4SR: Exploration of Detail-Preserving Diffusion Transformer Quantization for Real-World Image Super-Resolution

Feb 01, 2026

Xun Zhang, Kaicheng Yang, Hongliang Lu, Haotong Qin, Yong Guo, Yulun Zhang

Abstract:Recently, Diffusion Transformers (DiTs) have emerged in Real-World Image Super-Resolution (Real-ISR) to generate high-quality textures, yet their heavy inference burden hinders real-world deployment. While Post-Training Quantization (PTQ) is a promising solution for acceleration, existing methods in super-resolution mostly focus on U-Net architectures, whereas generic DiT quantization is typically designed for text-to-image tasks. Directly applying these methods to DiT-based super-resolution models leads to severe degradation of local textures. Therefore, we propose Q-DiT4SR, the first PTQ framework specifically tailored for DiT-based Real-ISR. We propose H-SVD, a hierarchical SVD that integrates a global low-rank branch with a local block-wise rank-1 branch under a matched parameter budget. We further propose Variance-aware Spatio-Temporal Mixed Precision: VaSMP allocates cross-layer weight bit-widths in a data-free manner based on rate-distortion theory, while VaTMP schedules intra-layer activation precision across diffusion timesteps via dynamic programming (DP) with minimal calibration. Experiments on multiple real-world datasets demonstrate that our Q-DiT4SR achieves SOTA performance under both W4A6 and W4A4 settings. Notably, the W4A4 quantization configuration reduces model size by 5.8$\times$ and computational operations by over 60$\times$. Our code and models will be available at https://github.com/xunzhang1128/Q-DiT4SR.

* Our code and models will be available at https://github.com/xunzhang1128/Q-DiT4SR

Via

Access Paper or Ask Questions

PIS: A Generalized Physical Inversion Solver for Arbitrary Sparse Observations via Set-Conditioned Diffusion

Dec 14, 2025

Weijie Yang, Xun Zhang

Abstract:Estimation of PDE-constrained physical parameters from limited indirect measurements is inherently ill-posed, particularly when observations are sparse, irregular, and constrained by real-world sensor placement. This challenge is ubiquitous in fields such as fluid mechanics, seismic inversion, and structural health monitoring. Existing deep and operator-learning models collapse under these conditions: fixed-grid assumptions fail, reconstruction deteriorates sharply, and inversion becomes unreliable with limited robustness and no uncertainty quantification (UQ).We propose the Physical Inversion Solver (PIS), a set-conditioned diffusion framework enabling inversion from truly arbitrary observation sets. PIS employs a Set Transformer-based encoder to handle measurements of any number or geometry, and a cosine-annealed sparsity curriculum for exceptional robustness. An accompanying information-theoretic analysis provides insight into the limits of inversion under extreme sparsity by revealing how observation entropy varies across physical systems.PIS is evaluated on three challenging PDE inverse problems: Darcy flow, wavefield inversion (Helmholtz), and structural health monitoring (Hooke's Law). Across all tasks and sparsity regimes -- including extreme cases with an observation rate of only $0.29\%$ -- existing operator-learning baselines fail to reconstruct meaningful fields, often diverging or collapsing entirely.In stark contrast, PIS remains stable and accurate, reducing inversion error by $12.28\%$--$88.73\%$ and reliably producing calibrated posterior samples. These samples accurately reflect both data scarcity and intrinsic physical ambiguity. These results position PIS as a powerful, general-purpose, and uniquely sparsity-resilient solution for physical inversion under arbitrary and severely undersampled observations.

Via

Access Paper or Ask Questions

Transfer Learning for VLC-based indoor Localization: Addressing Environmental Variability

Jul 02, 2025

Masood Jan, Wafa Njima, Xun Zhang, Alexander Artemenko

Abstract:Accurate indoor localization is crucial in industrial environments. Visible Light Communication (VLC) has emerged as a promising solution, offering high accuracy, energy efficiency, and minimal electromagnetic interference. However, VLC-based indoor localization faces challenges due to environmental variability, such as lighting fluctuations and obstacles. To address these challenges, we propose a Transfer Learning (TL)-based approach for VLC-based indoor localization. Using real-world data collected at a BOSCH factory, the TL framework integrates a deep neural network (DNN) to improve localization accuracy by 47\%, reduce energy consumption by 32\%, and decrease computational time by 40\% compared to the conventional models. The proposed solution is highly adaptable under varying environmental conditions and achieves similar accuracy with only 30\% of the dataset, making it a cost-efficient and scalable option for industrial applications in Industry 4.0.

* Accepted for publication in the IEEE VTC2025-Spring Conference, 7 pages

Via

Access Paper or Ask Questions

A Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning

Jul 02, 2025

Masood Jan, Wafa Njima, Xun Zhang

Abstract:Location information serves as the fundamental element for numerous Internet of Things (IoT) applications. Traditional indoor localization techniques often produce significant errors and raise privacy concerns due to centralized data collection. In response, Machine Learning (ML) techniques offer promising solutions by capturing indoor environment variations. However, they typically require central data aggregation, leading to privacy, bandwidth, and server reliability issues. To overcome these challenges, in this paper, we propose a Federated Learning (FL)-based approach for dynamic indoor localization using a Deep Neural Network (DNN) model. Experimental results show that FL has the nearby performance to Centralized Model (CL) while keeping the data privacy, bandwidth efficiency and server reliability. This research demonstrates that our proposed FL approach provides a viable solution for privacy-enhanced indoor localization, paving the way for advancements in secure and efficient indoor localization systems.

Via

Access Paper or Ask Questions

Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems

May 22, 2025

Song Jin, Juntian Zhang, Yuhan Liu, Xun Zhang, Yufei Zhang, Guojun Yin, Fei Jiang, Wei Lin, Rui Yan

Abstract:Evaluating and iterating upon recommender systems is crucial, yet traditional A/B testing is resource-intensive, and offline methods struggle with dynamic user-platform interactions. While agent-based simulation is promising, existing platforms often lack a mechanism for user actions to dynamically reshape the environment. To bridge this gap, we introduce RecInter, a novel agent-based simulation platform for recommender systems featuring a robust interaction mechanism. In RecInter platform, simulated user actions (e.g., likes, reviews, purchases) dynamically update item attributes in real-time, and introduced Merchant Agents can reply, fostering a more realistic and evolving ecosystem. High-fidelity simulation is ensured through Multidimensional User Profiling module, Advanced Agent Architecture, and LLM fine-tuned on Chain-of-Thought (CoT) enriched interaction data. Our platform achieves significantly improved simulation credibility and successfully replicates emergent phenomena like Brand Loyalty and the Matthew Effect. Experiments demonstrate that this interaction mechanism is pivotal for simulating realistic system evolution, establishing our platform as a credible testbed for recommender systems research.

Via

Access Paper or Ask Questions

Optic Fingerprint(OFP): Enhancing Security in Li-Fi Networks

Apr 17, 2025

Ziqi Liu, Xuanbang Chen, Xun Zhang

Abstract:We present a hardware-integrated security framework for LiFi networks through device fingerprint extraction within the IEEE 802.15.7 protocol. Our Optic Fingerprint (OFP) model utilizes inherent LED nonlinearities to generate amplitude-based feature vectors in time and frequency domains, specifically designed for optical wireless systems. Experimental results with 39 commercial LEDs demonstrate 90.36% classification accuracy across SNR 10-30 dB while maintaining standard compliance, offering a practical physical-layer authentication solution for visible light communication.

* 6 pages, Infocom2025

Via

Access Paper or Ask Questions

Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning

Mar 28, 2025

Yang Liu, Xun Zhang, Jiale Du, Xinbo Gao, Jungong Han

Abstract:Zero-shot Learning(ZSL) attains knowledge transfer from seen classes to unseen classes by exploring auxiliary category information, which is a promising yet difficult research topic. In this field, Audio-Visual Generalized Zero-Shot Learning~(AV-GZSL) has aroused researchers' great interest in which intricate relations within triple modalities~(audio, video, and natural language) render this task quite challenging but highly research-worthy. However, both existing embedding-based and generative-based AV-GZSL methods tend to suffer from domain shift problem a lot and we propose an extremely simple Out-of-distribution~(OOD) detection based AV-GZSL method~(EZ-AVOOD) to further mitigate bias problem by differentiating seen and unseen samples at the initial beginning. EZ-AVOOD accomplishes effective seen-unseen separation by exploiting the intrinsic discriminative information held in class-specific logits and class-agnostic feature subspace without training an extra OOD detector network. Followed by seen-unseen binary classification, we employ two expert models to classify seen samples and unseen samples separately. Compared to existing state-of-the-art methods, our model achieves superior ZSL and GZSL performances on three audio-visual datasets and becomes the new SOTA, which comprehensively demonstrates the effectiveness of the proposed EZ-AVOOD.

Via

Access Paper or Ask Questions