Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhixiang Wang

Sekai: A Video Dataset towards World Exploration

Jun 18, 2025

Zhen Li, Chuanhao Li, Xiaofeng Mao, Shaoheng Lin, Ming Li, Shitian Zhao, Zhaopan Xu, Xinyue Li, Yukang Feng, Jianwen Sun(+10 more)

Abstract:Video generation techniques have made remarkable progress, promising to be the foundation of interactive world exploration. However, existing video generation datasets are not well-suited for world exploration training as they suffer from some limitations: limited locations, short duration, static scenes, and a lack of annotations about exploration and the world. In this paper, we introduce Sekai (meaning ``world'' in Japanese), a high-quality first-person view worldwide video dataset with rich annotations for world exploration. It consists of over 5,000 hours of walking or drone view (FPV and UVA) videos from over 100 countries and regions across 750 cities. We develop an efficient and effective toolbox to collect, pre-process and annotate videos with location, scene, weather, crowd density, captions, and camera trajectories. Experiments demonstrate the quality of the dataset. And, we use a subset to train an interactive video world exploration model, named YUME (meaning ``dream'' in Japanese). We believe Sekai will benefit the area of video generation and world exploration, and motivate valuable applications.

* 12 pages, 6 figures

Via

Access Paper or Ask Questions

Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection

Jun 17, 2025

Zhijing Wan, Zhixiang Wang, Zheng Wang, Xin Xu, Shin'ichi Satoh

Abstract:One-shot subset selection serves as an effective tool to reduce deep learning training costs by identifying an informative data subset based on the information extracted by an information extractor (IE). Traditional IEs, typically pre-trained on the target dataset, are inherently dataset-dependent. Foundation models (FMs) offer a promising alternative, potentially mitigating this limitation. This work investigates two key questions: (1) Can FM-based subset selection outperform traditional IE-based methods across diverse datasets? (2) Do all FMs perform equally well as IEs for subset selection? Extensive experiments uncovered surprising insights: FMs consistently outperform traditional IEs on fine-grained datasets, whereas their advantage diminishes on coarse-grained datasets with noisy labels. Motivated by these finding, we propose RAM-APL (RAnking Mean-Accuracy of Pseudo-class Labels), a method tailored for fine-grained image datasets. RAM-APL leverages multiple FMs to enhance subset selection by exploiting their complementary strengths. Our approach achieves state-of-the-art performance on fine-grained datasets, including Oxford-IIIT Pet, Food-101, and Caltech-UCSD Birds-200-2011.

* 18 pages, 10 figures, accepted by ICML 2025

Via

Access Paper or Ask Questions

Spatial Transcriptomics Expression Prediction from Histopathology Based on Cross-Modal Mask Reconstruction and Contrastive Learning

Jun 10, 2025

Junzhuo Liu, Markus Eckstein, Zhixiang Wang, Friedrich Feuerhake, Dorit Merhof

Abstract:Spatial transcriptomics is a technology that captures gene expression levels at different spatial locations, widely used in tumor microenvironment analysis and molecular profiling of histopathology, providing valuable insights into resolving gene expression and clinical diagnosis of cancer. Due to the high cost of data acquisition, large-scale spatial transcriptomics data remain challenging to obtain. In this study, we develop a contrastive learning-based deep learning method to predict spatially resolved gene expression from whole-slide images. Evaluation across six different disease datasets demonstrates that, compared to existing studies, our method improves Pearson Correlation Coefficient (PCC) in the prediction of highly expressed genes, highly variable genes, and marker genes by 6.27%, 6.11%, and 11.26% respectively. Further analysis indicates that our method preserves gene-gene correlations and applies to datasets with limited samples. Additionally, our method exhibits potential in cancer tissue localization based on biomarker expression.

* 20 pages, 7 figures

Via

Access Paper or Ask Questions

Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging

Feb 17, 2025

Zhixiang Wang, Zhenyu Mao, Yixuan Qiao, Yunfang Wu, Biye Li

Figure 1 for Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging

Figure 2 for Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging

Figure 3 for Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging

Figure 4 for Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging

Abstract:Large Language Models (LLMs) have demonstrated impressive capabilities, but their high computational costs pose challenges for customization. Model merging offers a cost-effective alternative, yet existing methods suffer from interference among parameters, leading to performance degradation. In this work, we propose Optimal Brain Iterative Merging (OBIM), a novel method designed to mitigate both intra-model and inter-model interference. OBIM consists of two key components: (1) A saliency measurement mechanism that evaluates parameter importance based on loss changes induced by individual weight alterations, reducing intra-model interference by preserving only high-saliency parameters. (2) A mutually exclusive iterative merging framework, which incrementally integrates models using a binary mask to avoid direct parameter averaging, thereby mitigating inter-model interference. We validate OBIM through experiments on both Supervised Fine-Tuned (SFT) models and post-pretrained checkpoints. The results show that OBIM significantly outperforms existing merging techniques. Overall, OBIM provides an effective and practical solution for enhancing LLM merging.

Via

Access Paper or Ask Questions

Continuous Gaussian Process Pre-Optimization for Asynchronous Event-Inertial Odometry

Dec 12, 2024

Zhixiang Wang, Xudong Li, Yizhai Zhang, Fan Zhang, Panfeng Huang

Abstract:Event cameras, as bio-inspired sensors, are asynchronously triggered with high-temporal resolution compared to intensity cameras. Recent work has focused on fusing the event measurements with inertial measurements to enable ego-motion estimation in high-speed and HDR environments. However, existing methods predominantly rely on IMU preintegration designed mainly for synchronous sensors and discrete-time frameworks. In this paper, we propose a continuous-time preintegration method based on the Temporal Gaussian Process (TGP) called GPO. Concretely, we model the preintegration as a time-indexed motion trajectory and leverage an efficient two-step optimization to initialize the precision preintegration pseudo-measurements. Our method realizes a linear and constant time cost for initialization and query, respectively. To further validate the proposal, we leverage the GPO to design an asynchronous event-inertial odometry and compare with other asynchronous fusion schemes within the same odometry system. Experiments conducted on both public and own-collected datasets demonstrate that the proposed GPO offers significant advantages in terms of precision and efficiency, outperforming existing approaches in handling asynchronous sensor fusion.

* 8pages

Via

Access Paper or Ask Questions

Asynchronous Event-Inertial Odometry using a Unified Gaussian Process Regression Framework

Dec 04, 2024

Xudong Li, Zhixiang Wang, Zihao Liu, Yizhai Zhang, Fan Zhang, Xiuming Yao, Panfeng Huang

Figure 1 for Asynchronous Event-Inertial Odometry using a Unified Gaussian Process Regression Framework

Figure 2 for Asynchronous Event-Inertial Odometry using a Unified Gaussian Process Regression Framework

Figure 3 for Asynchronous Event-Inertial Odometry using a Unified Gaussian Process Regression Framework

Figure 4 for Asynchronous Event-Inertial Odometry using a Unified Gaussian Process Regression Framework

Abstract:Recent works have combined monocular event camera and inertial measurement unit to estimate the $SE(3)$ trajectory. However, the asynchronicity of event cameras brings a great challenge to conventional fusion algorithms. In this paper, we present an asynchronous event-inertial odometry under a unified Gaussian Process (GP) regression framework to naturally fuse asynchronous data associations and inertial measurements. A GP latent variable model is leveraged to build data-driven motion prior and acquire the analytical integration capacity. Then, asynchronous event-based feature associations and integral pseudo measurements are tightly coupled using the same GP framework. Subsequently, this fusion estimation problem is solved by underlying factor graph in a sliding-window manner. With consideration of sparsity, those historical states are marginalized orderly. A twin system is also designed for comparison, where the traditional inertial preintegration scheme is embedded in the GP-based framework to replace the GP latent variable model. Evaluations on public event-inertial datasets demonstrate the validity of both systems. Comparison experiments show competitive precision compared to the state-of-the-art synchronous scheme.

* Accepted at IEEE IROS 2024

Via

Access Paper or Ask Questions

DiffPatch: Generating Customizable Adversarial Patches using Diffusion Model

Dec 02, 2024

Zhixiang Wang, Guangnan Ye, Xiaosen Wang, Siheng Chen, Zhibo Wang, Xingjun Ma, Yu-Gang Jiang

Abstract:Physical adversarial patches printed on clothing can easily allow individuals to evade person detectors. However, most existing adversarial patch generation methods prioritize attack effectiveness over stealthiness, resulting in patches that are aesthetically unpleasing. Although existing methods using generative adversarial networks or diffusion models can produce more natural-looking patches, they often struggle to balance stealthiness with attack effectiveness and lack flexibility for user customization. To address these challenges, we propose a novel diffusion-based customizable patch generation framework termed DiffPatch, specifically tailored for creating naturalistic and customizable adversarial patches. Our approach enables users to utilize a reference image as the source, rather than starting from random noise, and incorporates masks to craft naturalistic patches of various shapes, not limited to squares. To prevent the original semantics from being lost during the diffusion process, we employ Null-text inversion to map random noise samples to a single input image and generate patches through Incomplete Diffusion Optimization (IDO). Notably, while maintaining a natural appearance, our method achieves a comparable attack performance to state-of-the-art non-naturalistic patches when using similarly sized attacks. Using DiffPatch, we have created a physical adversarial T-shirt dataset, AdvPatch-1K, specifically targeting YOLOv5s. This dataset includes over a thousand images across diverse scenarios, validating the effectiveness of our attack in real-world environments. Moreover, it provides a valuable resource for future research.

Via

Access Paper or Ask Questions

Self-reconfiguration Strategies for Space-distributed Spacecraft

Nov 26, 2024

Tianle Liu, Zhixiang Wang, Yongwei Zhang, Ziwei Wang, Zihao Liu, Yizhai Zhang, Panfeng Huang

Figure 1 for Self-reconfiguration Strategies for Space-distributed Spacecraft

Figure 2 for Self-reconfiguration Strategies for Space-distributed Spacecraft

Figure 3 for Self-reconfiguration Strategies for Space-distributed Spacecraft

Figure 4 for Self-reconfiguration Strategies for Space-distributed Spacecraft

Abstract:This paper proposes a distributed on-orbit spacecraft assembly algorithm, where future spacecraft can assemble modules with different functions on orbit to form a spacecraft structure with specific functions. This form of spacecraft organization has the advantages of reconfigurability, fast mission response and easy maintenance. Reasonable and efficient on-orbit self-reconfiguration algorithms play a crucial role in realizing the benefits of distributed spacecraft. This paper adopts the framework of imitation learning combined with reinforcement learning for strategy learning of module handling order. A robot arm motion algorithm is then designed to execute the handling sequence. We achieve the self-reconfiguration handling task by creating a map on the surface of the module, completing the path point planning of the robotic arm using A*. The joint planning of the robotic arm is then accomplished through forward and reverse kinematics. Finally, the results are presented in Unity3D.

Via

Access Paper or Ask Questions

AsynEIO: Asynchronous Monocular Event-Inertial Odometry Using Gaussian Process Regression

Nov 19, 2024

Zhixiang Wang, Xudong Li, Yizhai Zhang, Fan Zhang, Panfeng

Abstract:Event cameras, when combined with inertial sensors, show significant potential for motion estimation in challenging scenarios, such as high-speed maneuvers and low-light environments. There are many methods for producing such estimations, but most boil down to a synchronous discrete-time fusion problem. However, the asynchronous nature of event cameras and their unique fusion mechanism with inertial sensors remain underexplored. In this paper, we introduce a monocular event-inertial odometry method called AsynEIO, designed to fuse asynchronous event and inertial data within a unified Gaussian Process (GP) regression framework. Our approach incorporates an event-driven frontend that tracks feature trajectories directly from raw event streams at a high temporal resolution. These tracked feature trajectories, along with various inertial factors, are integrated into the same GP regression framework to enable asynchronous fusion. With deriving analytical residual Jacobians and noise models, our method constructs a factor graph that is iteratively optimized and pruned using a sliding-window optimizer. Comparative assessments highlight the performance of different inertial fusion strategies, suggesting optimal choices for varying conditions. Experimental results on both public datasets and our own event-inertial sequences indicate that AsynEIO outperforms existing methods, especially in high-speed and low-illumination scenarios.

* Submitted to IEEE (2024-11-4)

Via

Access Paper or Ask Questions

Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions

Aug 16, 2024

Chenming Tang, Zhixiang Wang, Yunfang Wu

Figure 1 for Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions

Figure 2 for Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions

Figure 3 for Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions

Figure 4 for Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions

Abstract:With the help of in-context learning (ICL), large language models (LLMs) have achieved impressive performance across various tasks. However, the function of descriptive instructions during ICL remains under-explored. In this work, we propose an ensemble prompt framework to describe the selection criteria of multiple in-context examples, and preliminary experiments on machine translation (MT) across six translation directions confirm that this framework boosts ICL perfromance. But to our surprise, LLMs might not necessarily care what the descriptions actually say, and the performance gain is primarily caused by the ensemble format, since the framework could lead to improvement even with random descriptive nouns. We further apply this new ensemble prompt on a range of commonsense, math, logical reasoning and hallucination tasks with three LLMs and achieve promising results, suggesting again that designing a proper prompt format would be much more effective and efficient than paying effort into specific descriptions. Our code will be publicly available once this paper is published.

* 10 pages, 6 figures, 3 tables

Via

Access Paper or Ask Questions