Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xu Jiang

FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail

May 22, 2025

Yangyang Wang, Jiawei Gu, Li Long, Xin Li, Li Shen, Zhouyu Fu, Xiangjun Zhou, Xu Jiang

Abstract:Accurate demand estimation is critical for the retail business in guiding the inventory and pricing policies of perishable products. However, it faces fundamental challenges from censored sales data during stockouts, where unobserved demand creates systemic policy biases. Existing datasets lack the temporal resolution and annotations needed to address this censoring effect. To fill this gap, we present FreshRetailNet-50K, the first large-scale benchmark for censored demand estimation. It comprises 50,000 store-product time series of detailed hourly sales data from 898 stores in 18 major cities, encompassing 863 perishable SKUs meticulously annotated for stockout events. The hourly stock status records unique to this dataset, combined with rich contextual covariates, including promotional discounts, precipitation, and temporal features, enable innovative research beyond existing solutions. We demonstrate one such use case of two-stage demand modeling: first, we reconstruct the latent demand during stockouts using precise hourly annotations. We then leverage the recovered demand to train robust demand forecasting models in the second stage. Experimental results show that this approach achieves a 2.73\% improvement in prediction accuracy while reducing the systematic demand underestimation from 7.37\% to near-zero bias. With unprecedented temporal granularity and comprehensive real-world information, FreshRetailNet-50K opens new research directions in demand imputation, perishable inventory optimization, and causal retail analytics. The unique annotation quality and scale of the dataset address long-standing limitations in retail AI, providing immediate solutions and a platform for future methodological innovation. The data (https://huggingface.co/datasets/Dingdong-Inc/FreshRetailNet-50K) and code (https://github.com/Dingdong-Inc/frn-50k-baseline}) are openly released.

* 10 pages, 5 figures

Via

Access Paper or Ask Questions

A Comprehensive Scatter Correction Model for Micro-Focus Dual-Source Imaging Systems: Combining Ambient, Cross, and Forward Scatter

Mar 18, 2025

Jianing Sun, Jigang Duan, Guangyin Li, Xu Jiang, Xing Zhao

Abstract:Compared to single-source imaging systems, dual-source imaging systems equipped with two cross-distributed scanning beams significantly enhance temporal resolution and capture more comprehensive object scanning information. Nevertheless, the interaction between the two scanning beams introduces more complex scatter signals into the acquired projection data. Existing methods typically model these scatter signals as the sum of cross-scatter and forward scatter, with cross-scatter estimation limited to single-scatter along primary paths. Through experimental measurements on our selfdeveloped micro-focus dual-source imaging system, we observed that the peak ratio of hardware-induced ambient scatter to single-source projection intensity can even exceed 60%, a factor often overlooked in conventional models. To address this limitation, we propose a more comprehensive model that decomposes the total scatter signals into three distinct components: ambient scatter, cross-scatter, and forward scatter. Furthermore, we introduce a cross-scatter kernel superposition (xSKS) module to enhance the accuracy of cross-scatter estimation by modeling both single and multiple crossscatter events along non-primary paths. Additionally, we employ a fast object-adaptive scatter kernel superposition (FOSKS) module for efficient forward scatter estimation. In Monte Carlo (MC) simulation experiments performed on a custom-designed waterbone phantom, our model demonstrated remarkable superiority, achieving a scatter-toprimary-weighted mean absolute percentage error (SPMAPE) of 1.32%, significantly lower than the 12.99% attained by the state-of-the-art method. Physical experiments further validate the superior performance of our model in correcting scatter artifacts.

Via

Access Paper or Ask Questions

Multi-Agent Image Restoration

Mar 12, 2025

Xu Jiang, Gehui Li, Bin Chen, Jian Zhang

Abstract:Image restoration (IR) is challenging due to the complexity of real-world degradations. While many specialized and all-in-one IR models have been developed, they fail to effectively handle complex, mixed degradations. Recent agentic methods RestoreAgent and AgenticIR leverage intelligent, autonomous workflows to alleviate this issue, yet they suffer from suboptimal results and inefficiency due to their resource-intensive finetunings, and ineffective searches and tool execution trials for satisfactory outputs. In this paper, we propose MAIR, a novel Multi-Agent approach for complex IR problems. We introduce a real-world degradation prior, categorizing degradations into three types: (1) scene, (2) imaging, and (3) compression, which are observed to occur sequentially in real world, and reverse them in the opposite order. Built upon this three-stage restoration framework, MAIR emulates a team of collaborative human specialists, including a "scheduler" for overall planning and multiple "experts" dedicated to specific degradations. This design minimizes search space and trial efforts, improving image quality while reducing inference costs. In addition, a registry mechanism is introduced to enable easy integration of new tools. Experiments on both synthetic and real-world datasets show that proposed MAIR achieves competitive performance and improved efficiency over the previous agentic IR system. Code and models will be made available.

Via

Access Paper or Ask Questions

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Jan 21, 2025

Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang(+25 more)

Abstract:This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks. Experiments demonstrate its superior performance: UI-TARS achieves SOTA performance in 10+ GUI agent benchmarks evaluating perception, grounding, and GUI task execution. Notably, in the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9 respectively). In AndroidWorld, UI-TARS achieves 46.6, surpassing GPT-4o (34.5). UI-TARS incorporates several key innovations: (1) Enhanced Perception: leveraging a large-scale dataset of GUI screenshots for context-aware understanding of UI elements and precise captioning; (2) Unified Action Modeling, which standardizes actions into a unified space across platforms and achieves precise grounding and interaction through large-scale action traces; (3) System-2 Reasoning, which incorporates deliberate reasoning into multi-step decision making, involving multiple reasoning patterns such as task decomposition, reflection thinking, milestone recognition, etc. (4) Iterative Training with Reflective Online Traces, which addresses the data bottleneck by automatically collecting, filtering, and reflectively refining new interaction traces on hundreds of virtual machines. Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention. We also analyze the evolution path of GUI agents to guide the further development of this domain.

Via

Access Paper or Ask Questions

RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution

Dec 11, 2024

Xuhan Sheng, Runyi Li, Bin Chen, Weiqi Li, Xu Jiang, Jian Zhang

Figure 1 for RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution

Figure 2 for RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution

Figure 3 for RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution

Figure 4 for RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution

Abstract:Omnidirectional image super-resolution (ODISR) aims to upscale low-resolution (LR) omnidirectional images (ODIs) to high-resolution (HR), addressing the growing demand for detailed visual content across a $180^{\circ}\times360^{\circ}$ viewport. Existing methods are limited by simple degradation assumptions (e.g., bicubic downsampling), which fail to capture the complex, unknown real-world degradation processes. Recent diffusion-based approaches suffer from slow inference due to their hundreds of sampling steps and frequent pixel-latent space conversions. To tackle these challenges, in this paper, we propose RealOSR, a novel diffusion-based approach for real-world ODISR (Real-ODISR) with single-step diffusion denoising. To sufficiently exploit the input information, RealOSR introduces a lightweight domain alignment module, which facilitates the efficient injection of LR ODI into the single-step latent denoising. Additionally, to better utilize the rich semantic and multi-scale feature modeling ability of denoising UNet, we develop a latent unfolding module that simulates the gradient descent process directly in latent space. Experimental results demonstrate that RealOSR outperforms previous methods in both ODI recovery quality and efficiency. Compared to the recent state-of-the-art diffusion-based ODISR method, OmniSSR, RealOSR achieves significant improvements in visual quality and over \textbf{200$\times$} inference acceleration. Our code and models will be released.

Via

Access Paper or Ask Questions

First performance of hybrid spectra CT reconstruction: a general Spectrum-Model-Aided Reconstruction Technique (SMART)

Oct 24, 2024

Huiying Pan, Jianing Sun, Xu Jiang, Xing Zhao

Abstract:Hybrid spectral CT integrates energy integrating detectors (EID) and photon counting detectors (PCD) into a single system, combining the large field-of-view advantage of EID with the high energy and spatial resolution of PCD. This represents a new research direction in spectral CT imaging. However, the different imaging principles and inconsistent geometric paths of the two detectors make it difficult to reconstruct images using data from hybrid detectors. In addition, the quality reconstructed images considering spectrum is affected by the accuracy of spectral estimation and the scattered photons. In this work, Firstly, we propose a general hybrid spectral reconstruction method that takes into account both the spectral CT imaging principles of the two different detectors and the influence of scattered photons in the forward process modelling. Furthermore, we also apply volume fraction constraints to the results reconstructed from the two detector data. By alternately solving the spectral estimation and the spectral image reconstruction by the ADMM method, the estimated spectra and the reconstructed images reinforce each other, thus improving the accuracy of the spectral estimation and the quality of the reconstructed images. The proposed method is the first to achieve hybrid spectral CT reconstruction for both detectors, allowing simultaneous recovery of spectrum and image reconstruction from hybrid spectral data containing scattering. In addition, the method is also applicable to spectral CT imaging using a single type of detector. We validated the effectiveness of the proposed method through numerical experiments and successfully performed the first hybrid spectral CT reconstruction experiment on our self-developed hybrid spectral CT system.

Via

Access Paper or Ask Questions

An Instrumented Wheel-On-Limb System of Planetary Rovers for Wheel-Terrain Interactions: System Conception and Preliminary Design

Apr 06, 2022

Lihang Feng, Xu Jiang, Aiguo Song

Figure 1 for An Instrumented Wheel-On-Limb System of Planetary Rovers for Wheel-Terrain Interactions: System Conception and Preliminary Design

Figure 2 for An Instrumented Wheel-On-Limb System of Planetary Rovers for Wheel-Terrain Interactions: System Conception and Preliminary Design

Figure 3 for An Instrumented Wheel-On-Limb System of Planetary Rovers for Wheel-Terrain Interactions: System Conception and Preliminary Design

Figure 4 for An Instrumented Wheel-On-Limb System of Planetary Rovers for Wheel-Terrain Interactions: System Conception and Preliminary Design

Abstract:Understanding the wheel-terrain interaction is of great importance to improve the maneuverability and traversability of the rovers. A well-developed sensing device carried by the rover would greatly facilitate the complex risk-reducing operations on sandy terrains. In this paper, an instrumented wheel-on-limb (WOL) system of planetary rovers for wheel-terrain interaction characterization is presented. Assuming the function of a passive suspension of the wheel, the WOL system allows itself to follow the terrain contour, and keep the wheel remain lowered onto the ground during rover motion including climbing and descending, as well as deploy and place the wheel on the ground before a drive commanding. The system concept, functional requirements, and pre-design work, as well as the system integration are presented.

* 2nd International Conference on Robotics and Control Engineering, ACM RobCE 2022, March 25, 2022, Nanjing, China

Via

Access Paper or Ask Questions

Training language models to follow instructions with human feedback

Mar 04, 2022

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray(+10 more)

Figure 1 for Training language models to follow instructions with human feedback

Figure 2 for Training language models to follow instructions with human feedback

Figure 3 for Training language models to follow instructions with human feedback

Figure 4 for Training language models to follow instructions with human feedback

Abstract:Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.

Via

Access Paper or Ask Questions

WebGPT: Browser-assisted question-answering with human feedback

Dec 17, 2021

Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders(+8 more)

Figure 1 for WebGPT: Browser-assisted question-answering with human feedback

Figure 2 for WebGPT: Browser-assisted question-answering with human feedback

Figure 3 for WebGPT: Browser-assisted question-answering with human feedback

Figure 4 for WebGPT: Browser-assisted question-answering with human feedback

Abstract:We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.

* 30 pages

Via

Access Paper or Ask Questions

Auto-weighted Multi-view Feature Selection with Graph Optimization

Apr 11, 2021

Qi Wang, Xu Jiang, Mulin Chen, Xuelong Li

Figure 1 for Auto-weighted Multi-view Feature Selection with Graph Optimization

Figure 2 for Auto-weighted Multi-view Feature Selection with Graph Optimization

Figure 3 for Auto-weighted Multi-view Feature Selection with Graph Optimization

Figure 4 for Auto-weighted Multi-view Feature Selection with Graph Optimization

Abstract:In this paper, we focus on the unsupervised multi-view feature selection which tries to handle high dimensional data in the field of multi-view learning. Although some graph-based methods have achieved satisfactory performance, they ignore the underlying data structure across different views. Besides, their pre-defined laplacian graphs are sensitive to the noises in the original data space, and fail to get the optimal neighbor assignment. To address the above problems, we propose a novel unsupervised multi-view feature selection model based on graph learning, and the contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned. Therefore, the proposed model can reveal the data relationship from the feature subset. (2) a reasonable rank constraint is added to optimize the similarity matrix to obtain more accurate information; (3) an auto-weighted framework is presented to assign view weights adaptively, and an effective alternative iterative algorithm is proposed to optimize the problem. Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.

Via

Access Paper or Ask Questions