Liang Yang

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

Oct 01, 2025

Minhui Zhu, Minyang Tian, Xiaocheng Yang, Tianci Zhou, Penghao Zhu, Eli Chertkov, Shengyan Liu, Yufeng Du, Lifan Yuan, Ziming Ji(+54 more)

Abstract: While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through the complex, open-ended challenges found in frontier physics research? And crucially, what kinds of reasoning tasks do physicists want LLMs to assist with? To address these questions, we present CritPt (Complex Research using Integrated Thinking - Physics Test, pronounced "critical point"), the first benchmark designed to test LLMs on unpublished, research-level reasoning tasks that broadly covers modern physics research areas, including condensed matter, quantum physics, atomic, molecular & optical physics, astrophysics, high energy physics, mathematical physics, statistical physics, nuclear physics, nonlinear dynamics, fluid dynamics and biophysics. CritPt consists of 71 composite research challenges designed to simulate full-scale research projects at the entry level, which are also decomposed into 190 simpler checkpoint tasks for more fine-grained insights. All problems are newly created by 50+ active physics researchers based on their own research. Every problem is hand-curated to admit a guess-resistant and machine-verifiable answer and is evaluated by an automated grading pipeline heavily customized for advanced physics-specific output formats. We find that while current state-of-the-art LLMs show early promise on isolated checkpoints, they remain far from being able to reliably solve full research-scale challenges: the best average accuracy among base models is only 4.0%, achieved by GPT-5 (high), rising moderately to around 10% when equipped with coding tools. Through the realistic yet standardized evaluation offered by CritPt, we highlight a large disconnect between current model capabilities and realistic physics research demands, offering a foundation to guide the development of scientifically grounded AI tools.

* 39 pages, 6 figures, 6 tables
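The requirement that every answer be machine-verifiable can be illustrated with a minimal grading sketch. It assumes the final answer is a plain number compared against a reference with a relative tolerance; the function names and the 1% tolerance are hypothetical, and CritPt's actual pipeline handles far richer physics-specific formats (symbolic expressions, arrays, units) that this sketch does not attempt.

```python
import math
from typing import List, Optional


def parse_final_answer(text: str) -> Optional[float]:
    """Parse a model's final-answer string as a float; None if it is not a number."""
    try:
        value = float(text.strip())
    except ValueError:
        return None
    # Reject NaN and infinity so they can never be graded as correct.
    return value if math.isfinite(value) else None


def grade_numeric(answer: str, reference: float, rel_tol: float = 1e-2) -> bool:
    """Hypothetical grader: correct iff the parsed answer is within rel_tol of the reference."""
    value = parse_final_answer(answer)
    if value is None:
        return False
    return math.isclose(value, reference, rel_tol=rel_tol)


def accuracy(answers: List[str], references: List[float], rel_tol: float = 1e-2) -> float:
    """Fraction of answers graded correct (assumes equal-length, non-empty lists)."""
    graded = [grade_numeric(a, r, rel_tol) for a, r in zip(answers, references)]
    return sum(graded) / len(graded)
```

For example, `accuracy(["1.0", "2.5", "x"], [1.0, 2.0, 3.0])` grades only the first answer correct. A reference of exactly zero would also need an absolute tolerance (`abs_tol`), since a purely relative check accepts only an exact 0 there.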

Rethinking Contrastive Learning in Session-based Recommendation

Jun 05, 2025

Xiaokun Zhang, Bo Xu, Fenglong Ma, Zhizheng Wang, Liang Yang, Hongfei Lin

Abstract: Session-based recommendation aims to predict the intents of anonymous users from limited behaviors. Owing to its ability to alleviate data sparsity, contrastive learning has become prevalent in this task. However, we find that existing contrastive-learning-based methods still suffer from three obstacles: (1) they focus primarily on session-level sparsity and overlook item-level sparsity; (2) they typically augment sessions by operating on item IDs (e.g., crop, mask and reorder), which cannot ensure the semantic consistency of the augmented views; (3) they treat all positive-negative signals equally, without considering their varying utility. To this end, we propose MACL, a novel multi-modal adaptive contrastive learning framework for session-based recommendation. In MACL, a multi-modal augmentation generates semantically consistent views at both the item and session levels by leveraging item multi-modal features. In addition, an adaptive contrastive loss distinguishes the varying contributions of positive-negative signals to improve self-supervised learning. Extensive experiments on three real-world datasets demonstrate the superiority of MACL over state-of-the-art methods.

* This work has been accepted by Pattern Recognition
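For readers unfamiliar with contrastive objectives, the sketch below shows a standard InfoNCE loss for a single anchor in which each negative term carries a weight. It is an illustration of what weighting positive-negative signals unequally can look like, not MACL's adaptive loss; the function names, the temperature `tau=0.2` and the explicitly supplied weights are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    weights: Optional[List[float]] = None,
    tau: float = 0.2,
) -> float:
    """InfoNCE loss for one anchor; each negative's exp-similarity is scaled by its weight.

    With all weights equal to 1 this is the standard (unweighted) InfoNCE loss.
    """
    if weights is None:
        weights = [1.0] * len(negatives)
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(w * math.exp(cosine(anchor, n) / tau) for w, n in zip(weights, negatives))
    return -math.log(pos / (pos + neg))
```

Setting every weight to 1 recovers plain InfoNCE; a weight above 1 makes a negative count more (for example, a hard negative), which raises the loss, while a weight below 1 down-weights a less useful signal.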

TransMedSeg: A Transferable Semantic Framework for Semi-Supervised Medical Image Segmentation

May 20, 2025

Mengzhu Wang, Jiao Li, Shanshan Wang, Long Lan, Huibin Tan, Liang Yang, Guoli Yang

Abstract: Semi-supervised learning (SSL) has made significant progress in medical image segmentation (semi-supervised medical image segmentation, SSMIS) by effectively utilizing limited labeled data. While current SSL methods for medical images rely predominantly on consistency regularization and pseudo-labeling, they often overlook transferable semantic relationships across different clinical domains and imaging modalities. To address this, we propose TransMedSeg, a novel transferable semantic framework for semi-supervised medical image segmentation. Our approach introduces a Transferable Semantic Augmentation (TSA) module, which implicitly enhances feature representations by aligning domain-invariant semantics through cross-domain distribution matching and intra-domain structural preservation. Specifically, TransMedSeg constructs a unified feature space in which teacher network features are adaptively augmented towards student network semantics via a lightweight memory module, enabling implicit semantic transformation without explicit data generation. Notably, this augmentation is realized implicitly through an expected transferable cross-entropy loss computed over the augmented teacher distribution. An upper bound of the expected loss is derived theoretically and minimized during training, incurring negligible computational overhead. Extensive experiments on medical image datasets demonstrate that TransMedSeg outperforms existing semi-supervised methods, establishing a new direction for transferable representation learning in medical image analysis.
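Minimizing an upper bound of an expected cross-entropy over implicitly augmented features has a well-known closed form in implicit semantic data augmentation (ISDA), reproduced here as a reference derivation. It assumes augmented features a ~ N(a_i, λΣ), a linear classifier with weights w_j and biases b_j, and true class y; these symbols do not come from the TransMedSeg abstract, and the paper's own bound over the augmented teacher distribution may take a different form.

```latex
\begin{aligned}
\mathbb{E}_{a}\left[-\log \frac{e^{w_y^\top a + b_y}}{\sum_{j} e^{w_j^\top a + b_j}}\right]
  &= \mathbb{E}_{a}\left[\log \sum_{j} e^{(w_j - w_y)^\top a + (b_j - b_y)}\right] \\
  &\le \log \sum_{j} \mathbb{E}_{a}\left[e^{(w_j - w_y)^\top a + (b_j - b_y)}\right] \\
  &= -\log \frac{e^{w_y^\top a_i + b_y}}{\sum_{j} e^{w_j^\top a_i + b_j + \frac{\lambda}{2}(w_j - w_y)^\top \Sigma (w_j - w_y)}}
\end{aligned}
```

The inequality is Jensen's (E[log X] ≤ log E[X]) and the last step uses the Gaussian moment-generating function. The resulting surrogate only adds one quadratic form per class to the logits, which is consistent with the abstract's claim of negligible computational overhead.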

AI-empowered Channel Estimation for Block-based Active IRS-enhanced Hybrid-field IoT Network

May 20, 2025

Yan Wang, Feng Shu, Xianpeng Wang, Minghao Chen, Riqing Chen, Liang Yang, Junhui Zhao

Abstract: In this paper, channel estimation (CE) is investigated for uplink hybrid-field communications involving multiple Internet of Things (IoT) devices assisted by an active intelligent reflecting surface (IRS). First, to reduce the complexity of near-field (NF) channel modeling and estimation between the IoT devices and the active IRS, a sub-blocking strategy for the active IRS is proposed. Specifically, the entire active IRS is divided into multiple smaller sub-blocks so that the IoT devices are located in the far-field (FF) region of each sub-block while remaining in the NF region of the entire active IRS. This strategy significantly simplifies the channel model and reduces the dimension of parameter estimation by decoupling the high-dimensional NF channel parameter space into low-dimensional FF sub-channels. Subsequently, the channel approximation error and the CE error are derived as functions of the number of sub-blocks, and the optimal number of sub-blocks is obtained by minimizing the total error. In addition, since the amplification capability of the active IRS consumes power, a closed-form expression for the optimal power allocation factor is derived. To further reduce the pilot overhead, a lightweight CE algorithm based on a convolutional autoencoder (CAE) and a multi-head attention mechanism, called CAEformer, is designed. The Cramér-Rao lower bound is derived to evaluate the proposed algorithm's performance. Finally, simulation results demonstrate that the proposed CAEformer network significantly outperforms the conventional least squares and minimum mean square error schemes in terms of estimation accuracy.
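The sub-blocking idea can be made concrete with the classic Rayleigh (Fraunhofer) far-field distance 2D²/λ. The sketch below is not the paper's error-minimizing derivation: it assumes a square IRS whose side length D is used as the aperture (a stricter criterion would use the diagonal) and finds the smallest number M of sub-blocks per side that places a device at distance r in the far field of every sub-block. The function names are hypothetical.

```python
def rayleigh_distance(aperture_m: float, wavelength_m: float) -> float:
    """Classic far-field (Fraunhofer/Rayleigh) distance 2 * D^2 / lambda, in metres."""
    return 2.0 * aperture_m ** 2 / wavelength_m


def min_subblocks_per_side(aperture_m: float, wavelength_m: float, distance_m: float) -> int:
    """Smallest M such that a device at distance_m lies in the far field of each
    sub-block of side aperture_m / M (the whole IRS then has M * M sub-blocks)."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    m = 1
    # Shrinking the sub-block side by a factor m shrinks its far-field distance by m^2.
    while rayleigh_distance(aperture_m / m, wavelength_m) > distance_m:
        m += 1
    return m
```

For example, with a 1 m IRS, λ = 1 cm and a device 10 m away, the full IRS has a far-field distance of 200 m (so the device is in its near field), whereas `min_subblocks_per_side(1.0, 0.01, 10.0)` returns 5, i.e. 25 sub-blocks whose individual far-field distance is 8 m. The paper's optimal number additionally trades channel-approximation error against CE error, so it need not coincide with this feasibility bound.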

2DXformer: Dual Transformers for Wind Power Forecasting with Dual Exogenous Variables

May 02, 2025

Yajuan Zhang, Jiahai Jiang, Yule Yan, Liang Yang, Ping Zhang

Abstract: Accurate wind power forecasting can help formulate scientific dispatch plans, which is of great significance for maintaining the safe, stable, and efficient operation of the power system. In recent years, deep-learning-based wind power forecasting methods have focused on extracting the spatiotemporal correlations among data, achieving significant improvements in forecasting accuracy. However, they exhibit two limitations. First, they lack modeling of inter-variable relationships, which limits forecasting accuracy. Second, they treat endogenous and exogenous variables equally, which introduces unnecessary interactions between the two and increases model complexity. In this paper, we propose 2DXformer, which builds on previous work's focus on spatiotemporal correlations and addresses both of these limitations. Specifically, we classify the model inputs into three types: exogenous static variables, exogenous dynamic variables, and endogenous variables. First, we embed these variables as variable tokens in a channel-independent manner. Then, we use the attention mechanism to capture the correlations among exogenous variables. Finally, we employ a multi-layer perceptron with residual connections to model the impact of exogenous variables on endogenous variables. Experimental results on two real-world large-scale datasets indicate that the proposed 2DXformer further improves wind power forecasting performance. The code is available at https://github.com/jseaj/2DXformer.

* Accepted by ICDM 2024
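The data flow described above (attention only among exogenous variable tokens, then a one-way residual MLP into the endogenous token) can be sketched in plain Python. This is a simplified illustration, not the released 2DXformer code: it assumes tokens are already embedded, uses attention without learned query/key/value projections, summarizes exogenous context by a mean, and all function names and weight shapes are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention among tokens (Q = K = V = tokens, no learned projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def endogenous_update(endo: Vector, exo_tokens: List[Vector], w1: Matrix, w2: Matrix) -> Vector:
    """Residual MLP: endo + W2 * ReLU(W1 * [endo ; mean(exo)]).

    w1 has shape (hidden, 2 * d) and w2 has shape (d, hidden). Information flows only
    from exogenous to endogenous; the endogenous token never enters the attention step.
    """
    d = len(endo)
    context = [sum(t[j] for t in exo_tokens) / len(exo_tokens) for j in range(d)]
    hidden = [max(0.0, h) for h in matvec(w1, endo + context)]
    return [e + r for e, r in zip(endo, matvec(w2, hidden))]
```

A forward pass would call `endogenous_update(endo, self_attention(exo_tokens), w1, w2)`. Keeping the endogenous token out of the attention step is what avoids the unnecessary endogenous-exogenous interactions the abstract describes, and the residual connection means zero MLP weights leave the endogenous token unchanged.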

Adaptive Decision Boundary for Few-Shot Class-Incremental Learning

Apr 15, 2025

Linhao Li, Yongzhang Tan, Siyuan Yang, Hao Cheng, Yongfeng Dong, Liang Yang

Abstract: Few-Shot Class-Incremental Learning (FSCIL) aims to continuously learn new classes from a limited set of training samples without forgetting knowledge of previously learned classes. Conventional FSCIL methods typically build a robust feature extractor during the base training session with abundant training samples and then freeze this extractor, fine-tuning only the classifier in subsequent incremental sessions. However, current strategies focus primarily on preventing catastrophic forgetting and consider only the relationship between novel and base classes, without attending to the specific decision space of each class. To address this challenge, we propose a plug-and-play Adaptive Decision Boundary Strategy (ADBS) that is compatible with most FSCIL methods. Specifically, we assign a specific decision boundary to each class and adaptively adjust these boundaries during training to optimally refine the decision spaces of the classes in each session. Furthermore, to amplify the distinctiveness between classes, we employ a novel inter-class constraint loss that optimizes the decision boundaries and prototypes of each class. Extensive experiments on three benchmarks, namely CIFAR100, miniImageNet, and CUB200, demonstrate that incorporating ADBS into existing FSCIL techniques significantly improves performance, achieving overall state-of-the-art results.
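To make "a specific decision boundary for each class" concrete, the sketch below uses one hypothetical parametrization: a prototype classifier in which each class has its own radius that scales its distance score, plus a hinge penalty that pushes apart prototypes whose regions overlap. This is an illustrative assumption, not the ADBS formulation from the paper; the function names and the `margin=0.5` default are hypothetical, and in training the radii and prototypes would be learnable parameters.

```python
import math
from typing import List, Sequence


def distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict(feature: Sequence[float], prototypes: List[Sequence[float]], radii: List[float]) -> int:
    """Score each class by -||f - p_c|| / r_c and return the best class index.

    A larger radius r_c widens class c's decision region; equal radii reduce
    this to the plain nearest-prototype rule.
    """
    scores = [-distance(feature, p) / r for p, r in zip(prototypes, radii)]
    return max(range(len(scores)), key=lambda c: scores[c])


def inter_class_penalty(prototypes: List[Sequence[float]], radii: List[float], margin: float = 0.5) -> float:
    """Hinge penalty on every class pair whose regions (sum of radii plus margin) overlap."""
    total = 0.0
    for a in range(len(prototypes)):
        for b in range(a + 1, len(prototypes)):
            gap = distance(prototypes[a], prototypes[b])
            total += max(0.0, radii[a] + radii[b] + margin - gap)
    return total
```

With prototypes at (0, 0) and (2, 0), the feature (1.2, 0) is assigned to class 1 under equal radii but to class 0 once class 0's radius is doubled, which shows how per-class boundaries reshape the decision space without moving the prototypes.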

VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs

Mar 25, 2025

Kelaiti Xiao, Liang Yang, Paerhati Tulajiang, Hongfei Lin

Abstract: This paper introduces VisualQuest, a novel image dataset designed to assess the ability of large language models (LLMs) to interpret non-traditional, stylized imagery. Unlike conventional photographic benchmarks, VisualQuest challenges models with images that incorporate abstract, symbolic, and metaphorical elements, requiring the integration of domain-specific knowledge and advanced reasoning. The dataset was meticulously curated through multiple stages of filtering, annotation, and standardization to ensure high quality and diversity. Our evaluations using several state-of-the-art multimodal LLMs reveal significant performance variations that underscore the importance of both factual background knowledge and inferential capabilities in visual recognition tasks. VisualQuest thus provides a robust and comprehensive benchmark for advancing research in multimodal reasoning and model architecture design.

DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents

Mar 14, 2025

Yibin Xu, Liang Yang, Hao Chen, Hua Wang, Zhi Chen, Yaohua Tang

Abstract: The scarcity of graphical user interface (GUI) data has been a significant barrier to the development of GUI agents, especially for desktop/computer-use scenarios. To address this, we propose AutoCaptioner, an automated GUI data generation pipeline that generates data with rich descriptions while minimizing human effort. Using AutoCaptioner, we created DeskVision, a novel large-scale desktop GUI dataset, along with DeskVision-Eval, the largest desktop test benchmark, which reflects daily usage and covers diverse systems and UI elements, each with rich descriptions. With DeskVision, we train a new GUI understanding model, GUIExplorer. Results show that GUIExplorer achieves state-of-the-art (SOTA) performance in understanding and grounding visual elements without the need for complex architectural designs. We further validate the effectiveness of the DeskVision dataset through ablation studies on various large vision-language models (LVLMs). We believe that AutoCaptioner and DeskVision will significantly advance the development of GUI agents, and we will open-source them for the community.

Generalizable AI-Generated Image Detection Based on Fractal Self-Similarity in the Spectrum

Mar 11, 2025

Shengpeng Xiao, Yuanfang Guo, Heqi Peng, Zeming Liu, Liang Yang, Yunhong Wang

Abstract: The generalization performance of AI-generated image detection remains a critical challenge. Although most existing methods perform well in detecting images from generative models included in the training set, their accuracy drops significantly when faced with images from unseen generators. To address this limitation, we propose a novel detection method based on the fractal self-similarity of the spectrum, a common feature among images generated by different models. Specifically, we demonstrate that AI-generated images exhibit fractal-like spectral growth through periodic extension and low-pass filtering. This observation motivates us to exploit the similarity among different fractal branches of the spectrum. By exploiting this similarity instead of directly analyzing the spectrum, our method mitigates the impact of varying spectral characteristics across different generators, improving detection performance for images from unseen models. Experiments on a public benchmark demonstrate the generalized detection performance of our method across both GANs and diffusion models.
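The link between upsampling, periodic extension of the spectrum and low-pass filtering can be checked with a small 1D sketch. It assumes a generator-like upsampler that inserts zeros and then interpolates linearly (a simplification; real generators use learned layers on 2D images). The DFT of the zero-inserted signal is an exact periodic copy of the original spectrum, and the interpolation filter only attenuates the high-frequency copy rather than removing it. The function names are hypothetical and this is not the paper's detector.

```python
import cmath
import math
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def zero_insert(x: List[float], factor: int = 2) -> List[float]:
    """Upsample by inserting factor - 1 zeros after every sample."""
    out: List[float] = []
    for v in x:
        out.append(v)
        out.extend([0.0] * (factor - 1))
    return out


def linear_interp_filter(u: List[float]) -> List[float]:
    """Circular low-pass filter y[n] = 0.5*u[n-1] + u[n] + 0.5*u[n+1].

    Applied after 2x zero insertion, this is linear interpolation; its frequency
    response is 1 + cos(2*pi*k/N).
    """
    n = len(u)
    return [0.5 * u[(i - 1) % n] + u[i] + 0.5 * u[(i + 1) % n] for i in range(n)]


# Example: an 8-sample cosine occupying DFT bin 1 (and its mirror, bin 7).
x = [math.cos(2 * math.pi * t / 8) for t in range(8)]
X = dft(x)
U = dft(zero_insert(x))                        # 16-point spectrum: U[k] == X[k % 8]
Y = dft(linear_interp_filter(zero_insert(x)))  # copy at bin 9 attenuated, not removed
```

In the example, the baseband peak at bin 1 reappears as a copy at bin 9, which the filter scales by roughly 0.08 instead of eliminating. Residual, self-similar spectral copies of this kind are the sort of artifact the proposed detector is described as exploiting.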

Secure Wireless-Powered zeRIS Communications

Mar 10, 2025

Jingyu Chen, Kunrui Cao, Panagiotis D. Diamantoulakis, Lu Lv, Liang Yang, Haolian Chi, Haiyang Ding

Abstract: This paper introduces the concept of the wireless-powered zero-energy reconfigurable intelligent surface (zeRIS) and investigates a wireless-powered zeRIS-aided communication system in terms of security, reliability, and energy efficiency. In particular, we propose three new wireless-powered zeRIS modes: 1) in mode-I, all N reconfigurable reflecting elements are set to the optimal phase shift design for the information user to maximize the reliability of the system; 2) in mode-II, all N reconfigurable reflecting elements are set to the optimal phase shift design for the cooperative jamming user to maximize the security of the system; 3) in mode-III, N1 and N2 (N1+N2=N) reconfigurable reflecting elements are set to the optimal phase shift designs for the information user and the cooperative jamming user, respectively, to balance the reliability and security of the system. Then, we propose three new metrics, i.e., joint outage probability (JOP), joint intercept probability (JIP), and secrecy energy efficiency (SEE), and analyze their closed-form expressions in each of the three modes. The results show that under high transmission power, the diversity gain of all three modes is 1, and the JOPs of mode-I, mode-II and mode-III improve as the number of zeRIS elements increases, which are related to N^2, N, and N_1^2, respectively. In addition, mode-I achieves the best JOP, while mode-II achieves the best JIP among the three modes. We exploit two security-reliability trade-off (SRT) metrics, i.e., JOP versus JIP and the normalized joint intercept and outage probability (JIOP), to reveal the SRT performance of the three proposed modes. The results show that mode-II outperforms the other two modes in terms of JOP versus JIP, while mode-III and mode-II achieve the best normalized JIOP at low and high transmission power, respectively.

* 13 pages, 7 figures
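The "optimal phase shift design" in each mode rests on co-phasing the reflected paths toward one user. The idealized sketch below assumes cascaded channels c_i = h_i g_i and ignores noise, energy harvesting and element losses; the function names are hypothetical. It only shows why aligning N unit-gain elements to one user yields an amplitude of N (a power gain of N²), whereas unaligned phases can cancel almost completely.

```python
import cmath
import math
from typing import List


def received_amplitude(cascaded: List[complex], phases: List[float]) -> float:
    """|sum_i c_i * exp(j * theta_i)| for cascaded channels c_i = h_i * g_i."""
    return abs(sum(c * cmath.exp(1j * th) for c, th in zip(cascaded, phases)))


def aligned_phases(cascaded: List[complex]) -> List[float]:
    """Phase shifts theta_i = -arg(c_i) that co-phase every reflected path."""
    return [-cmath.phase(c) for c in cascaded]


# Example: 8 unit-gain elements whose cascaded channels have evenly spread phases.
N = 8
cascaded = [cmath.exp(2j * math.pi * i / N) for i in range(N)]
coherent = received_amplitude(cascaded, aligned_phases(cascaded))  # about N
unaligned = received_amplitude(cascaded, [0.0] * N)                # about 0 (paths cancel)
```

In this toy model, a subset of N1 elements aligned to the information user contributes a coherent term of amplitude N1, while the remaining elements add with phases chosen for another user; the paper's closed-form outage and intercept analysis builds on fading statistics that this sketch does not model.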
