Yu Xiao

Zero-shot Load Forecasting for Integrated Energy Systems: A Large Language Model-based Framework with Multi-task Learning

Feb 24, 2025

Jiaheng Li, Donghe Li, Ye Yang, Huan Xi, Yu Xiao, Li Sun, Dou An, Qingyu Yang

Abstract: The growing penetration of renewable energy sources in power systems has increased the complexity and uncertainty of load forecasting, especially for integrated energy systems with multiple energy carriers. Traditional forecasting methods heavily rely on historical data and exhibit limited transferability across different scenarios, posing significant challenges for emerging applications in smart grids and energy internet. This paper proposes the TSLLM-Load Forecasting Mechanism, a novel zero-shot load forecasting framework based on large language models (LLMs) to address these challenges. The framework consists of three key components: a data preprocessing module that handles multi-source energy load data, a time series prompt generation module that bridges the semantic gap between energy data and LLMs through multi-task learning and similarity alignment, and a prediction module that leverages pre-trained LLMs for accurate forecasting. The framework's effectiveness was validated on a real-world dataset comprising load profiles from 20 Australian solar-powered households, demonstrating superior performance in both conventional and zero-shot scenarios. In conventional testing, our method achieved a Mean Squared Error (MSE) of 0.4163 and a Mean Absolute Error (MAE) of 0.3760, outperforming existing approaches by at least 8%. In zero-shot prediction experiments across 19 households, the framework maintained consistent accuracy with a total MSE of 11.2712 and MAE of 7.6709, showing at least 12% improvement over current methods. The results validate the framework's potential for accurate and transferable load forecasting in integrated energy systems, particularly beneficial for renewable energy integration and smart grid applications.
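
For readers unfamiliar with the two error metrics quoted in this abstract, the sketch below computes them from their standard definitions. The load values are made up for illustration and are not taken from the paper's dataset; the function names are likewise only illustrative.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean Absolute Error: the average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical (already normalized) load values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
forecast = [1.5, 2.0, 2.0, 5.0]

print("MSE:", mse(actual, forecast))
print("MAE:", mae(actual, forecast))
```

MSE penalizes large errors more heavily than MAE because residuals are squared before averaging, which is one reason the two figures are usually reported together.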

Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators

Jan 16, 2025

Zhaocheng Liu, Quan Tu, Wen Ye, Yu Xiao, Zhishou Zhang, Hengfu Cui, Yalun Zhu, Qiang Ju, Shizheng Li, Jian Xie

Abstract: Online medical consultation (OMC) restricts doctors to gathering patient information solely through inquiries, making the already complex sequential decision-making process of diagnosis even more challenging. Recently, the rapid advancement of large language models has demonstrated a significant potential to transform OMC. However, most studies have primarily focused on improving diagnostic accuracy under conditions of relatively sufficient information, while paying limited attention to the "inquiry" phase of the consultation process. This lack of focus has left the relationship between "inquiry" and "diagnosis" insufficiently explored. In this paper, we first extract real patient interaction strategies from authentic doctor-patient conversations and use these strategies to guide the training of a patient simulator that closely mirrors real-world behavior. By inputting medical records into our patient simulator to simulate patient responses, we conduct extensive experiments to explore the relationship between "inquiry" and "diagnosis" in the consultation process. Experimental results demonstrate that inquiry and diagnosis adhere to Liebig's law: poor inquiry quality limits the effectiveness of diagnosis, regardless of diagnostic capability, and vice versa. Furthermore, the experiments reveal significant differences in the inquiry performance of various models. To investigate this phenomenon, we categorize the inquiry process into four types: (1) chief complaint inquiry; (2) specification of known symptoms; (3) inquiry about accompanying symptoms; and (4) gathering family or medical history. We analyze the distribution of inquiries across the four types for different models to explore the reasons behind their significant performance differences. We plan to open-source the weights and related code of our patient simulator at https://github.com/LIO-H-ZEN/PatientSimulator.
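
Liebig's law (the "law of the minimum") says that an outcome is capped by its scarcest input. As a toy illustration of the bottleneck relationship described in this abstract, and not a formula from the paper, the sketch below models overall consultation effectiveness as the minimum of two hypothetical scores in [0, 1].

```python
def consultation_effectiveness(inquiry_quality: float,
                               diagnosis_capability: float) -> float:
    """Toy bottleneck model: the weaker of the two stages caps the result.

    Both arguments are hypothetical scores in [0, 1]; the paper does not
    define such a function.
    """
    return min(inquiry_quality, diagnosis_capability)


# A strong diagnostic model cannot compensate for weak inquiry ...
print(consultation_effectiveness(0.3, 0.9))
# ... and strong inquiry cannot compensate for weak diagnosis.
print(consultation_effectiveness(0.9, 0.3))
```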

Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges

Oct 17, 2024

Clayton Souza Leite, Henry Mauranen, Aziza Zhanabatyrova, Yu Xiao

Abstract: Transformers have excelled in natural language processing and computer vision, paving their way to sensor-based Human Activity Recognition (HAR). Previous studies show that transformers outperform their counterparts exclusively when they harness abundant data or employ compute-intensive optimization algorithms. However, neither of these scenarios is viable in sensor-based HAR due to the scarcity of data in this field and the frequent need to perform training and inference on resource-constrained devices. Our extensive investigation into various implementations of transformer-based versus non-transformer-based HAR using wearable sensors, encompassing more than 500 experiments, corroborates these concerns. We observe that transformer-based solutions pose higher computational demands, consistently yield inferior performance, and experience significant performance degradation when quantized to accommodate resource-constrained devices. Additionally, transformers demonstrate lower robustness to adversarial attacks, posing a potential threat to user trust in HAR.

Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing

Oct 11, 2024

Clayton Leite, Yu Xiao

Abstract: Text-to-motion models that generate sequences of human poses from textual descriptions are garnering significant attention. However, due to data scarcity, the range of motions these models can produce is still limited. For instance, current text-to-motion models cannot generate a motion of kicking a football with the instep of the foot, since the training data only includes martial arts kicks. We propose a novel method that uses short video clips or images as conditions to modify existing basic motions. In this approach, the model's understanding of a kick serves as the prior, while the video or image of a football kick acts as the posterior, enabling the generation of the desired motion. By incorporating these additional modalities as conditions, our method can create motions not present in the training set, overcoming the limitations of text-motion datasets. A user study with 26 participants demonstrated that our approach produces unseen motions with realism comparable to commonly represented motions in text-motion datasets (e.g., HumanML3D), such as walking, running, squatting, and kicking.

Cooperative Path Planning with Asynchronous Multiagent Reinforcement Learning

Sep 01, 2024

Jiaming Yin, Weixiong Rao, Yu Xiao, Keshuang Tang

Abstract: In this paper, we study the shortest path problem (SPP) with multiple source-destination pairs (MSD), namely MSD-SPP, to minimize the average travel time of all shortest paths. The inherent traffic capacity limits within a road network contribute to the competition among vehicles. Multi-agent reinforcement learning (MARL) models cannot offer effective and efficient path planning cooperation due to the asynchronous decision-making setting in MSD-SPP, where vehicles (a.k.a. agents) cannot simultaneously complete routing actions in the previous time step. To tackle the efficiency issue, we propose to divide an entire road network into multiple sub-graphs and subsequently execute a two-stage process of inter-region and intra-region route planning. To address the asynchronous issue, in the proposed asyn-MARL framework, we first design a global state, which exploits a low-dimensional vector to implicitly represent the joint observations and actions of multiple agents. Then we develop a novel trajectory collection mechanism to decrease the redundancy in training trajectories. Additionally, we design a novel actor network to facilitate the cooperation among vehicles towards the same or close destinations and a reachability graph aimed at preventing infinite loops in routing paths. On both synthetic and real road networks, our evaluation results demonstrate that our approach outperforms state-of-the-art planning approaches.

Beyond the Snapshot: Brain Tokenized Graph Transformer for Longitudinal Brain Functional Connectome Embedding

Jul 13, 2023

Zijian Dong, Yilei Wu, Yu Xiao, Joanna Su Xian Chong, Yueming Jin, Juan Helen Zhou

Abstract: Under the framework of network-based neurodegeneration, brain functional connectome (FC)-based Graph Neural Networks (GNN) have emerged as a valuable tool for the diagnosis and prognosis of neurodegenerative diseases such as Alzheimer's disease (AD). However, these models are tailored for brain FC at a single time point instead of characterizing FC trajectory. Discerning how FC evolves with disease progression, particularly at the predementia stages such as cognitively normal individuals with amyloid deposition or individuals with mild cognitive impairment (MCI), is crucial for delineating disease spreading patterns and developing effective strategies to slow down or even halt disease advancement. In this work, we proposed the first interpretable framework for brain FC trajectory embedding with application to neurodegenerative disease diagnosis and prognosis, namely Brain Tokenized Graph Transformer (Brain TokenGT). It consists of two modules: 1) Graph Invariant and Variant Embedding (GIVE) for generation of node and spatio-temporal edge embeddings, which were tokenized for downstream processing; 2) Brain Informed Graph Transformer Readout (BIGTR) which augments previous tokens with trainable type identifiers and non-trainable node identifiers and feeds them into a standard transformer encoder to readout. We conducted extensive experiments on two public longitudinal fMRI datasets of the AD continuum for three tasks, including differentiating MCI from controls, predicting dementia conversion in MCI, and classification of amyloid positive or negative cognitively normal individuals. Based on brain FC trajectory, the proposed Brain TokenGT approach outperformed all the other benchmark models and at the same time provided excellent interpretability. The code is available at https://github.com/ZijianD/Brain-TokenGT.git

* MICCAI 2023

A Subabdominal MRI Image Segmentation Algorithm Based on Multi-Scale Feature Pyramid Network and Dual Attention Mechanism

May 19, 2023

Yu Xiao, Xin Yang, Sijuan Huang, Yongkai Liu, Shuqin Chen, Lihua Guo

Abstract: This study aimed to solve the semantic gap and misalignment issue between encoding and decoding caused by multiple convolutional and pooling operations in U-Net when segmenting subabdominal MRI images during rectal cancer treatment. An MRI image segmentation algorithm is proposed based on a multi-scale feature pyramid network and dual attention mechanism. Our innovation is the design of two modules: 1) a dilated convolution and a multi-scale feature pyramid network are used in the encoding to avoid the semantic gap; 2) a dual attention mechanism is designed to maintain the spatial information of U-Net and reduce misalignment. Experiments on a subabdominal MRI image dataset show that the proposed method achieves better performance than other methods. In conclusion, a multi-scale feature pyramid network can reduce the semantic gap, and the dual attention mechanism can align features between encoding and decoding.

* 19 pages, 9 figures

Motley: Benchmarking Heterogeneity and Personalization in Federated Learning

Jun 18, 2022

Shanshan Wu, Tian Li, Zachary Charles, Yu Xiao, Ziyu Liu, Zheng Xu, Virginia Smith

Abstract: Personalized federated learning considers learning models unique to each client in a heterogeneous network. The resulting client-specific models have been purported to improve metrics such as accuracy, fairness, and robustness in federated networks. However, despite a plethora of work in this area, it remains unclear: (1) which personalization techniques are most effective in various settings, and (2) how important personalization truly is for realistic federated applications. To better answer these questions, we propose Motley, a benchmark for personalized federated learning. Motley consists of a suite of cross-device and cross-silo federated datasets from varied problem domains, as well as thorough evaluation metrics for better understanding the possible impacts of personalization. We establish baselines on the benchmark by comparing a number of representative personalized federated learning methods. These initial results highlight strengths and weaknesses of existing approaches, and raise several open questions for the community. Motley aims to provide a reproducible means with which to advance developments in personalized and heterogeneity-aware federated learning, as well as the related areas of transfer learning, meta-learning, and multi-task learning.

* 35 pages, 9 figures, 5 tables. Code: https://github.com/google-research/federated/tree/master/personalization_benchmark

Automatic Map Update Using Dashcam Videos

Sep 24, 2021

Aziza Zhanabatyrova, Clayton Souza Leite, Yu Xiao

Abstract: Autonomous driving requires 3D maps that provide accurate and up-to-date information about semantic landmarks. Due to the wider availability and lower cost of cameras compared with laser scanners, vision-based mapping has attracted much attention from academia and industry. Among the existing solutions, Structure-from-Motion (SfM) technology has proved to be feasible for building 3D maps from crowdsourced data, since it allows unordered images as input. Previous works on SfM have mainly focused on issues related to building 3D point clouds and calculating camera poses, leaving the issues of automatic change detection and localization open. We propose in this paper an SfM-based solution for automatic map update, with a focus on real-time change detection and localization. Our solution builds on comparison of semantic map data (e.g. types and locations of traffic signs). Through a novel design of the pixel-wise 3D localization algorithm, our system can locate the objects detected from 2D images in a 3D space, utilizing sparse SfM point clouds. Experiments with dashcam videos collected from two urban areas prove that the system is able to locate visible traffic signs in front along the driving direction with a median distance error of 1.52 meters. Moreover, it can detect up to 80% of the changes with a median distance error of 2.21 meters. The result analysis also shows the potential of significantly improving the system performance in the future by increasing the accuracy of the background technology in use, including in particular the object detection and point cloud geo-registration algorithms.

Learning-based decentralized offloading decision making in an adversarial environment

Apr 26, 2021

Byungjin Cho, Yu Xiao

Abstract: Vehicular fog computing (VFC) pushes the cloud computing capability to the distributed fog nodes at the edge of the Internet, enabling compute-intensive and latency-sensitive computing services for vehicles through task offloading. However, a heterogeneous mobility environment introduces uncertainties in terms of resource supply and demand, which are inevitable bottlenecks for the optimal offloading decision. Also, these uncertainties bring extra challenges to task offloading under oblivious adversary attacks and data privacy risks. In this article, we develop a new adversarial online algorithm with bandit feedback based on the adversarial multi-armed bandit theory, to enable scalable and low-complexity offloading decision making on the fog node selection toward minimizing the offloading service cost in terms of delay and energy. The key is to implicitly tune the exploration bonus in the selection and assessment rules of the designed algorithm, taking into account volatile resource supply and demand. We theoretically prove that the input-size dependent selection rule allows choosing a suitable fog node without exploring the sub-optimal actions, and also that an appropriate score patching rule allows quick adaptation to evolving circumstances, which reduces variance and bias simultaneously, thereby achieving a better exploitation-exploration balance. Simulation results verify the effectiveness and robustness of the proposed algorithm.
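
The abstract does not spell out the proposed algorithm, so the sketch below shows only the textbook EXP3 algorithm for adversarial bandits with losses in [0, 1], which is the general family the paper builds on. It is not the authors' method: the three "fog nodes", their mean costs, the learning rate `eta`, and all class and function names are assumptions made for illustration.

```python
import math
import random


class Exp3:
    """Textbook EXP3 for losses in [0, 1] (NOT the paper's algorithm)."""

    def __init__(self, n_arms, eta, rng=None):
        self.n_arms = n_arms
        self.eta = eta                      # learning rate (assumed value)
        self.rng = rng or random.Random(0)
        self.log_weights = [0.0] * n_arms   # log-space avoids underflow

    def probabilities(self):
        # Softmax over log-weights, shifted by the max for stability.
        m = max(self.log_weights)
        exps = [math.exp(w - m) for w in self.log_weights]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, loss):
        # Bandit feedback: only the chosen arm's loss is observed, so it is
        # importance-weighted by 1/prob to keep the estimate unbiased.
        self.log_weights[arm] -= self.eta * loss / prob


def simulate(rounds=2000, seed=0):
    rng = random.Random(seed)
    mean_cost = [0.8, 0.5, 0.2]  # hypothetical per-node offloading costs
    learner = Exp3(n_arms=3, eta=0.05, rng=rng)
    for _ in range(rounds):
        arm, prob = learner.select()
        # Noisy cost, clipped to [0, 1] as EXP3 assumes bounded losses.
        cost = min(1.0, max(0.0, rng.gauss(mean_cost[arm], 0.1)))
        learner.update(arm, prob, cost)
    return learner.probabilities()


probs = simulate()
print(probs)
```

In this toy run the selection probability should concentrate on the cheapest hypothetical node. The paper's contribution, as described above, lies in how the exploration bonus and score updates are tuned under volatile supply and demand, which this plain EXP3 sketch does not attempt to reproduce.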

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
