Li Wu

Hierarchical Context Transformer for Multi-level Semantic Scene Understanding

Feb 21, 2025

Luoying Hao, Yan Hu, Yang Yue, Li Wu, Huazhu Fu, Jinming Duan, Jiang Liu

Abstract: A comprehensive and explicit understanding of surgical scenes plays a vital role in developing context-aware computer-assisted systems in the operating theatre. However, few works provide a systematic analysis to enable hierarchical surgical scene understanding. In this work, we propose to represent the task set [phase recognition --> step recognition --> action and instrument detection] as multi-level semantic scene understanding (MSSU). For this target, we propose a novel hierarchical context transformer (HCT) network and thoroughly explore the relations across the different level tasks. Specifically, a hierarchical relation aggregation module (HRAM) is designed to concurrently relate entries inside multi-level interaction information and then augment task-specific features. To further boost the representation learning of the different tasks, inter-task contrastive learning (ICL) is presented to guide the model to learn task-wise features by absorbing complementary information from other tasks. Furthermore, considering the computational costs of the transformer, we propose HCT+, which integrates spatial and temporal adapters to achieve competitive performance with substantially fewer tunable parameters. Extensive experiments on our cataract dataset and the publicly available endoscopic PSI-AVA dataset demonstrate the outstanding performance of our method, consistently exceeding the state-of-the-art methods by a large margin. The code is available at https://github.com/Aurora-hao/HCT.

* This paper has been accepted by the IEEE TCSVT

UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba

Oct 15, 2024

Li Wu, Wenbin Pei, Jiulong Jiao, Qiang Zhang

Abstract: Multivariate time series forecasting is crucial in domains such as transportation, meteorology, and finance, especially for predicting extreme weather events. State-of-the-art methods predominantly rely on Transformer architectures, which utilize attention mechanisms to capture temporal dependencies. However, these methods are hindered by quadratic time complexity, which limits the model's scalability with respect to input sequence length and significantly restricts their practicality in the real world. Mamba, based on state space models (SSM), provides a solution with linear time complexity, increasing the potential for efficient forecasting of sequential data. In this study, we propose UmambaTSF, a novel long-term time series forecasting framework that integrates the multi-scale feature extraction capabilities of U-shaped encoder-decoder multilayer perceptrons (MLP) with Mamba's long sequence representation. To improve performance and efficiency, the Mamba blocks introduced in the framework adopt a refined residual structure and adaptable design, enabling the capture of unique temporal signals and flexible channel processing. In the experiments, UmambaTSF achieves state-of-the-art performance and excellent generality on widely used benchmark datasets while maintaining linear time complexity and low memory consumption.
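
The contrast between quadratic and linear time complexity in the abstract above can be illustrated with a minimal sketch. It is not the UmambaTSF or Mamba implementation: it is a plain scalar linear state space recurrence with fixed, hypothetical parameters `a`, `b`, and `c`, whereas Mamba makes such parameters input-dependent. The function names are assumptions made for illustration only.

```python
def ssm_scan(xs, a, b, c):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t and y_t = c * h_t, starting from h = 0.
    Each input is visited once, so the cost grows linearly with len(xs).
    The parameters a, b, c are fixed here (an assumption of this sketch).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # update the hidden state from the previous state
        ys.append(c * h)   # read out the current output
    return ys


def attention_pair_count(n):
    """Number of query-key pairs scored by full self-attention on n steps."""
    return n * n


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
    print(attention_pair_count(1000))
```

Doubling the sequence length doubles the number of loop iterations in `ssm_scan`, while the pair count for full attention quadruples.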

ARFA: An Asymmetric Receptive Field Autoencoder Model for Spatiotemporal Prediction

Sep 01, 2023

Wenxuan Zhang, Xuechao Zou, Li Wu, Jianqiang Huang, Xiaoying Wang

Abstract: Spatiotemporal prediction aims to generate future sequences by paradigms learned from historical contexts. It holds significant importance in numerous domains, including traffic flow prediction and weather forecasting. However, existing methods face challenges in handling spatiotemporal correlations, as they commonly adopt encoder and decoder architectures with identical receptive fields, which adversely affects prediction accuracy. This paper proposes an Asymmetric Receptive Field Autoencoder (ARFA) model to address this issue. Specifically, we design receptive field modules whose sizes are tailored to the distinct functionalities of the encoder and decoder. In the encoder, we introduce a large kernel module for global spatiotemporal feature extraction. In the decoder, we develop a small kernel module for local spatiotemporal information reconstruction. To address the scarcity of meteorological prediction data, we constructed RainBench, a large-scale radar echo dataset for precipitation prediction that reflects the unique precipitation characteristics of inland regions in China. Experimental results demonstrate that ARFA achieves consistent state-of-the-art performance on two mainstream spatiotemporal prediction datasets and our RainBench dataset, affirming the effectiveness of our approach. This work not only explores a novel method from the perspective of receptive fields but also provides data support for precipitation prediction, thereby advancing future research in spatiotemporal prediction.

* 0 pages, 5 figures

Multi-view information fusion using multi-view variational autoencoders to predict proximal femoral strength

Oct 03, 2022

Chen Zhao, Joyce H Keyak, Xuewei Cao, Qiuying Sha, Li Wu, Zhe Luo, Lanjuan Zhao, Qing Tian, Chuan Qiu, Ray Su(+3 more)

Abstract: Background and aim: Hip fracture can be devastating. The proximal femoral strength can be computed by subject-specific finite element (FE) analysis (FEA) using quantitative CT images. The aim of this paper is to design a deep learning-based model for hip fracture prediction with multi-view information fusion. Method: We developed a multi-view variational autoencoder (MMVAE) for feature representation learning and designed a product-of-experts (PoE) model for multi-view information fusion. We performed genome-wide association studies (GWAS) to select the genetic features most relevant to proximal femoral strength and integrated these genetic features with DXA-derived imaging features and clinical variables for proximal femoral strength prediction. Results: The designed model achieved mean absolute percentage errors of 0.2050, 0.0739, and 0.0852 for linear fall, nonlinear fall, and nonlinear stance fracture load prediction, respectively. For linear fall and nonlinear stance fracture load prediction, integrating genetic and DXA-derived imaging features was beneficial, while for nonlinear fall fracture load prediction, the model achieved the best performance when integrating genetic features, DXA-derived imaging features, and clinical variables. Conclusion: The proposed model is capable of predicting proximal femoral strength using genetic features, DXA-derived imaging features, and clinical variables. Compared to performing FEA using QCT images to calculate proximal femoral strength, the presented method is time-efficient and cost-effective, and radiation dosage is limited. From a technical perspective, the final models can be applied to other multi-view information integration tasks.

* 16 pages, 3 figures
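
The results in the abstract above are reported as mean absolute percentage error (MAPE). A minimal sketch of that metric follows, using its standard definition. The function name and the sample values are hypothetical and are not taken from the paper. The sketch returns a fraction (0.1 means 10%); whether the paper's figures follow the same convention is an assumption.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction.

    Computes the average of |true - predicted| / |true| over all samples.
    Raises ValueError for empty or mismatched inputs, or a zero true value.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("true values must be non-zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical fracture loads, for illustration only.
    print(mape([100.0, 200.0], [90.0, 220.0]))
```

Because each error is divided by the true value, the metric is scale-free, which makes values for different fracture load types comparable.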

Failure Identification from Unstable Log Data using Deep Learning

Apr 06, 2022

Jasmin Bogatinovski, Sasho Nedelkoski, Li Wu, Jorge Cardoso, Odej Kao

Abstract: The reliability of cloud platforms is of significant relevance because society increasingly relies on complex software systems running on the cloud. To improve it, cloud providers are automating various maintenance tasks, with failure identification frequently being considered. The precondition for automation is the availability of observability tools, with system logs commonly being used. The focus of this paper is log-based failure identification. This problem is challenging because of the instability of the log data and the incompleteness of the explicit logging failure coverage within the code. To address the two challenges, we present CLog as a method for failure identification. The key idea presented herein is based on our observation that by representing the log data as sequences of subprocesses instead of sequences of log events, the effect of the unstable log data is reduced. CLog introduces a novel subprocess extraction method that uses a context-aware neural network and clustering methods to extract meaningful subprocesses. The direct modeling of log event contexts allows the identification of failures with respect to abrupt context changes, addressing the challenge of insufficient logging failure coverage. Our experimental results demonstrate that the learned subprocess representations reduce the instability in the input, allowing CLog to outperform the baselines on the failure identification subproblems: 1) failure detection by 9-24% on F1 score and 2) failure type identification by 7% on the macro-averaged F1 score. Further analysis shows a negative correlation between the instability in the input event sequences and the detection performance in a model-agnostic manner.

* This paper is accepted for publication at IEEE CCGrid 2022. For fairest citation, please use the original proceedings credentials
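
The abstract above reports failure type identification with the macro-averaged F1 score. A minimal sketch of that metric follows, using its standard definition: the per-class F1 is computed for every label and the results are averaged with equal weight. The function name and the failure type labels are hypothetical, and this is not the evaluation code of CLog.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over all labels seen in y_true or y_pred.

    Per-class F1 is 2*TP / (2*TP + FP + FN); every class counts equally,
    regardless of how many samples it has.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical failure types, for illustration only.
    truth = ["disk", "disk", "network", "timeout"]
    preds = ["disk", "network", "network", "timeout"]
    print(macro_f1(truth, preds))
```

Equal weighting means a rare failure type influences the score as much as a frequent one, which is why the macro average is a common choice when class frequencies are imbalanced.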
