Zexi Chen

Attention Mechanism and Context Modeling System for Text Mining Machine Translation

Aug 08, 2024

Shi Bo, Yuwei Zhang, Junming Huang, Sitong Liu, Zexi Chen, Zizheng Li

Abstract: This paper proposes a novel architecture based on the Transformer that incorporates the K-means clustering algorithm to improve the model's contextual understanding. The Transformer performs well in machine translation thanks to its parallel computation and multi-head attention mechanism. However, it can suffer from contextual ambiguity or overlook local features when handling highly complex language structures. To address this limitation, the paper uses K-means to cluster the words and idioms of the input text, which helps the model identify and preserve the local structure and contextual information of the language. The advantage of this combination is that K-means can automatically discover topic or concept regions in the text, which may be directly related to translation quality. The proposed architecture therefore applies K-means as a preprocessing step before the Transformer and recalibrates the multi-head attention weights to help distinguish words and idioms with similar semantics or functions. This ensures that, during training, the model pays greater attention to the contextual information captured by these clusters rather than relying only on positional information.
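
The abstract does not say exactly how the K-means clusters enter the attention computation. One plausible reading, sketched below purely as an illustration, clusters token embeddings with standard K-means (Lloyd's algorithm) and then adds a constant bonus to the attention logits of token pairs that share a cluster before the softmax. The function names `kmeans` and `cluster_biased_attention`, the `bias` parameter, and the toy embeddings are hypothetical and are not the authors' implementation.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on equal-length vectors; returns one cluster id per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_biased_attention(queries, keys, labels, bias=1.0):
    """Scaled dot-product attention weights plus a hypothetical additive bonus for same-cluster pairs."""
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, kv)) / math.sqrt(d) + (bias if labels[i] == labels[j] else 0.0)
            for j, kv in enumerate(keys)
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With `bias=0.0` the second function reduces to ordinary scaled dot-product attention weights, so comparing the two settings isolates the effect of the clustering.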

Research on Feature Extraction Data Processing System For MRI of Brain Diseases Based on Computer Deep Learning

Jun 23, 2024

Lingxi Xiao, Jinxin Hu, Yutian Yang, Yinqiu Feng, Zichao Li, Zexi Chen

Abstract: Most existing wavelet-based image processing techniques rely on single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data raises problems such as mixed noise and excessive computation time. This project proposes replacing traditional iterative algorithms with matrix operations that combine mixed-noise elimination with wavelet analysis. Functional magnetic resonance imaging (fMRI) data from the auditory cortex of a single subject are analyzed, and the results are compared with iterative wavelet-domain signal processing and with the widely used SPM8 software. Experiments show that the proposed algorithm has the shortest computation time while achieving detection performance comparable to the traditional iterative algorithm, which gives it greater practical value for fMRI data processing. In addition, the proposed wavelet analysis approach speeds up signal processing.
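
As an illustration of replacing iterations with matrix operations, the sketch below denoises a 1D signal in a single pass: a one-level orthonormal Haar transform written as an explicit matrix, soft thresholding of the detail coefficients, and the inverse transform via the matrix transpose. The Haar wavelet, soft thresholding, and the threshold `t` are assumptions chosen for brevity; the abstract does not specify the wavelet, the noise model, or how mixed noise is handled.

```python
import math


def haar_matrix(n):
    """Orthonormal one-level Haar analysis matrix for even n (top half: averages, bottom half: differences)."""
    assert n % 2 == 0
    s = 1 / math.sqrt(2)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n // 2):
        H[i][2 * i] = H[i][2 * i + 1] = s  # approximation (low-pass) row
        H[n // 2 + i][2 * i] = s           # detail (high-pass) row
        H[n // 2 + i][2 * i + 1] = -s
    return H


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def soft_threshold(x, t):
    """Shrink x toward zero by t, clamping at zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def haar_denoise(signal, t):
    """One pass, no iteration: forward transform, shrink details, inverse transform (H^T = H^-1)."""
    n = len(signal)
    H = haar_matrix(n)
    coeffs = matvec(H, signal)
    half = n // 2
    coeffs = coeffs[:half] + [soft_threshold(c, t) for c in coeffs[half:]]
    return matvec(transpose(H), coeffs)
```

Because the Haar matrix is orthonormal, its transpose is its inverse, so with `t = 0` the input is reconstructed exactly.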

Research on Disease Prediction Model Construction Based on Computer AI deep Learning Technology

Jun 23, 2024

Yang Lin, Muqing Li, Ziyi Zhu, Yinqiu Feng, Lingxi Xiao, Zexi Chen

Abstract: Predicting disease risk factors makes it possible to screen vulnerable groups for effective prevention and treatment, thereby reducing their morbidity and mortality. Machine learning depends heavily on high-quality label information, and label noise in medical big data poses a major challenge to efficient disease risk warning methods. This project therefore studies robust learning algorithms and applies them to early warning of infectious disease risk. A dynamic truncated loss model is proposed that combines the implicit weighting feature of the traditional mutual entropy loss with a mean variation feature, making it robust to label noise. A lower bound on the training loss is constructed, and a sampling-rate-based method is proposed to reduce the gradients of suspected noisy samples, limiting the influence of noise on the training results. The effectiveness of the method under different types of noise is verified on a stroke screening dataset. The method enables robust learning from data containing label noise.
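
The abstract describes reducing the gradients of suspected noisy samples but gives no formula. A minimal sketch of one common way to do this, assuming binary labels and a hypothetical linear drop-rate schedule, computes per-sample cross-entropy, marks the highest-loss fraction of each batch as suspected noise, and gives those samples zero weight so they contribute no gradient. The schedule, `max_rate`, and `warmup` are illustrative assumptions, not the paper's sampling-rate method or its loss lower bound.

```python
import math


def cross_entropy(prob_pos, label):
    """Binary cross-entropy for one sample; prob_pos is the predicted probability of class 1."""
    eps = 1e-12
    p = min(max(prob_pos, eps), 1 - eps)
    return -math.log(p) if label == 1 else -math.log(1 - p)


def drop_rate(epoch, max_rate=0.2, warmup=10):
    """Hypothetical schedule: the suspected-noisy fraction grows linearly, then plateaus at max_rate."""
    return max_rate * min(epoch / warmup, 1.0)


def truncated_batch_loss(probs, labels, epoch):
    """Mean loss after zero-weighting the highest-loss (suspected noisy) samples of the batch."""
    losses = [cross_entropy(p, y) for p, y in zip(probs, labels)]
    n_drop = int(len(losses) * drop_rate(epoch))
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    suspected = set(order[:n_drop])
    weights = [0.0 if i in suspected else 1.0 for i in range(len(losses))]
    kept = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / kept, sorted(suspected)
```

Starting with a drop rate of zero lets the model fit the easy, mostly clean samples first, before high loss becomes a usable signal of label noise.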

Exploration of Attention Mechanism-Enhanced Deep Learning Models in the Mining of Medical Textual Data

May 23, 2024

Lingxi Xiao, Muqing Li, Yinqiu Feng, Meiqi Wang, Ziyi Zhu, Zexi Chen

Abstract: This research explores the use of attention-based deep learning models in medical text mining, targeting the challenge of analyzing unstructured text in medical data. It seeks to improve the model's ability to identify essential medical information by combining deep learning with attention mechanisms. The paper reviews the basic principles and typical architectures of attention mechanisms and shows their effectiveness in disease prediction, drug side-effect monitoring, and entity relationship extraction. To address the particular characteristics of medical texts, an adaptive attention model integrating domain knowledge is proposed, with an optimized ability to understand medical terms and process complex contexts. Experiments verify that the model improves task accuracy and robustness, especially on long texts. The research outlook discusses future directions, including improving model interpretability, enabling cross-domain knowledge transfer, and adapting to low-resource scenarios, providing a new perspective, methodological support, and a theoretical and technical reference for intelligent medical information processing and clinical decision support systems.

* arXiv admin note: text overlap with arXiv:2405.11704 by other authors

Leveraging Large Language Models in Conversational Recommender Systems

May 16, 2023

Luke Friedman, Sameer Ahuja, David Allen, Zhenning Tan, Hakim Sidahmed, Changbo Long, Jun Xie, Gabriel Schubiner, Ajay Patel, Harsh Lara (+3 more)

Abstract: A Conversational Recommender System (CRS) offers increased transparency and control to users by enabling them to engage with the system through a real-time multi-turn dialogue. Recently, Large Language Models (LLMs) have exhibited an unprecedented ability to converse naturally and incorporate world knowledge and common-sense reasoning into language understanding, unlocking the potential of this paradigm. However, effectively leveraging LLMs within a CRS introduces new technical challenges, including properly understanding and controlling a complex conversation and retrieving from external sources of information. These issues are exacerbated by a large, evolving item corpus and a lack of conversational data for training. In this paper, we provide a roadmap for building an end-to-end large-scale CRS using LLMs. In particular, we propose new implementations for user preference understanding, flexible dialogue management and explainable recommendations as part of an integrated architecture powered by LLMs. For improved personalization, we describe how an LLM can consume interpretable natural language user profiles and use them to modulate session-level context. To overcome conversational data limitations in the absence of an existing production CRS, we propose techniques for building a controllable LLM-based user simulator to generate synthetic conversations. As a proof of concept, we introduce RecLLM, a large-scale CRS for YouTube videos built on LaMDA, and demonstrate its fluency and diverse functionality through some illustrative example conversations.

DPCN++: Differentiable Phase Correlation Network for Versatile Pose Registration

Jun 12, 2022

Zexi Chen, Yiyi Liao, Haozhe Du, Haodong Zhang, Xuecheng Xu, Haojian Lu, Rong Xiong, Yue Wang

Abstract: Pose registration is critical in vision and robotics. This paper focuses on the challenging task of initialization-free pose registration up to 7DoF for homogeneous and heterogeneous measurements. While recent learning-based methods show promise using differentiable solvers, they either rely on heuristically defined correspondences or are prone to local minima. We present a differentiable phase correlation (DPC) solver that is globally convergent and correspondence-free. When combined with simple feature extraction networks, our general framework DPCN++ allows for versatile pose registration with arbitrary initialization. Specifically, the feature extraction networks first learn dense feature grids from a pair of homogeneous/heterogeneous measurements. These feature grids are then transformed into a translation- and scale-invariant spectrum representation based on the Fourier transform and spherical radial aggregation, decoupling translation and scale from rotation. Next, the rotation, scale, and translation are estimated independently and efficiently, step by step, in the spectrum using the DPC solver. The entire pipeline is differentiable and trained end-to-end. We evaluate DPCN++ on a wide range of registration tasks with different input modalities, including 2D bird's-eye view images, 3D object and scene measurements, and medical images. Experimental results demonstrate that DPCN++ outperforms both classical and learning-based baselines, especially on partially observed and heterogeneous measurements.
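
For readers unfamiliar with phase correlation, the classical (non-learned) idea can be shown in one dimension: the normalized cross-power spectrum of two signals related by a circular shift is a pure phase ramp, and its inverse DFT peaks at the shift. The sketch below uses a naive O(n²) DFT and a hard argmax to stay dependency-free. DPCN++ itself works on learned 2D/3D feature grids, also recovers rotation and scale, and uses a differentiable solver, none of which is reproduced here; `phase_correlation_shift` is a hypothetical helper name.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def phase_correlation_shift(a, b):
    """Estimate the circular shift s such that b[t] == a[(t - s) % n]."""
    A, B = dft(a), dft(b)
    R = []
    for fa, fb in zip(A, B):
        cross = fb * fa.conjugate()
        # Normalizing discards magnitudes and keeps only the phase difference;
        # bins where either spectrum is (numerically) zero are dropped.
        R.append(cross / abs(cross) if abs(cross) > 1e-12 else 0)
    corr = idft(R)  # ideally a delta function located at the shift
    return max(range(len(corr)), key=lambda t: corr[t].real)
```

Because the shift is read off a correlation surface rather than found by local descent, the estimate does not depend on an initial guess, which is the property the abstract refers to as initialization-free.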

Least Square Estimation Network for Depth Completion

Mar 07, 2022

Xianze Fang, Zexi Chen, Yunkai Wang, Yue Wang, Rong Xiong

Abstract: Depth completion is a fundamental task in computer vision and robotics research. Many previous works complete the dense depth map directly with neural networks, but most of them are non-interpretable and do not generalize well to different situations. In this paper, we propose an effective image representation method for depth completion tasks. The input of our system is a monocular camera frame and the synchronized sparse depth map. The output is a dense per-pixel depth map of the frame. First, we use a neural network to transform each pixel into a feature vector, which we call base functions. Then we pick out the base functions of the known pixels together with their depth values and use a linear least-squares algorithm to fit the base functions to the depth values, which yields a set of estimated weights. Finally, we apply the weights to the whole image to predict the final depth map. Because our method is interpretable, it generalizes well. Experiments show that our results outperform the state of the art on the NYU-Depth-V2 dataset in both accuracy and runtime. Moreover, experiments show that our method generalizes well across different numbers of sparse points and different datasets.
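
The least-squares step described above can be sketched as follows. Each pixel is represented by a vector of base-function values; the weights are fitted on the pixels with known depth by solving the normal equations, then applied to every pixel. In the paper the base functions come from a neural network; here they are a hand-made `[1, x]` feature purely for illustration, and the small `ridge` term is an added assumption for numerical stability. The helper names are hypothetical.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system A x = b."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_weights(basis_known, depth_known, ridge=1e-6):
    """Weights w minimizing ||Phi w - d||^2 (+ a small ridge term) via the normal equations."""
    m = len(basis_known[0])
    AtA = [[sum(row[i] * row[j] for row in basis_known) + (ridge if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    Atd = [sum(row[i] * d for row, d in zip(basis_known, depth_known)) for i in range(m)]
    return solve(AtA, Atd)


def complete_depth(basis_all, weights):
    """Apply the fitted weights to every pixel's base-function vector."""
    return [sum(phi * w for phi, w in zip(row, weights)) for row in basis_all]
```

The fit uses only the sparse known pixels, while the prediction covers all pixels, so the same small weight vector densifies the whole map.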

Fully Differentiable and Interpretable Model for VIO with 4 Trainable Parameters

Sep 25, 2021

Zexi Chen, Haozhe Du, Yiyi Liao, Yue Wang, Rong Xiong

Abstract: Monocular visual-inertial odometry (VIO) is a critical problem in robotics and autonomous driving. Traditional methods solve this problem based on filtering or optimization. While fully interpretable, they rely on manual intervention and empirical parameter tuning. On the other hand, learning-based approaches allow end-to-end training but require large amounts of training data to learn millions of parameters, and these non-interpretable, heavy models limit generalization ability. In this paper, we propose a fully differentiable, interpretable, and lightweight monocular VIO model that contains only 4 trainable parameters. Specifically, we first adopt an Unscented Kalman Filter (UKF) as a differentiable layer to predict pitch and roll, where the noise covariance matrices are learned to filter out the noise in the raw IMU data. Second, the refined pitch and roll are used to retrieve a gravity-aligned BEV image of each frame via differentiable camera projection. Finally, a differentiable pose estimator is used to estimate the remaining 4-DoF poses between the BEV frames. Our method learns the covariance matrices end-to-end, supervised by the pose estimation loss, and shows superior performance to empirical baselines. Experimental results on synthetic and real-world datasets demonstrate that our simple approach is competitive with state-of-the-art methods and generalizes well to unseen scenes.
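
The UKF named above propagates uncertainty with the unscented transform: it pushes 2n+1 sigma points through a possibly nonlinear function and recovers the output mean and covariance from weighted sums. The sketch below shows only that transform, not the full filter or the authors' differentiable layer; the scaling parameters `alpha`, `beta`, and `kappa` take illustrative default values, and the function names are hypothetical.

```python
import math


def cholesky(P):
    """Lower-triangular L with L L^T = P for a small symmetric positive-definite matrix."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(P[i][i] - s)
            else:
                L[i][j] = (P[i][j] - s) / L[j][j]
    return L


def unscented_transform(mean, cov, f, alpha=1.0, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through f using 2n+1 scaled sigma points."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    # Columns of L, where L L^T = (n + lam) * cov, give the sigma-point offsets.
    L = cholesky([[(n + lam) * c for c in row] for row in cov])
    sigmas = [list(mean)]
    for j in range(n):
        col = [L[i][j] for i in range(n)]
        sigmas.append([m + c for m, c in zip(mean, col)])
        sigmas.append([m - c for m, c in zip(mean, col)])
    wm = [lam / (n + lam)] + [1 / (2 * (n + lam))] * (2 * n)  # mean weights
    wc = [wm[0] + (1 - alpha ** 2 + beta)] + wm[1:]           # covariance weights
    ys = [f(s) for s in sigmas]
    d = len(ys[0])
    y_mean = [sum(w * y[k] for w, y in zip(wm, ys)) for k in range(d)]
    y_cov = [[sum(w * (y[a] - y_mean[a]) * (y[b] - y_mean[b]) for w, y in zip(wc, ys))
              for b in range(d)] for a in range(d)]
    return y_mean, y_cov
```

For a linear function f(x) = A x + b the transform is exact, which gives a simple sanity check: the output covariance equals A P A^T.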

Domain Generalization for Vision-based Driving Trajectory Generation

Sep 22, 2021

Yunkai Wang, Dongkun Zhang, Yuxiang Cui, Zexi Chen, Wei Jing, Junbo Chen, Rong Xiong, Yue Wang

Abstract: One of the challenges in vision-based driving trajectory generation is dealing with out-of-distribution scenarios. In this paper, we propose a domain generalization method for vision-based driving trajectory generation for autonomous vehicles in urban environments, which can be seen as a way to extend the Invariant Risk Minimization (IRM) method to complex problems. We leverage an adversarial learning approach to train a trajectory generator as the decoder. Based on the pre-trained decoder, we infer the latent variables corresponding to the trajectories and pre-train the encoder by regressing the inferred latent variables. Finally, we fix the decoder and fine-tune the encoder with the final trajectory loss. We compare our method with the state-of-the-art trajectory generation method and several recent domain generalization methods, both on datasets and in simulation, demonstrating that our method has better generalization ability.
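
The abstract positions the method as an extension of Invariant Risk Minimization. For context, the widely used IRMv1 penalty from the original IRM formulation measures, per training environment, the squared gradient of the risk with respect to a dummy classifier scale fixed at 1.0. A minimal sketch for squared-error loss, where that gradient has a closed form, is below. It illustrates IRM itself, not the adversarial decoder and encoder training proposed in this paper, and `lam` is an illustrative penalty weight.

```python
def env_risk(preds, targets):
    """Mean squared error within one environment."""
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(preds)


def irmv1_penalty(preds, targets):
    """Squared gradient of the environment risk w.r.t. a dummy scale w, evaluated at w = 1."""
    # d/dw mean((w*p - y)^2) = mean(2 * (w*p - y) * p); at w = 1 this is mean(2 * (p - y) * p).
    grad = sum(2 * (p - y) * p for p, y in zip(preds, targets)) / len(preds)
    return grad ** 2


def irm_objective(envs, lam=1.0):
    """Sum over environments of risk plus lam times the IRMv1 invariance penalty."""
    return sum(env_risk(p, y) + lam * irmv1_penalty(p, y) for p, y in envs)
```

The penalty is zero only when scaling the predictor would not reduce the risk in any environment, which is how IRM encourages predictors that are simultaneously optimal across environments.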

Human-Robot Motion Retargeting via Neural Latent Optimization

Mar 16, 2021

Haodong Zhang, Weijie Li, Yuwei Liang, Zexi Chen, Yuxiang Cui, Yue Wang, Rong Xiong

Abstract: Motion retargeting from human to robot remains a very challenging task due to differences in the structures of humans and robots. Most traditional optimization-based algorithms solve this problem by minimizing an objective function, which is usually time-consuming and heavily dependent on good initialization. In contrast, methods based on feedforward neural networks can learn prior knowledge from training data and infer results quickly, but they generalize poorly to unseen actions, leading to some infeasible results. In this paper, we propose a novel neural optimization approach that takes advantage of both kinds of methods. A graph-based neural network is used to establish a mapping between the latent space and the robot motion space. The retargeting results can then be obtained by searching for the optimal vector in this latent space. In addition, a deep encoder provides a promising initialization for better and faster convergence. We perform experiments on retargeting Chinese sign language to three different kinds of robots in a simulation environment: ABB's YuMi dual-arm collaborative robot, NAO, and Pepper. A real-world experiment is also conducted on the YuMi robot. Experimental results show that our method can retarget motion from human to robot both efficiently and accurately.

* Submitted to IROS2021
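
The latent-space search described above can be illustrated with a toy stand-in. A fixed decoder maps a latent vector to robot joint angles, and gradient descent updates only the latent vector to minimize a retargeting objective, starting from an initial guess that would come from the encoder. The linear decoder, the squared-error objective against target joint angles, the learning rate, and all function names are assumptions for illustration; the paper uses a graph-based neural network decoder, and its objective is not specified in the abstract.

```python
def decode(z, W, b):
    """Stand-in decoder: a fixed linear map from latent vector z to robot joint angles."""
    return [sum(wij * zj for wij, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def objective(z, W, b, target):
    """Hypothetical retargeting objective: squared error between decoded and target joint angles."""
    q = decode(z, W, b)
    return sum((qi - ti) ** 2 for qi, ti in zip(q, target))


def optimize_latent(z0, W, b, target, lr=0.05, steps=500):
    """Gradient descent in latent space; the decoder stays fixed and only z is updated."""
    z = list(z0)
    for _ in range(steps):
        residual = [qi - ti for qi, ti in zip(decode(z, W, b), target)]
        # Gradient of sum(residual^2) w.r.t. z is 2 * W^T residual.
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W))) for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z
```

In this framing the encoder only supplies `z0`; a better starting point shortens the search, which matches the abstract's claim that the deep encoder speeds up convergence.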
