Faculty of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
Abstract:Gradient descent (GD) and stochastic gradient descent (SGD) have been widely used in a large number of application domains. Therefore, understanding the dynamics of GD and improving its convergence speed is still of great importance. This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow. On the basis of the terminal sliding mode theory and the terminal attractor theory, four adaptive learning rates are designed. Their performances are investigated in light of a detailed theoretical investigation, and the running times of the learning procedures are evaluated and compared. The total times of their learning processes are also studied in detail. To evaluate their effectiveness, various simulation results are investigated on a function approximation problem and an image classification problem.
Abstract:Vision-and-Language Navigation (VLN) is a challenging task that requires a robot to navigate in photo-realistic environments with human natural language promptings. Recent studies aim to handle this task by constructing the semantic spatial map representation of the environment, and then leveraging the strong ability of reasoning in large language models for generalizing code for guiding the robot navigation. However, these methods face limitations in instance-level and attribute-level navigation tasks as they cannot distinguish different instances of the same object. To address this challenge, we propose a new method, namely, Instance-aware Visual Language Map (IVLMap), to empower the robot with instance-level and attribute-level semantic mapping, where it is autonomously constructed by fusing the RGBD video data collected from the robot agent with special-designed natural language map indexing in the bird's-in-eye view. Such indexing is instance-level and attribute-level. In particular, when integrated with a large language model, IVLMap demonstrates the capability to i) transform natural language into navigation targets with instance and attribute information, enabling precise localization, and ii) accomplish zero-shot end-to-end navigation tasks based on natural language commands. Extensive navigation experiments are conducted. Simulation results illustrate that our method can achieve an average improvement of 14.4\% in navigation accuracy. Code and demo are released at https://ivlmap.github.io/.
Abstract:Rapid signal fluctuations due to blockage effects cause excessive handovers (HOs) and degrade mobility performance. By reconfiguring line-of-sight (LoS) Links through passive reflections, intelligent reflective surface (IRS) has the potential to address this issue. Due to the lack of introducing blocking effects, existing HO analyses cannot capture excessive HOs or exploit enhancements via IRSs. This paper proposes an LoS state transition model enabling analysis of mobility enhancement achieved by IRS-reconfigured LoS links, where LoS link blocking and reconfiguration utilizing IRS during user movement are explicitly modeled as stochastic processes. Specifically, the condition for blocking LoS links is characterized as a set of possible blockage locations, the distribution of available IRSs is thinned by the criteria for reconfiguring LoS links. In addition, BSs potentially handed over are categorized by probabilities of LoS states to enable HO decision analysis. By projecting distinct gains of LoS states onto a uniform equivalent distance criterion, mobility enhanced by IRS is quantified through the compact expression of HO probability. Results show the probability of dropping into non-LoS decreases by 70% when deploying IRSs with the density of 93/km$^2$, and HOs decrease by 67% under the optimal IRS distributed deployment parameter.
Abstract:Owning to the reflection gain and double path loss featured by intelligent reflecting surface (IRS) channels, handover (HO) locations become irregular and the signal strength fluctuates sharply with variations in IRS connections during HO, the risk of HO failures (HOFs) is exacerbated and thus HO parameters require reconfiguration. However, existing HO models only assume monotonic negative exponential path loss and cannot obtain sound HO parameters. This paper proposes a discrete-time model to explicitly track the HO process with variations in IRS connections, where IRS connections and HO process are discretized as finite states by measurement intervals, and transitions between states are modeled as stochastic processes. Specifically, to capture signal fluctuations during HO, IRS connection state-dependent distributions of the user-IRS distance are modified by the correlation between measurement intervals. In addition, states of the HO process are formed with Time-to-Trigger and HO margin whose transition probabilities are integrated concerning all IRS connection states. Trigger location distributions and probabilities of HO, HOF, and ping-pong (PP) are obtained by tracing user HO states. Results show IRSs mitigate PPs by 48% but exacerbate HOFs by 90% under regular parameters. Optimal parameters are mined ensuring probabilities of HOF and PP are both less than 0.1%.
Abstract:In High-definition (HD) maps, lane elements constitute the majority of components and demand stringent localization requirements to ensure safe vehicle navigation. Vision lane detection with LiDAR position assignment is a prevalent method to acquire initial lanes for HD maps. However, due to incorrect vision detection and coarse camera-LiDAR calibration, initial lanes may deviate from their true positions within an uncertain range. To mitigate the need for manual lane correction, we propose a patch-wise lane correction network (PLCNet) to automatically correct the positions of initial lane points in local LiDAR images that are transformed from point clouds. PLCNet first extracts multi-scale image features and crops patch (ROI) features centered at each initial lane point. By applying ROIAlign, the fix-sized ROI features are flattened into 1D features. Then, a 1D lane attention module is devised to compute instance-level lane features with adaptive weights. Finally, lane correction offsets are inferred by a multi-layer perceptron and used to correct the initial lane positions. Considering practical applications, our automatic method supports merging local corrected lanes into global corrected lanes. Through extensive experiments on a self-built dataset, we demonstrate that PLCNet achieves fast and effective initial lane correction.
Abstract:In the pharmaceutical industry, the use of artificial intelligence (AI) has seen consistent growth over the past decade. This rise is attributed to major advancements in statistical machine learning methodologies, computational capabilities and the increased availability of large datasets. AI techniques are applied throughout different stages of drug development, ranging from drug discovery to post-marketing benefit-risk assessment. Kolluri et al. provided a review of several case studies that span these stages, featuring key applications such as protein structure prediction, success probability estimation, subgroup identification, and AI-assisted clinical trial monitoring. From a regulatory standpoint, there was a notable uptick in submissions incorporating AI components in 2021. The most prevalent therapeutic areas leveraging AI were oncology (27%), psychiatry (15%), gastroenterology (12%), and neurology (11%). The paradigm of personalized or precision medicine has gained significant traction in recent research, partly due to advancements in AI techniques \cite{hamburg2010path}. This shift has had a transformative impact on the pharmaceutical industry. Departing from the traditional "one-size-fits-all" model, personalized medicine incorporates various individual factors, such as environmental conditions, lifestyle choices, and health histories, to formulate customized treatment plans. By utilizing sophisticated machine learning algorithms, clinicians and researchers are better equipped to make informed decisions in areas such as disease prevention, diagnosis, and treatment selection, thereby optimizing health outcomes for each individual.
Abstract:Peak sidelobe level reduction (PSLR) is crucial in the application of large-scale array antenna, which directly determines the radiation performance of array antenna. We study the PSLR of subarray level aperiodic arrays and propose three array structures: dislocated subarrays with uniform elements (DSUE), uniform subarrays with random elements (USRE), dislocated subarrays with random elements (DSRE). To optimize the dislocation position of subarrays and random position of elements, the improved Bat algorithm (IBA) is applied. To draw the comparison of PSLR effect among these three array structures, we take three size of array antennas from small to large as examples to simulate and calculate the redundancy and peak sidelobe level (PSLL) of them. The results show that DSRE is the optimal array structure by analyzing the dislocation distance of subarray, scanning angle and applicable frequency. The proposed design method is a universal and scalable method, which is of great application value to the design of large-scale aperiodic array antenna.
Abstract:Automated image processing algorithms can improve the quality, efficiency, and consistency of classifying the morphology of heterogeneous carbonate rock and can deal with a massive amount of data and images seamlessly. Geoscientists face difficulties in setting the direction of the optimum method for determining petrophysical properties from rock images, Micro-Computed Tomography (uCT), or Magnetic Resonance Imaging (MRI). Most of the successful work is from the homogeneous rocks focusing on 2D images with less focus on 3D and requiring numerical simulation. Currently, image analysis methods converge to three approaches: image processing, artificial intelligence, and combined image processing with artificial intelligence. In this work, we propose two methods to determine the porosity from 3D uCT and MRI images: an image processing method with Image Resolution Optimized Gaussian Algorithm (IROGA); advanced image recognition method enabled by Machine Learning Difference of Gaussian Random Forest (MLDGRF). We have built reference 3D micro models and collected images for calibration of IROGA and MLDGRF methods. To evaluate the predictive capability of these calibrated approaches, we ran them on 3D uCT and MRI images of natural heterogeneous carbonate rock. We measured the porosity and lithology of the carbonate rock using three and two industry-standard ways, respectively, as reference values. Notably, IROGA and MLDGRF have produced porosity results with an accuracy of 96.2% and 97.1% on the training set and 91.7% and 94.4% on blind test validation, respectively, in comparison with the three experimental measurements. We measured limestone and pyrite reference values using two methods, X-ray powder diffraction, and grain density measurements. MLDGRF has produced lithology (limestone and Pyrite) volumes with 97.7% accuracy.