Abstract:Log parsing, a vital task for interpreting the vast and complex data produced within software architectures, faces significant challenges in the transition from academic benchmarks to the industrial domain. Existing log parsers, while highly effective on standardized public datasets, struggle to maintain performance and efficiency when confronted with the sheer scale and diversity of real-world industrial logs. The challenges are two-fold: 1) Massive log templates: the performance and efficiency of most existing parsers degrade significantly as logs grow in quantity and vary in length; 2) Complex and changeable semantics: traditional template-matching algorithms cannot accurately match the templates of complicated industrial logs because they cannot exploit cross-language logs with similar semantics. To address these issues, we propose ECLIPSE, Enhanced Cross-Lingual Industrial log Parsing with Semantic Entropy-LCS, which exploits cross-language logs to parse industrial logs robustly. On the one hand, it integrates two efficient data-driven template-matching algorithms with Faiss indexing. On the other hand, driven by the semantic understanding ability of Large Language Models (LLMs), it accurately extracts the semantics of log keywords and effectively reduces the retrieval space. We also introduce ECLIPSE-Bench, a Chinese and English cross-platform industrial log parsing benchmark, to evaluate the performance of mainstream parsers in industrial scenarios. Experimental results on public benchmarks and the proprietary ECLIPSE-Bench dataset underscore the superior performance and robustness of ECLIPSE. Notably, ECLIPSE delivers state-of-the-art performance compared with strong baselines on diverse datasets and preserves a significant edge in processing efficiency.
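As a rough illustration of the LCS-based template matching named in the abstract above, the following minimal sketch derives a template from two log lines using a plain token-level longest common subsequence. It is not the ECLIPSE algorithm: the semantic entropy component, the LLM keyword extraction, and Faiss indexing are all omitted, and the names `lcs_tokens`, `merge_template`, `lcs_similarity`, and the `<*>` wildcard are assumptions made for this example only.

```python
from typing import List

WILDCARD = "<*>"  # hypothetical placeholder for variable fields


def lcs_tokens(a: List[str], b: List[str]) -> List[str]:
    """Return one longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the subsequence itself.
    out: List[str] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def merge_template(a: List[str], b: List[str]) -> List[str]:
    """Template relative to `a`: keep LCS tokens, collapse the rest to one wildcard."""
    common = lcs_tokens(a, b)
    template: List[str] = []
    k = 0  # index of the next common token still to be matched
    for tok in a:
        if k < len(common) and tok == common[k]:
            template.append(tok)
            k += 1
        elif not template or template[-1] != WILDCARD:
            template.append(WILDCARD)
    return template


def lcs_similarity(a: List[str], b: List[str]) -> float:
    """LCS length divided by the longer token count (0.0 for two empty lists)."""
    longest = max(len(a), len(b))
    return len(lcs_tokens(a, b)) / longest if longest else 0.0


if __name__ == "__main__":
    log_a = "Connection from 10.0.0.1 closed".split()
    log_b = "Connection from 192.168.1.5 closed".split()
    print(" ".join(merge_template(log_a, log_b)))
    print(lcs_similarity(log_a, log_b))
```

In a parser built along these lines, a new log line would typically be compared against candidate templates and merged with the best match when the similarity exceeds a chosen threshold; that threshold is a tuning choice not specified here.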
Abstract:The extremely large-scale antenna array (ELAA) is a promising key ingredient of sixth-generation (6G) wireless communications. The electromagnetic propagation of spherical wavefronts introduces an additional distance-dependent dimension beyond the conventional beamspace. In this paper, we first present a concise closed-form channel formulation for extremely large-scale multiple-input multiple-output (XL-MIMO). All line-of-sight (LoS) and non-line-of-sight (NLoS) paths, far-field and near-field scenarios, and XL-MIMO and XL-MISO channels are unified under this framework, where an additional Vandermonde windowing matrix is considered exclusively for the LoS path. Under this framework, we further propose a low-complexity unified LoS/NLoS orthogonal matching pursuit (XL-UOMP) algorithm for XL-MIMO channel estimation. Simulation results demonstrate the superiority of the proposed algorithm in both estimation accuracy and pilot consumption.
Abstract:Despite rapid advances in unsupervised log anomaly detection, current mainstream models still require dedicated training on each individual system's dataset, which is costly and scales poorly with dataset size, leading to performance bottlenecks. Furthermore, many models lack cognitive reasoning capabilities, which makes it difficult to transfer them directly to similar systems for effective anomaly detection. Additionally, like reconstruction networks, these models often suffer from the "identical shortcut" problem: because the majority of system logs are normal, they erroneously predict the normal class for rare anomalous logs owing to reconstruction errors. To address these issues, we propose MLAD, a novel anomaly detection model that incorporates semantic relational reasoning across multiple systems. Specifically, we employ Sentence-BERT to capture the similarities between log sequences and convert them into high-dimensional learnable semantic vectors. Subsequently, we revise the formulas of the attention layer to discern the significance of each keyword in the sequence and model the overall distribution of the multi-system dataset through appropriate vector space diffusion. Lastly, we employ a Gaussian mixture model to highlight the uncertainty of rare words related to the "identical shortcut" problem, optimizing the vector space of the samples with the expectation-maximization algorithm. Experiments on three real-world datasets demonstrate the superiority of MLAD.
Abstract:Recently, significant progress has been made in text-based motion generation, enabling the generation of diverse and high-quality human motions that conform to textual descriptions. However, generating fine-grained or stylized motions remains challenging because datasets annotated with detailed textual descriptions are lacking. Adopting a divide-and-conquer strategy, we propose a new framework named Fine-Grained Human Motion Diffusion Model (FG-MDM) for human motion generation. Specifically, we first parse the previously vague textual annotations into fine-grained descriptions of different body parts by leveraging a large language model (GPT-3.5). We then use these fine-grained descriptions to guide a transformer-based diffusion model. FG-MDM can generate fine-grained and stylized motions even outside the distribution of the training data. Our experimental results demonstrate the superiority of FG-MDM over previous methods, especially in its strong generalization capability. We will release our fine-grained textual annotations for HumanML3D and KIT.
Abstract:In orthogonal time frequency space (OTFS) systems, the impact of frequency-dependent Doppler, referred to as the Doppler squint effect (DSE), accumulates over longer durations, and neglecting it has prevented OTFS systems from realizing their performance superiority. In this paper, we consider practical OTFS systems with DSE that are built on OFDM with a cyclic-prefix time guard interval (CP-OFDM). The cyclic prefix (CP) length is analyzed and the input-output relation considering DSE is derived. By deploying two prefix OFDM symbols, channel estimation can be easily divided into three parts: delay detection, Doppler extraction, and gain estimation. A linear equalization scheme that takes the block-diagonal property of the channel matrix into account is adopted, which completes the low-complexity receiver design. Simulation results confirm the significance of DSE and the considerable performance of the proposed low-complexity receiver scheme considering DSE.
Abstract:With the rapid development of IT operations, it has become increasingly crucial to efficiently manage and analyze large volumes of data for practical applications. Natural Language Processing (NLP) techniques have shown remarkable capabilities on various tasks, including named entity recognition, machine translation, and dialogue systems. Recently, Large Language Models (LLMs) have achieved significant improvements across various downstream NLP tasks. However, specialized LLMs for IT operations are lacking. In this paper, we introduce OWL, a large language model trained on our collected OWL-Instruct dataset, which covers a wide range of IT-related information, and we propose a mixture-of-adapter strategy to improve parameter-efficient tuning across different domains or tasks. Furthermore, we evaluate OWL on OWL-Bench, a benchmark we established, and on open IT-related benchmarks. OWL demonstrates superior performance on IT tasks, outperforming existing models by significant margins. We hope that our findings will provide further insights for revolutionizing IT operations techniques with specialized LLMs.
Abstract:Extensive work has demonstrated the excellent performance of orthogonal time frequency space (OTFS) modulation in high-mobility scenarios. Time-variant wideband channel estimation is one of the key components of OTFS receivers, since data detection requires accurate channel state information (CSI). In practical wideband OTFS systems, the Doppler shift caused by high mobility is frequency-dependent, which is referred to as the Doppler squint effect (DSE). Unfortunately, DSE has been ignored in all prior estimation schemes employed in OTFS systems, which leads to severe performance loss in channel estimation and in the subsequent data detection. In this paper, we investigate the DSE of wideband time-variant channels in the delay-Doppler domain and concentrate on the characterization of OTFS channel coefficients considering DSE. The formulation and evaluation of the OTFS input-output relationship are provided for both ideal and rectangular waveforms considering DSE. Channel estimation is then formulated as a sparse signal recovery problem, and an orthogonal matching pursuit (OMP)-based scheme is adopted to solve it. Simulation results confirm the significance of DSE and the performance superiority of the proposed scheme over traditional channel estimation approaches that ignore DSE.
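To make the frequency dependence of the Doppler shift concrete, the sketch below evaluates the textbook relation that a path with radial velocity v shifts a component at absolute frequency f_c + f by v(f_c + f)/c, so the shift varies by vB/c across a band of width B. This is a simplified single-path illustration, not the channel model derived in the paper; the function names and all numeric values are assumptions chosen for the example.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def doppler_shift(velocity_mps: float, carrier_hz: float, offset_hz: float = 0.0) -> float:
    """Doppler shift in Hz seen at the absolute frequency carrier_hz + offset_hz."""
    return velocity_mps * (carrier_hz + offset_hz) / SPEED_OF_LIGHT


def doppler_squint(velocity_mps: float, carrier_hz: float, bandwidth_hz: float) -> float:
    """Difference between the Doppler shifts at the upper and lower band edges."""
    low = doppler_shift(velocity_mps, carrier_hz, -bandwidth_hz / 2)
    high = doppler_shift(velocity_mps, carrier_hz, bandwidth_hz / 2)
    return high - low


if __name__ == "__main__":
    # Hypothetical scenario: 150 m/s radial velocity, 28 GHz carrier, 1 GHz bandwidth.
    v, fc, bw = 150.0, 28e9, 1e9
    print("Doppler at band centre (Hz):", doppler_shift(v, fc))
    print("Doppler spread across band (Hz):", doppler_squint(v, fc, bw))
```

A narrowband model would use the band-centre value for every subcarrier; the spread returned by `doppler_squint` is the part that such a model leaves out.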
Abstract:True-time-delay (TTD) lines have recently been applied in Terahertz (THz) hybrid-precoding transceivers to achieve high beamforming gain despite the beam squint effect. However, beam tracking becomes challenging because the enormous number of potential beam directions incurs unacceptable overhead. Frequency-scanning-based beam tracking has been preliminarily explored in previous studies but remains imperfect. In this paper, based on the TTD-aided hybrid precoding structure, we propose an enhanced frequency-scanning-based tracking scheme. Multiple beams are generated and used simultaneously over several subcarriers for tracking within one timeslot. The angular coverage of the squint beams at all subcarriers can be flexibly controlled by two different subcarrier-angular mapping policies, named forward-pairing and backward-pairing. Multiple physical directions can then be searched simultaneously in one timeslot, which lowers overhead. In addition, the closed-form searching radius bound, the parameter configuration, and the interference are theoretically analyzed. Furthermore, we provide a coupled codebook design for TTDs and phase shifters (PSs) that jointly considers beamforming and tracking. Analytical and numerical results demonstrate the superiority of the new frequency-scanning-based tracking scheme and beamforming codebook.
Abstract:Extremely large-scale multiple-input multiple-output (XL-MIMO) promises to provide ultrahigh data rates in the millimeter-wave (mmWave) and Terahertz (THz) spectrum. However, the spherical-wavefront transmission caused by the large array aperture presents huge challenges for channel state information (CSI) acquisition and beamforming. Two independent parameters (physical angle and transmission distance) must be considered simultaneously in XL-MIMO beamforming, which leads to severe overhead and beamforming degradation. To address this problem, we exploit the near-field channel characteristics and propose two low-overhead hierarchical beam training schemes for near-field XL-MIMO systems. Firstly, we project the near-field channel into the spatial-angular domain and the slope-intercept domain to capture detailed representations, and we then identify three critical criteria for XL-MIMO hierarchical beam training. Secondly, a novel spatial-chirp beam-aided codebook and a corresponding hierarchical update policy are proposed. Thirdly, given the imperfect coverage and overlapping of spatial-chirp beams, we further design an enhanced hierarchical training codebook via manifold optimization and alternating minimization. Theoretical analyses and numerical simulations are presented to verify the superior performance in terms of beamforming and training overhead.
Abstract:The reconfigurable intelligent surface (RIS) has been recognized as a potential technology for beyond-5G systems and has attracted tremendous research attention. However, channel estimation in RIS-aided systems remains a critical challenge because of the large number of parameters in the cascaded channel. Existing compressive sensing (CS)-based RIS estimation schemes exploit the sparsity only incompletely, which leads to redundant pilot consumption. In this paper, we exploit the specific triple-structured sparsity of the cascaded channel, i.e., the common column sparsity, the structured row sparsity after offset compensation, and the common offsets among all users. A novel multi-user joint estimation algorithm is then proposed. Simulation results show that our approach can significantly reduce pilot overhead in both uniform linear array (ULA) and uniform planar array (UPA) scenarios.