Abstract:Evaluating log summarization systems is challenging due to the lack of high-quality reference summaries and the limitations of existing metrics like ROUGE and BLEU, which depend on surface-level lexical overlap. We introduce REFLEX, a reference-free evaluation metric for log summarization based on large language model (LLM) judgment. REFLEX uses LLMs as zero-shot evaluators to assess summary quality along dimensions such as relevance, informativeness, and coherence, without requiring gold-standard references or human annotations. We show that REFLEX produces stable, interpretable, and fine-grained evaluations across multiple log summarization dataset, and more effectively distinguishes model outputs than traditional metrics. REFLEX provides a scalable alternative for evaluating log summaries in real-world settings where reference data is scarce or unavailable.
Abstract:In recent years, learned image compression methods have demonstrated superior rate-distortion performance compared to traditional image compression methods. Recent methods utilize convolutional neural networks (CNN), variational autoencoders (VAE), invertible neural networks (INN), and transformers. Despite their significant contributions, a main drawback of these models is their poor performance in capturing local redundancy. Therefore, to leverage global features along with local redundancy, we propose a CNN-based solution integrated with a feature encoding module. The feature encoding module encodes important features before feeding them to the CNN and then utilizes cross-scale window-based attention, which further captures local redundancy. Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field. Both the feature encoding module and the cross-scale window-based attention module in our architecture are flexible and can be incorporated into any other network architecture. We evaluate our method on the Kodak and CLIC datasets and demonstrate that our approach is effective and on par with state-of-the-art methods.
Abstract:Recent development of large language models (LLMs), such as ChatGPT has been widely applied to a wide range of software engineering tasks. Many papers have reported their analysis on the potential advantages and limitations of ChatGPT for writing code, summarization, text generation, etc. However, the analysis of the current state of ChatGPT for log processing has received little attention. Logs generated by large-scale software systems are complex and hard to understand. Despite their complexity, they provide crucial information for subject matter experts to understand the system status and diagnose problems of the systems. In this paper, we investigate the current capabilities of ChatGPT to perform several interesting tasks on log data, while also trying to identify its main shortcomings. Our findings show that the performance of the current version of ChatGPT for log processing is limited, with a lack of consistency in responses and scalability issues. We also outline our views on how we perceive the role of LLMs in the log processing discipline and possible next steps to improve the current capabilities of ChatGPT and the future LLMs in this area. We believe our work can contribute to future academic research to address the identified issues.