Abstract:Recent large language models (LLMs) have demonstrated versatile capabilities in long-context scenarios. Although some recent benchmarks have been developed to evaluate the long-context capabilities of LLMs, there is a lack of benchmarks evaluating the mathematical reasoning abilities of LLMs over long contexts, which is crucial for LLMs' application in real-world scenarios. In this paper, we introduce MathHay, an automated benchmark designed to assess the long-context mathematical reasoning capabilities of LLMs. Unlike previous benchmarks like Needle in a Haystack, which focus primarily on information retrieval within long texts, MathHay demands models with both information-seeking and complex mathematical reasoning abilities. We conduct extensive experiments on MathHay to assess the long-context mathematical reasoning abilities of eight top-performing LLMs. Even the best-performing model, Gemini-1.5-Pro-002, still struggles with mathematical reasoning over long contexts, achieving only 51.26% accuracy at 128K tokens. This highlights the significant room for improvement on the MathHay benchmark.
Abstract:With the development of the Internet of Things (IoT), network intrusion detection is becoming more complex and extensive. It is essential to investigate an intelligent, automated, and robust network intrusion detection method. Graph neural networks based network intrusion detection methods have been proposed. However, it still needs further studies because the graph construction method of the existing methods does not fully adapt to the characteristics of the practical network intrusion datasets. To address the above issue, this paper proposes a graph neural network algorithm based on behavior similarity (BS-GAT) using graph attention network. First, a novel graph construction method is developed using the behavior similarity by analyzing the characteristics of the practical datasets. The data flows are treated as nodes in the graph, and the behavior rules of nodes are used as edges in the graph, constructing a graph with a relatively uniform number of neighbors for each node. Then, the edge behavior relationship weights are incorporated into the graph attention network to utilize the relationship between data flows and the structure information of the graph, which is used to improve the performance of the network intrusion detection. Finally, experiments are conducted based on the latest datasets to evaluate the performance of the proposed behavior similarity based graph attention network for the network intrusion detection. The results show that the proposed method is effective and has superior performance comparing to existing solutions.