Abstract:With the increasing demand for high-performance and high-efficiency computing, cloud computing, especially serverless computing, has gradually become a research hotspot in recent years, attracting numerous research attention. Meanwhile, MapReduce, which is a popular big data processing model in the industry, has been widely applied in various fields. Inspired by the serverless framework of Function as a Service and the high concurrency and robustness of MapReduce programming model, this paper focus on combining them to reduce the time span and increase the efficiency when executing the word frequency counting task. In this case, the paper use a MapReduce programming model based on a serverless computing platform to figure out the most optimized number of Map functions and Reduce functions for a particular task. For the same amount of workload, extensive experiments show both execution time reduces and the overall efficiency of the program improves at different rates as the number of map functions and reduce functions increases. This paper suppose the discovery of the most optimized number of map and reduce functions can help cooperations and programmers figure out the most optimized solutions.




Abstract:As a popular form of knowledge and experience, patterns and their identification have been critical tasks in most data mining applications. However, as far as we are aware, no study has systematically examined the dynamics of pattern values and their reuse under varying conditions. We argue that when problem conditions such as the distributions of random variables change, the patterns that performed well in previous circumstances may become less effective and adoption of these patterns would result in sub-optimal solutions. In response, we make a connection between data mining and the duality theory in operations research and propose a novel scheme to efficiently identify patterns and dynamically quantify their values for each specific condition. Our method quantifies the value of patterns based on their ability to satisfy stochastic constraints and their effects on the objective value, allowing high-quality patterns and their combinations to be detected. We use the online bin packing problem to evaluate the effectiveness of the proposed scheme and illustrate the online packing procedure with the guidance of patterns that address the inherent uncertainty of the problem. Results show that the proposed algorithm significantly outperforms the state-of-the-art methods. We also analysed in detail the distinctive features of the proposed methods that lead to performance improvement and the special cases where our method can be further improved.