Abstract:A systematic pipeline for data processing and knowledge discovery is essential to extracting knowledge from big data and making recommendations for operational decision-making. The CRISP-DM model is the de-facto standard for developing data-mining projects in practice. However, advancements in data processing technologies require enhancements to this framework. This paper presents the DataPro (a standardized data understanding and processing procedure) model, which extends CRISP-DM and emphasizes the link between data scientists and stakeholders by adding the "technical understanding" and "implementation" phases. Firstly, the "technical understanding" phase aligns business demands with technical requirements, ensuring the technical team's accurate comprehension of business goals. Next, the "implementation" phase focuses on the practical application of developed data science models, ensuring theoretical models are effectively applied in business contexts. Furthermore, clearly defining roles and responsibilities in each phase enhances management and communication among all participants. Afterward, a case study on an eco-driving data science project for fuel efficiency analysis in the Danish public transportation sector illustrates the application of the DataPro model. By following the proposed framework, the project identified key business objectives, translated them into technical requirements, and developed models that provided actionable insights for reducing fuel consumption. Finally, the model is evaluated qualitatively, demonstrating its superiority over other data science procedures.
Abstract:Advanced machine learning algorithms are increasingly utilized to provide data-based prediction and decision-making support in Industry 4.0. However, the prediction accuracy achieved by the existing models is insufficient to warrant practical implementation in real-world applications. This is because not all features present in real-world datasets possess a direct relevance to the predictive analysis being conducted. Consequently, the careful incorporation of select features has the potential to yield a substantial positive impact on the outcome. To address the research gap, this paper proposes a novel hybrid framework that combines the feature importance detector - local interpretable model-agnostic explanations (LIME) and the feature interaction detector - neural interaction detection (NID), to improve prediction accuracy. By applying the proposed framework, unnecessary features can be eliminated, and interactions are encoded to generate a more conducive dataset for predictive purposes. Subsequently, the proposed model is deployed to refine the prediction of electricity consumption in foundry processing. The experimental outcomes reveal an augmentation of up to 9.56% in the R2 score, and a diminution of up to 24.05% in the root mean square error.