Abstract: Foundation models in computer vision have demonstrated exceptional performance in zero-shot and few-shot tasks by extracting multi-purpose features from large-scale datasets through self-supervised pre-training. However, these models often overlook the severe corruption of cryogenic electron microscopy (cryo-EM) images by high levels of noise. We introduce DRACO, a Denoising-Reconstruction Autoencoder for CryO-EM, inspired by the Noise2Noise (N2N) approach. By processing cryo-EM movies into odd and even images and treating them as independent noisy observations, we apply a denoising-reconstruction hybrid training scheme in which both images are masked to create denoising and reconstruction tasks. Because dataset quality is essential for DRACO's pre-training, we build a high-quality, diverse dataset from an uncurated public database, comprising over 270,000 movies or micrographs. After pre-training, DRACO naturally serves as a generalizable cryo-EM image denoiser and as a foundation model for various cryo-EM downstream tasks. DRACO achieves the best performance in denoising, micrograph curation, and particle picking compared to state-of-the-art baselines. We will release the code, pre-trained models, and the curated dataset to stimulate further research.
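The odd/even split and masking described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the loss, the mask ratio, the patch size, or the network, so everything below is an assumption made for illustration: the helper names `split_odd_even`, `random_patch_mask`, and `hybrid_loss` are hypothetical, frames are indexed from zero, and a plain mean squared error stands in for whatever objective DRACO actually uses.

```python
import random


def split_odd_even(movie):
    """Average odd- and even-indexed frames (0-based) into two noisy images."""
    def mean_image(frames):
        height, width = len(frames[0]), len(frames[0][0])
        return [[sum(f[r][c] for f in frames) / len(frames) for c in range(width)]
                for r in range(height)]
    return mean_image(movie[1::2]), mean_image(movie[0::2])


def random_patch_mask(n_rows, n_cols, mask_ratio, rng):
    """Return a grid of booleans over patches; True marks a masked patch."""
    n_patches = n_rows * n_cols
    n_masked = int(round(mask_ratio * n_patches))
    masked = set(rng.sample(range(n_patches), n_masked))
    return [[(r * n_cols + c) in masked for c in range(n_cols)]
            for r in range(n_rows)]


def hybrid_loss(pred, target, patch_mask, patch):
    """MSE against the other half-image, split by patch visibility.

    Pixels in visible patches contribute to the denoising term (N2N-style:
    the target is an independent noisy observation), and pixels in masked
    patches contribute to the reconstruction term. This split is an assumed
    reading of the abstract, not the published objective.
    """
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for r, row in enumerate(pred):
        for c, value in enumerate(row):
            is_masked = patch_mask[r // patch][c // patch]
            sums[is_masked] += (value - target[r][c]) ** 2
            counts[is_masked] += 1
    denoise = sums[False] / max(counts[False], 1)
    recon = sums[True] / max(counts[True], 1)
    return denoise, recon


# Toy movie: four 2x2 frames, frame k filled with the value k.
movie = [[[float(k)] * 2 for _ in range(2)] for k in range(4)]
odd, even = split_odd_even(movie)
mask = random_patch_mask(2, 2, 0.5, random.Random(0))
# Here the odd image itself stands in for a network prediction.
denoise_term, recon_term = hybrid_loss(odd, even, mask, patch=1)
```

In a real training loop the prediction would come from an encoder-decoder applied to the masked input, and the two terms would be weighted and summed; both the weighting and the architecture are left open here because the abstract does not specify them.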
Abstract: Large rotating machines, e.g., compressors, steam turbines, and gas turbines, are critical equipment in many process industries such as energy, chemical, and power generation. Because of the rotor's high rotating speed and tremendous momentum, centrifugal force may cause rotor parts to fly apart, which poses a great threat to operational safety. Early detection and prediction of potential failures could prevent catastrophic plant downtime and economic loss. In this paper, we divide the operational states of a rotating machine into normal, risky, and high-risk states based on the time remaining until failure. A cascade classification algorithm is then proposed to predict the states in two steps: first, we judge whether the machine is in a normal or abnormal condition; then, for time periods predicted as abnormal, we further classify them into risky or high-risk states. Moreover, traditional classification model evaluation metrics, such as the confusion matrix and true-false accuracy, are static and neglect the dynamics of online prediction and the unequal costs of wrong predictions. An Online Prediction Ability Index (OPAI) is proposed to select prediction models with consistent online predictions and smaller prediction errors close to downtime. Real-world data sets and computational experiments are used to verify the effectiveness of the proposed methods.
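The two-step cascade described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the time-to-failure thresholds, the features, or the underlying classifiers, so the horizons (72 and 24 hours), the single vibration feature, and the threshold-based stand-in classifiers are all hypothetical. The OPAI metric is not sketched because its definition is not given in the abstract.

```python
def label_state(hours_to_failure, risky_horizon=72.0, high_risk_horizon=24.0):
    """Assign a training label from the time remaining until failure.

    Both horizons are hypothetical placeholders.
    """
    if hours_to_failure <= high_risk_horizon:
        return "high-risk"
    if hours_to_failure <= risky_horizon:
        return "risky"
    return "normal"


def cascade_predict(features, is_abnormal, is_high_risk):
    """Two-step cascade: normal vs. abnormal first, then risky vs. high-risk.

    `is_abnormal` and `is_high_risk` are any binary classifiers exposed as
    callables; the second one is consulted only for abnormal samples.
    """
    if not is_abnormal(features):
        return "normal"
    return "high-risk" if is_high_risk(features) else "risky"


# Stand-in classifiers on a single hypothetical vibration amplitude feature.
def toy_is_abnormal(features):
    return features["vibration"] > 4.0


def toy_is_high_risk(features):
    return features["vibration"] > 7.0


labels = [label_state(h) for h in (200.0, 48.0, 6.0)]
predictions = [
    cascade_predict({"vibration": v}, toy_is_abnormal, toy_is_high_risk)
    for v in (2.5, 5.0, 9.0)
]
```

Splitting the problem this way lets each stage be trained and tuned separately, for example with a first stage that favors recall on abnormal periods and a second stage that is fitted only on abnormal data.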
Abstract: Wind farms need prediction models for predictive maintenance. In particular, there is a need to predict the values of non-observable parameters beyond the ranges reflected in the available data. A prediction model developed for one machine may not perform well on another, similar machine, usually because data-driven models lack generalizability. To increase the generalizability of predictive models, this research integrates data mining with first-principles knowledge. Physics-based principles are combined with machine learning algorithms through feature engineering, strong rules, and divide-and-conquer. The proposed synergy concept is illustrated with wind turbine blade icing prediction and achieves significant prediction accuracy across different turbines. The proposed process is widely accepted by wind energy predictive maintenance practitioners because of its simplicity and efficiency. Furthermore, this paper demonstrates the importance of embedding physical principles within the machine learning process. It also highlights that the need for more complex machine learning algorithms in industrial big data mining is often much smaller than in other applications, making it essential to incorporate physics and follow a "Less is More" philosophy.
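As a rough illustration of how physics can enter the pipeline through feature engineering and strong rules, the sketch below derives a power-deficit feature from the standard wind power relation P = 0.5 · ρ · A · Cp · v³ and gates the prediction with a temperature rule. Everything specific is an assumption: the abstract does not state which features or rules the authors use, and the rotor diameter, power coefficient, rated power, temperature limit, and deficit threshold are hypothetical values. The final threshold merely stands in for a learned model.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, standard sea-level air density


def expected_power_kw(wind_speed, rotor_diameter=80.0, cp=0.4, rated_kw=1500.0):
    """Physics-based expected output: 0.5 * rho * A * Cp * v^3, capped at rated power.

    Turbine parameters are hypothetical defaults.
    """
    swept_area = math.pi * (rotor_diameter / 2.0) ** 2
    power_w = 0.5 * AIR_DENSITY * swept_area * cp * wind_speed ** 3
    return min(power_w / 1000.0, rated_kw)


def power_deficit(wind_speed, measured_kw):
    """Engineered feature: relative shortfall of measured vs. expected power."""
    expected = expected_power_kw(wind_speed)
    if expected == 0.0:
        return 0.0
    return (expected - measured_kw) / expected


def predict_icing(wind_speed, measured_kw, ambient_temp_c,
                  max_icing_temp_c=5.0, deficit_threshold=0.3):
    """Strong rule first, then a simple decision on the physics-based feature."""
    # Strong rule (assumed): icing is ruled out well above freezing.
    if ambient_temp_c > max_icing_temp_c:
        return False
    # Stand-in for a trained classifier operating on the engineered feature.
    return power_deficit(wind_speed, measured_kw) > deficit_threshold


cold_and_underperforming = predict_icing(8.0, 300.0, ambient_temp_c=-2.0)
warm_and_underperforming = predict_icing(8.0, 300.0, ambient_temp_c=10.0)
cold_and_healthy = predict_icing(8.0, 600.0, ambient_temp_c=-2.0)
```

Because the deficit is expressed relative to each turbine's own expected output, the same feature can in principle be reused across turbines with different parameters, which is the kind of generalizability the abstract argues for.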