Abstract:The growing availability of sensors within semiconductor manufacturing processes makes it feasible to detect defective wafers with data-driven models. Without directly measuring the quality of semiconductor devices, these models capture the modalities between diverse sensor readings and can be used to predict key quality indicators (KQI, e.g., roughness, resistance) to detect faulty products, significantly reducing the capital and human cost of maintaining physical metrology steps. Nevertheless, existing models pay little attention to the correlations among different processes for diverse wafer products and commonly struggle with generalizability issues. To enable generic fault detection, in this work we propose a modular network (MN), trained using time series stage-wise datasets, that embodies the structure of the manufacturing process. It decomposes KQI prediction into a combination of stage modules to simulate compositional semiconductor manufacturing, universally enhancing faulty wafer detection among different wafer types and manufacturing processes. Extensive experiments demonstrate the usefulness of our approach and shed light on how the compositional design provides an interpretable interface for more practical applications.
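The decomposition of KQI prediction into stage modules can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the stage names, the `make_stage` and `predict_kqi` helpers, and the toy update rule are hypothetical and are not the architecture from the paper. The sketch only shows the idea of reusing one module per manufacturing stage and chaining the modules along a wafer's process route.

```python
from typing import Callable, Dict, Sequence

# A stage module maps (running state, that stage's sensor readings) to a new state.
StageModule = Callable[[float, Sequence[float]], float]


def make_stage(weight: float, bias: float) -> StageModule:
    """Hypothetical stage module: adds a weighted mean of the readings to the state."""
    def module(state: float, readings: Sequence[float]) -> float:
        mean = sum(readings) / len(readings)
        return state + weight * mean + bias
    return module


def predict_kqi(route: Sequence[str],
                modules: Dict[str, StageModule],
                readings_per_stage: Sequence[Sequence[float]]) -> float:
    """Chain the stage modules in the order given by the wafer's process route."""
    state = 0.0
    for stage, readings in zip(route, readings_per_stage):
        state = modules[stage](state, readings)
    return state


# One shared module per stage type; different wafer types reuse them in different orders.
modules = {"deposit": make_stage(1.0, 1.0), "etch": make_stage(0.5, 0.0)}
kqi = predict_kqi(["deposit", "etch"], modules, [[1.0, 3.0], [2.0, 2.0]])
```

Because each module is tied to a stage rather than to a product, a new route is handled by recombining existing modules, and the intermediate `state` after each stage can be inspected.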
Abstract:The semiconductor industry is one of the most technologically dynamic and capital-intensive market sectors. Effective inspection and metrology are necessary to improve product yield, increase product quality, and reduce costs. In recent years, much semiconductor manufacturing equipment has been fitted with sensors to facilitate real-time monitoring of the production process. These production-state and equipment-state sensor data provide an opportunity to apply machine-learning technologies to various tasks, such as anomaly/fault detection, maintenance scheduling, and quality prediction. In this work, we focus on the task of soft sensing regression, which uses sensor data to predict impending inspection measurements that were previously obtained from wafer inspection and metrology systems. We propose an LSTM-based regressor and design two loss functions for model training. Although engineers may judge our prediction errors subjectively, we propose a new piece-wise evaluation metric for assessing model accuracy mathematically. The experimental results demonstrate that the proposed model can achieve accurate and early prediction of various types of inspections in complicated manufacturing processes.
Abstract:The growing availability of data collected from smart manufacturing is changing the paradigms of production monitoring and control. The increasing complexity and content of the wafer manufacturing process, together with time-varying unexpected disturbances and uncertainties, make it infeasible to control the process with model-based approaches. As a result, data-driven soft-sensing modeling has become more prevalent in wafer process diagnostics. Recently, deep learning has been used in soft-sensing systems, with promising performance on highly nonlinear and dynamic time-series data. Despite these successes, however, the underlying logic of a deep learning framework is hard to understand. In this paper, we propose a deep learning-based model for defective wafer detection using a highly imbalanced dataset. To understand how the proposed model works, a deep visualization approach is applied. The model is then fine-tuned under the guidance of the deep visualization. Extensive experiments are performed to validate the effectiveness of the proposed system. The results provide an interpretation of how the model works and an instructive fine-tuning method based on that interpretation.
Abstract:Over the last few decades, modern industry has investigated several cost-effective methodologies to improve the productivity and yield of semiconductor manufacturing. While playing an essential role in facilitating real-time monitoring and control, data-driven soft sensors have provided a competitive edge when augmented with deep learning approaches for wafer fault diagnostics. Despite the success of deep learning methods across various domains, they tend to perform poorly on multivariate soft-sensing data. To mitigate this, we propose a soft-sensing ConFormer (CONvolutional transFORMER) for the wafer fault-diagnostic classification task. It primarily consists of multi-head convolution modules that reap the benefits of fast and lightweight convolution operations while also learning robust representations through a multi-head design like that of transformers. Another key issue is that traditional learning paradigms tend to perform poorly on noisy and highly imbalanced soft-sensing data. To address this, we augment our soft-sensing ConFormer model with a curriculum learning-based loss function, which effectively learns easy samples in the early phase of training and difficult ones later. To further demonstrate the utility of our proposed architecture, we performed extensive experiments on various toolsets of Seagate Technology's wafer manufacturing process, which are shared openly along with this work. To the best of our knowledge, this is the first time that a curriculum learning-based soft-sensing ConFormer architecture has been proposed for soft-sensing data, and our results show strong promise for future use in the soft-sensing research domain.
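The idea of learning easy samples early and difficult ones later can be sketched with a simple self-paced weighting of per-sample losses. This is not the loss function from the paper, whose exact form is not given in the abstract. The linear schedule (starting from the easiest 50% of samples) and the function names `curriculum_weights` and `curriculum_loss` are assumptions chosen only to illustrate the principle.

```python
from typing import List, Sequence


def curriculum_weights(losses: Sequence[float], epoch: int, total_epochs: int) -> List[float]:
    """Give weight 1.0 to the easiest (lowest-loss) samples and 0.0 to the rest.

    Assumed schedule: the admitted fraction grows linearly from 0.5 at epoch 0
    to 1.0 at the final epoch.
    """
    frac = min(1.0, 0.5 + 0.5 * epoch / max(total_epochs - 1, 1))
    k = max(1, round(frac * len(losses)))
    easiest_first = sorted(range(len(losses)), key=lambda i: losses[i])
    keep = set(easiest_first[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(losses))]


def curriculum_loss(losses: Sequence[float], epoch: int, total_epochs: int) -> float:
    """Mean loss over the samples admitted at this epoch."""
    weights = curriculum_weights(losses, epoch, total_epochs)
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


per_sample = [0.1, 2.0, 0.5, 1.0]
early = curriculum_weights(per_sample, epoch=0, total_epochs=3)
late = curriculum_weights(per_sample, epoch=2, total_epochs=3)
```

In the early epoch only the two lowest-loss samples contribute, while by the final epoch every sample is included, so noisy or hard examples influence training only once the model has fit the easier ones.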
Abstract:In the era of big data, data-driven classification has become an essential method in smart manufacturing to guide production and optimize inspection. The industrial data obtained in practice are usually time-series data collected by soft sensors, which are highly nonlinear, nonstationary, imbalanced, and noisy. Most existing soft-sensing machine learning models focus on capturing either intra-series temporal dependencies or pre-defined inter-series correlations, while ignoring the correlation between labels, even though each instance is associated with multiple labels simultaneously. In this paper, we propose a novel graph-based soft-sensing neural network (GraSSNet) for multivariate time-series classification of noisy and highly imbalanced soft-sensing data. The proposed GraSSNet is able to 1) capture the inter-series and intra-series dependencies jointly in the spectral domain; 2) exploit the label correlations by superimposing a label graph built from statistical co-occurrence information; 3) learn features with an attention mechanism from both the textual and numerical domains; and 4) leverage unlabeled data and mitigate data imbalance by semi-supervised learning. Comparative studies with other commonly used classifiers are carried out on Seagate soft sensing data, and the experimental results validate the competitive performance of our proposed method.
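A label graph built from statistical co-occurrence information can be sketched as follows. The abstract does not specify how GraSSNet builds its graph, so this is only one common construction, assumed here for illustration: the edge from label i to label j is the conditional frequency with which j is active given that i is active. The function name `label_cooccurrence_graph` is hypothetical.

```python
from typing import List, Sequence


def label_cooccurrence_graph(labels: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return a matrix A where A[i][j] estimates P(label j active | label i active).

    `labels` is a multi-label indicator matrix: one row per instance,
    one 0/1 entry per label. The diagonal is left at 0.0.
    """
    n = len(labels[0])
    counts = [0] * n
    co = [[0] * n for _ in range(n)]
    for row in labels:
        active = [i for i, v in enumerate(row) if v]
        for i in active:
            counts[i] += 1
            for j in active:
                if i != j:
                    co[i][j] += 1
    return [[co[i][j] / counts[i] if counts[i] else 0.0 for j in range(n)]
            for i in range(n)]


# Four instances, three labels.
y = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [1, 1, 0]]
graph = label_cooccurrence_graph(y)
```

The resulting matrix is generally asymmetric: here label 2 always appears with label 1, but label 1 appears with label 2 only a third of the time.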
Abstract:With the rapid development of AI technology in recent years, there have been many studies applying deep learning models in the soft sensing area. However, while the models have become more complex, the data sets remain limited: researchers are fitting million-parameter models with hundreds of data samples, which is insufficient to demonstrate the effectiveness of the models, so they often fail to perform when deployed in industrial applications. To solve this long-standing problem, we are providing large-scale, high-dimensional time series manufacturing sensor data from Seagate Technology to the public. We demonstrate the challenges and effectiveness of modeling industrial big data with a Soft Sensing Transformer model on these data sets. The Transformer is used because it has outperformed state-of-the-art techniques in Natural Language Processing, and it has since also performed well when applied directly to computer vision without the introduction of image-specific inductive biases. We observe the similarity between sentence structure and sensor readings, and process the multi-variable sensor readings in a time series in a manner similar to sentences in natural language. The high-dimensional time-series data are formatted into the same shape as embedded sentences and fed into the transformer model. The results show that the transformer model outperforms the benchmark models in the soft sensing field, which are based on auto-encoder and long short-term memory (LSTM) models. To the best of our knowledge, we are the first team in academia or industry to benchmark the performance of the original transformer model on large-scale numerical soft sensing data.
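Formatting sensor readings into the shape of embedded sentences can be sketched as below. The sketch assumes that each time step's vector of sensor values plays the role of one word embedding and that sequences are truncated or zero-padded to a fixed length. The padding choice and the helper name `to_sentence_shape` are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def to_sentence_shape(readings: Sequence[Sequence[float]],
                      seq_len: int,
                      pad_value: float = 0.0) -> List[List[float]]:
    """Shape a multivariate time series as (seq_len, n_sensors).

    Each row is one time step, analogous to one embedded word, and the
    number of sensors plays the role of the embedding dimension.
    """
    n_sensors = len(readings[0])
    clipped = [list(step) for step in readings[:seq_len]]  # truncate long series
    padding = [[pad_value] * n_sensors for _ in range(seq_len - len(clipped))]
    return clipped + padding  # pad short series at the end


series = [[1.0, 2.0], [3.0, 4.0]]  # 2 time steps, 2 sensors
padded = to_sentence_shape(series, seq_len=3)
truncated = to_sentence_shape(series, seq_len=1)
```

With every series in this fixed shape, a batch can be fed to a standard transformer encoder in the same way as a batch of embedded token sequences.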
Abstract:IEEE BigData 2021 Cup: Soft Sensing at Scale is a data mining competition organized by Seagate Technology, in association with the IEEE BigData 2021 conference. The scope of this challenge is to tackle the task of classifying soft sensing data with machine learning techniques. In this paper we go into the details of the challenge and describe the data set provided to participants. We define the metrics of interest, baseline models, and describe approaches we found meaningful which may be a good starting point for further analysis. We discuss the results obtained with our approaches and give insights on what potential challenges participants may run into. Students, researchers, and anyone interested in working on a major industrial problem are welcome to participate in the challenge!
Abstract:With the proliferation of IoT devices, distributed control systems are now capturing and processing data from more sensors at higher frequencies than ever before. These new data, due to their volume and novelty, cannot be effectively consumed without the help of data-driven techniques. Deep learning is emerging as a promising technique to analyze these data, particularly in soft sensor modeling. Its strong capability to represent complex data and the flexibility it offers from an architectural perspective make it a topic of active applied research in industrial settings. However, successful applications of deep learning in soft sensing are still not widely integrated into factory control systems, because most soft sensing research does not have access to large-scale industrial data, which are varied, noisy, and incomplete. The results published in most research papers are therefore not easily reproduced when applied to the variety of data in industrial settings. Here we provide manufacturing data sets that are much larger and more complex than publicly available soft sensor data. Moreover, the data sets come from Seagate factories in active service with only the necessary anonymization, so they reflect the complex and noisy nature of real-world data. We introduce a variance-weighted multi-headed auto-encoder classification model that is well suited to high-dimensional and highly imbalanced data. Besides using weighting or sampling methods to handle the highly imbalanced data, the model also simultaneously predicts multiple outputs by exploiting output-supervised representation learning and multi-task weighting.
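Weighting as a way to handle highly imbalanced data can be illustrated with inverse-frequency class weights. This is a generic, widely used heuristic and not the variance weighting or multi-task weighting of the model described above, whose details the abstract does not give. The function name `inverse_frequency_weights` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive large weights, so their errors count more in a
    weighted loss; a perfectly balanced data set gives every class weight 1.0.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Nine passing wafers (0) and one failing wafer (1).
weights = inverse_frequency_weights([0] * 9 + [1])
```

In this example the single failing sample is weighted nine times more heavily than each passing sample, which offsets the 9:1 imbalance in a weighted loss.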