Abstract:Relation extraction is a Natural Language Processing task aiming to extract relationships from textual data. It is a critical step for information extraction. Due to its wide-scale applicability, research in relation extraction has rapidly scaled to using highly advanced neural networks. Despite their computational superiority, modern relation extractors fail to handle complicated extraction scenarios. However, a comprehensive performance analysis of the state-of-the-art relation extractors that compile these challenges has been missing from the literature, and this paper aims to bridge this gap. The goal has been to investigate the possible data-centric characteristics that impede neural relation extraction. Based on extensive experiments conducted using 15 state-of-the-art relation extraction algorithms ranging from recurrent architectures to large language models and seven large-scale datasets, this research suggests that modern relation extractors are not robust to complex data and relation characteristics. It emphasizes pivotal issues, such as contextual ambiguity, correlating relations, long-tail data, and fine-grained relation distributions. In addition, it sets a marker for future directions to alleviate these issues, thereby proving to be a critical resource for novice and advanced researchers. Efficient handling of the challenges described can have significant implications for the field of information extraction, which is a critical part of popular systems such as search engines and chatbots. Data and relevant code can be found at https://github.com/anushkasw/MaxRE.
Abstract:A Bill of Materials (BoM) is a list of all components on a printed circuit board (PCB). Since BoMs are useful for hardware assurance, automatic BoM extraction (AutoBoM) is of great interest to the government and electronics industry. To achieve a high-accuracy AutoBoM process, domain knowledge of PCB text and logos must be utilized. In this study, we discuss the challenges associated with automatic PCB marking extraction and propose 1) a plan for collecting salient PCB marking data, and 2) a framework for incorporating this data for automatic PCB assurance. Given the proposed dataset plan and framework, subsequent future work, implications, and open research possibilities are detailed.
Abstract:Artificial intelligence (AI) and machine learning (ML) techniques have been increasingly used in several fields to improve performance and the level of automation. In recent years, this use has exponentially increased due to the advancement of high-performance computing and the ever increasing size of data. One of such fields is that of hardware design; specifically the design of digital and analog integrated circuits~(ICs), where AI/ ML techniques have been extensively used to address ever-increasing design complexity, aggressive time-to-market, and the growing number of ubiquitous interconnected devices (IoT). However, the security concerns and issues related to IC design have been highly overlooked. In this paper, we summarize the state-of-the-art in AL/ML for circuit design/optimization, security and engineering challenges, research in security-aware CAD/EDA, and future research directions and needs for using AI/ML for security-aware circuit design.
Abstract:The continued outsourcing of printed circuit board (PCB) fabrication to overseas venues necessitates increased hardware assurance capabilities. Toward this end, several automated optical inspection (AOI) techniques have been proposed in the past exploring various aspects of PCB images acquired using digital cameras. In this work, we review state-of-the-art AOI techniques and observed the strong, rapid trend toward machine learning (ML) solutions. These require significant amounts of labeled ground truth data, which is lacking in the publicly available PCB data space. We propose the FICS PBC Image Collection (FPIC) dataset to address this bottleneck in available large-volume, diverse, semantic annotations. Additionally, this work covers the potential increase in hardware security capabilities and observed methodological distinctions highlighted during data collection.
Abstract:In the Reverse Engineering and Hardware Assurance domain, a majority of the data acquisition is done through electron microscopy techniques such as Scanning Electron Microscopy (SEM). However, unlike its counterparts in optical imaging, only a limited number of techniques are available to enhance and extract information from the raw SEM images. In this paper, we introduce an algorithm to segment out Integrated Circuit (IC) structures from the SEM image. Unlike existing algorithms discussed in this paper, this algorithm is unsupervised, parameter-free and does not require prior information on the noise model or features in the target image making it effective in low quality image acquisition scenarios as well. Furthermore, the results from the application of the algorithm on various structures and layers in the IC are reported and discussed.