Abstract: The availability of pre-trained models (PTMs) has enabled faster deployment of machine learning across applications by reducing the need for extensive training. Techniques like quantization and distillation have further expanded PTM applicability to resource-constrained IoT hardware. Given the many PTM options for any given task, engineers often find it too costly to evaluate each model's suitability. Approaches such as LogME, LEEP, and ModelSpider help streamline model selection by estimating task relevance without exhaustive tuning. However, these methods largely leave hardware constraints as future work, a significant limitation in IoT settings. In this paper, we identify the limitations of current model recommendation approaches regarding hardware constraints and introduce a novel, hardware-aware method for PTM selection. We also propose a research agenda to guide the development of effective, hardware-conscious model recommendation systems for IoT applications.
Abstract: As we increasingly depend on software systems, the consequences of breaches in the software supply chain become more severe. High-profile cyberattacks such as SolarWinds and ShadowHammer have resulted in significant financial and data losses, underlining the need for stronger cybersecurity. One way to prevent future breaches is by studying past failures. However, traditional methods of analyzing these failures require manually reading and summarizing reports about them. Automated support could reduce costs and allow analysis of more failures. Natural Language Processing (NLP) techniques such as Large Language Models (LLMs) could be leveraged to assist the analysis of failures. In this study, we assessed the ability of LLMs to analyze historical software supply chain breaches. We used LLMs to replicate the manual analysis of 69 software supply chain security failures performed by members of the Cloud Native Computing Foundation (CNCF). We developed prompts for LLMs to categorize these failures along four dimensions: type of compromise, intent, nature, and impact. GPT 3.5's categorizations had an average accuracy of 68% and Bard's had an average accuracy of 58% over these dimensions. We report that LLMs effectively characterize software supply chain failures when the source articles are detailed enough for consensus among manual analysts, but cannot yet replace human analysts. Future work can improve LLM performance in this context and study a broader range of articles and failures.