Abstract:Partitioning is a known problem in computer science and is critical in chip design workflows, as advancements in this area can significantly influence design quality and efficiency. Deep Learning (DL) techniques, particularly those involving Graph Neural Networks (GNNs), have demonstrated strong performance in various node, edge, and graph prediction tasks using both inductive and transductive learning methods. A notable area of recent interest within GNNs are pooling layers and their application to graph partitioning. While these methods have yielded promising results across social, computational, and other random graphs, their effectiveness has not yet been explored in the context of VLSI hypergraph netlists. In this study, we introduce a new set of synthetic partitioning benchmarks that emulate real-world netlist characteristics and possess a known upper bound for solution cut quality. We distinguish these benchmarks with the prior work and evaluate existing state-of-the-art partitioning algorithms alongside GNN-based approaches, highlighting their respective advantages and disadvantages.
Abstract:The Rectilinear Steiner Minimum Tree (RSMT) problem is a fundamental problem in VLSI placement and routing and is known to be NP-hard. Traditional RSMT algorithms spend a significant amount of time on finding Steiner points to reduce the total wire length or use heuristics to approximate producing sub-optimal results. We show that Graph Neural Networks (GNNs) can be used to predict optimal Steiner points in RSMTs with high accuracy and can be parallelized on GPUs. In this paper, we propose GAT-Steiner, a graph attention network model that correctly predicts 99.846% of the nets in the ISPD19 benchmark with an average increase in wire length of only 0.480% on suboptimal wire length nets. On randomly generated benchmarks, GAT-Steiner correctly predicts 99.942% with an average increase in wire length of only 0.420% on suboptimal wire length nets.
Abstract:The number of open source language models that can produce Turkish is increasing day by day, as in other languages. In order to create the basic versions of such models, the training of multilingual models is usually continued with Turkish corpora. The alternative is to train the model with only Turkish corpora. In this study, we first introduce the cosmosGPT models that we created with this alternative method. Then, we introduce new finetune datasets for basic language models to fulfill user requests and new evaluation datasets for measuring the capabilities of Turkish language models. Finally, a comprehensive comparison of the adapted Turkish language models on different capabilities is presented. The results show that the language models we built with the monolingual corpus have promising performance despite being about 10 times smaller than the others.
Abstract:The developments that language models have provided in fulfilling almost all kinds of tasks have attracted the attention of not only researchers but also the society and have enabled them to become products. There are commercially successful language models available. However, users may prefer open-source language models due to cost, data privacy, or regulations. Yet, despite the increasing number of these models, there is no comprehensive comparison of their performance for Turkish. This study aims to fill this gap in the literature. A comparison is made among seven selected language models based on their contextual learning and question-answering abilities. Turkish datasets for contextual learning and question-answering were prepared, and both automatic and human evaluations were conducted. The results show that for question-answering, continuing pretraining before fine-tuning with instructional datasets is more successful in adapting multilingual models to Turkish and that in-context learning performances do not much related to question-answering performances.
Abstract:Neural network deployment on low-cost embedded systems, hence on microcontrollers (MCUs), has recently been attracting more attention than ever. Since MCUs have limited memory capacity as well as limited compute-speed, it is critical that we employ model compression, which reduces both memory and compute-speed requirements. In this paper, we add model compression, specifically Deep Compression, and further optimize Unlu's earlier work on arXiv, which efficiently deploys PyTorch models on MCUs. First, we prune the weights in convolutional and fully connected layers. Secondly, the remaining weights and activations are quantized to 8-bit integers from 32-bit floating-point. Finally, forward pass functions are compressed using special data structures for sparse matrices, which store only nonzero weights (without impacting performance and accuracy). In the case of the LeNet-5 model, the memory footprint was reduced by 12.45x, and the inference speed was boosted by 2.57x.