Abstract: Large deep learning models have achieved remarkable success but are resource-intensive, posing challenges in computational cost and memory usage. We introduce CURing, a novel model compression method based on CUR matrix decomposition, which approximates a weight matrix as the product of selected columns (C), a small linking matrix (U), and selected rows (R). We apply this decomposition to weights chosen based on the combined influence of their magnitudes and activations. By identifying and retaining informative rows and columns, CURing significantly reduces model size with minimal performance loss. It preserves the original network's input/output structures and retains important features such as non-negativity. The compressed model's activation patterns also align with those of the original, thereby enhancing interpretability.
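The CUR factorization described in this abstract can be illustrated with a minimal NumPy sketch. The selection rule here (largest squared L2 norm) is only a stand-in assumption for the magnitude-and-activation criterion, which the abstract does not specify; the function name `cur_decompose` and the way U is computed (via pseudoinverses) are likewise illustrative choices, not the authors' implementation.

```python
import numpy as np

def cur_decompose(W, rank):
    """Approximate W as C @ U @ R using `rank` actual columns and rows of W."""
    # Assumption: score columns/rows by squared L2 norm. The paper's real
    # criterion combines weight magnitudes with activations.
    col_scores = np.sum(W ** 2, axis=0)
    row_scores = np.sum(W ** 2, axis=1)
    cols = np.sort(np.argsort(col_scores)[-rank:])
    rows = np.sort(np.argsort(row_scores)[-rank:])

    C = W[:, cols]   # selected columns, copied unchanged from W
    R = W[rows, :]   # selected rows, copied unchanged from W
    # Linking matrix that minimizes ||W - C U R|| in the Frobenius norm.
    U = np.linalg.pinv(C) @ W @ np.linalg.pinv(R)
    return C, U, R, cols, rows

# Usage on a synthetic rank-3 matrix standing in for a layer weight.
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))
C, U, R, cols, rows = cur_decompose(W, rank=3)
stored = C.size + U.size + R.size   # parameters kept after compression
```

Because C and R are copied directly from W, properties such as non-negativity of the selected entries carry over, and the product keeps the input and output dimensions of W. For this synthetic example, the factors store 114 values instead of 300.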
Abstract: As artificial intelligence (AI) continues to permeate various domains, concerns surrounding trust and transparency in AI-driven inference and training processes have emerged, particularly with respect to potential biases and traceability challenges. Decentralized solutions such as blockchain have been proposed to tackle these issues, but they often struggle with large-scale models, leading to time-consuming inference and inefficient training verification. To overcome these limitations, we introduce BRAIN (Blockchain-based Reliable AI Network), a novel platform specifically designed to ensure reliable inference and training of large models. BRAIN uses a two-phase transaction mechanism that separates request and response transactions, allowing real-time processing via pipelining. Each randomly selected inference committee commits and reveals its inference results, and once agreement is reached through a smart contract, the requested operation is executed using the consensus result. Additionally, BRAIN carries out training by employing a randomly selected training committee. Committee members submit commit and reveal transactions along with their respective scores, enabling local model aggregation based on the median value of the scores. Experimental results demonstrate that BRAIN delivers considerably higher inference throughput at reasonable gas fees. In particular, BRAIN's tasks-per-second performance is 454.4293 times greater than that of a naive single-phase implementation.
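The commit-and-reveal step and the median-based score aggregation mentioned in this abstract can be sketched off-chain in a few lines of Python. This is a hedged illustration only: SHA-256 stands in for whatever hash the smart contract uses, and the quorum rule and the names `make_commitment`, `tally`, and `aggregate_score` are assumptions introduced here, not part of BRAIN.

```python
import hashlib
from collections import Counter
from statistics import median

def make_commitment(result: bytes, salt: bytes) -> str:
    """Commit phase: publish only a hash of the salted result."""
    return hashlib.sha256(salt + result).hexdigest()

def tally(commitments, reveals, quorum):
    """Reveal phase: keep reveals that match their commitment, then return
    the most common result if at least `quorum` members agree (assumed rule)."""
    valid = [
        result
        for member, (result, salt) in reveals.items()
        if commitments.get(member) == make_commitment(result, salt)
    ]
    if not valid:
        return None
    result, votes = Counter(valid).most_common(1)[0]
    return result if votes >= quorum else None

def aggregate_score(scores):
    """Median of committee scores, which limits the pull of a single outlier."""
    return median(scores)
```

Hiding results behind a commitment prevents a committee member from copying another member's answer before revealing its own, and taking the median rather than the mean keeps one extreme score from dominating the aggregate.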
Abstract: As the Ethereum blockchain has become popular, the number of users and transactions has skyrocketed, causing an explosive increase in its data size. As a result, ordinary clients using PCs or smartphones cannot easily bootstrap as full nodes and must instead rely on other full nodes, such as miners, to run or verify transactions. This may affect the security of Ethereum, so light bootstrapping techniques such as fast sync have been proposed to download only part of the full data, yet the space overhead is still too high. One of the biggest space overheads that cannot easily be reduced comes from saving the state of all accounts in the block's state trie. Fortunately, we found that more than 90% of accounts are inactive and that old transactions are hard to manipulate. Based on these observations, this paper proposes a novel optimization technique called Ethanos that reduces bootstrapping cost by periodically sweeping inactive accounts and by not downloading old transactions. If an inactive account becomes active, Ethanos restores its state by running a restoration transaction. Ethanos also gives archive nodes incentives to maintain old transactions for possible re-verification. We implemented Ethanos by instrumenting the go-ethereum (geth) client and evaluated it with 113 million real transactions from 14 million accounts between the 7M-th and 8M-th blocks of Ethereum. Our experimental results show that Ethanos can reduce the size of the account state by half, which, combined with removing old transactions, may reduce the storage size for bootstrapping to around 1 GB. This would be reasonable for ordinary clients to bootstrap on their personal devices.
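The idea of periodically sweeping inactive accounts and restoring them on demand can be pictured with a toy Python model. Everything below is a simplified assumption made for illustration: the class `ToyState`, the fixed `epoch_length`, and the plain dictionaries replace the state trie, and the real restoration transaction (which the abstract does not detail) is reduced to moving a balance back from an archive.

```python
from dataclasses import dataclass, field

@dataclass
class ToyState:
    """Toy account state with epoch-based sweeping (not the real state trie)."""
    epoch_length: int
    active: dict = field(default_factory=dict)    # address -> balance, kept by bootstrapping nodes
    archived: dict = field(default_factory=dict)  # address -> balance, kept only by archive nodes
    touched: set = field(default_factory=set)     # addresses used in the current epoch

    def apply_transfer(self, sender, receiver, amount):
        # A swept sender has no active balance, so it must be restored first.
        if self.active.get(sender, 0) < amount:
            raise ValueError("insufficient active balance")
        self.active[sender] -= amount
        self.active[receiver] = self.active.get(receiver, 0) + amount
        self.touched.update((sender, receiver))

    def restore(self, addr):
        # Stand-in for a restoration transaction: merge the archived balance back.
        self.active[addr] = self.active.get(addr, 0) + self.archived.pop(addr)
        self.touched.add(addr)

    def end_of_block(self, block_number):
        # At each epoch boundary, sweep every account not touched during the epoch.
        if block_number % self.epoch_length == 0:
            for addr in list(self.active):
                if addr not in self.touched:
                    self.archived[addr] = self.archived.get(addr, 0) + self.active.pop(addr)
            self.touched.clear()
```

In this model a bootstrapping client only needs `active`, which shrinks whenever untouched accounts are swept, while `archived` plays the role of the data that archive nodes keep.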