Abstract:Deep Learning Recommendation Models (DLRMs) play a crucial role in delivering personalized content across web applications such as social networking and video streaming. However, with improvements in performance, the parameter size of DLRMs has grown to terabyte (TB) scales, accompanied by memory bandwidth demands exceeding TB/s levels. Furthermore, the workload intensity within the model varies based on the target mechanism, making it difficult to build an optimized recommendation system. In this paper, we propose SCRec, a scalable computational storage recommendation system that can handle TB-scale industrial DLRMs while guaranteeing high bandwidth requirements. SCRec utilizes a software framework that features a mixed-integer programming (MIP)-based cost model, efficiently fetching data based on data access patterns and adaptively configuring memory-centric and compute-centric cores. Additionally, SCRec integrates hardware acceleration cores to enhance DLRM computations, particularly allowing for the high-performance reconstruction of approximated embedding vectors from extremely compressed tensor-train (TT) format. By combining its software framework and hardware accelerators, while eliminating data communication overhead by being implemented on a single server, SCRec achieves substantial improvements in DLRM inference performance. It delivers up to 55.77$\times$ speedup compared to a CPU-DRAM system with no loss in accuracy and up to 13.35$\times$ energy efficiency gains over a multi-GPU system.
Abstract:Different from conventional passive reconfigurable intelligent surfaces (RISs), incident signals and thermal noise can be amplified at active RISs. By exploiting the amplifying capability of active RISs, noticeable performance improvement can be expected when precise channel state information (CSI) is available. Since obtaining perfect CSI related to an RIS is difficult in practice, a robust transmission design is proposed in this paper to tackle the channel uncertainty issue, which will be more severe for active RIS-aided systems. To account for the worst-case scenario, the minimum achievable rate of each user is derived under a statistical CSI error model. Subsequently, an optimization problem is formulated to maximize the sum of the minimum achievable rate. Since the objective function is non-concave, the formulated problem is transformed into a tractable lower bound maximization problem, which is solved using an alternating optimization method. Numerical results show that the proposed robust design outperforms a baseline scheme that only exploits estimated CSI.