Karen Li

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

May 13, 2024

Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li(+20 more)

Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in the compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them. In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU), a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets. We demonstrate speedups ranging from 2x to 13x on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19x, speeds up model switching time by 15x to 31x, and achieves an overall speedup of 3.7x over a DGX H100 and 6.6x over a DGX A100.
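The abstract's point about operational intensity and fusion can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the paper: it assumes a simplified cost model in which each matrix is read from or written to off-chip memory exactly once, and the function names, shapes, and the 2-byte element size are illustrative assumptions.

```python
def gemm_operational_intensity(m, k, n, bytes_per_element=2):
    """FLOPs per byte for one (m x k) @ (k x n) matrix multiply.

    Simplifying assumption: both inputs are read once and the output
    is written once, with no cache reuse modeled.
    """
    flops = 2 * m * k * n
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def two_layer_intensity(m, k, h, n, fused, bytes_per_element=2):
    """FLOPs per byte for two chained matrix multiplies.

    When fused, the (m x h) intermediate is assumed to stay on chip.
    When unfused, it is written to memory and read back once.
    """
    flops = 2 * m * k * h + 2 * m * h * n
    elements = m * k + k * h + h * n + m * n
    if not fused:
        elements += 2 * m * h  # intermediate: one write plus one read
    return flops / (bytes_per_element * elements)


if __name__ == "__main__":
    # A single-row input is far more memory-bound than a large batch.
    print(gemm_operational_intensity(1, 4096, 4096))
    print(gemm_operational_intensity(512, 4096, 4096))
    # Keeping the intermediate on chip raises FLOPs per byte.
    print(two_layer_intensity(512, 1024, 4096, 1024, fused=False))
    print(two_layer_intensity(512, 1024, 4096, 1024, fused=True))
```

Under this model, intensity falls as the batch dimension shrinks and rises when intermediates are not spilled to memory, which is the general effect the abstract attributes to fused execution. Real accelerators have additional factors (tiling, caches, weight reuse) that this sketch ignores.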

A Novel Low-Cost, Recyclable, Easy-to-Build Robot Blimp For Transporting Supplies in Hard-to-Reach Locations

Sep 13, 2023

Karen Li, Shuhang Hou, Matyas Negash, Jiawei Xu, Edward Jeffs, Diego S. D'Antonio, David Saldaña

Abstract: Rural communities in remote areas often encounter significant challenges when it comes to accessing emergency healthcare services and essential supplies due to a lack of adequate transportation infrastructure. The situation is further exacerbated by poorly maintained, damaged, or flooded roads, making it arduous for rural residents to obtain the necessary aid in critical situations. Limited budgets and technological constraints pose additional obstacles, hindering the prompt response of local rescue teams during emergencies. The transportation of crucial resources, such as medical supplies and food, plays a vital role in saving lives in these situations. In light of these obstacles, our objective is to improve accessibility and alleviate the suffering of vulnerable populations by automating transportation tasks using low-cost robotic systems. We propose a low-cost, easy-to-build blimp robot (a UAV) that can significantly enhance the efficiency and effectiveness of local emergency responses.

* IEEE Global Humanitarian Technology Conference (GHTC 2023)

Land Use Prediction using Electro-Optical to SAR Few-Shot Transfer Learning

Dec 04, 2022

Marcel Hussing, Karen Li, Eric Eaton

Abstract: Satellite image analysis has important implications for land use, urbanization, and ecosystem monitoring. Deep learning methods can facilitate the analysis of different satellite modalities, such as electro-optical (EO) and synthetic aperture radar (SAR) imagery, by supporting knowledge transfer between the modalities to compensate for individual shortcomings. Recent progress has shown how distributional alignment of neural network embeddings can produce powerful transfer learning models by employing a sliced Wasserstein distance (SWD) loss. We analyze how this method can be applied to Sentinel-1 and -2 satellite imagery and develop several extensions toward making it effective in practice. In an application to few-shot Local Climate Zone (LCZ) prediction, we show that these networks outperform multiple common baselines on datasets with a large number of classes. Further, we provide evidence that instance normalization can significantly stabilize the training process and that explicitly shaping the embedding space using supervised contrastive learning can lead to improved performance.

* Published at Tackling Climate Change with Machine Learning workshop at NeurIPS 2022
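
For readers unfamiliar with the sliced Wasserstein distance mentioned in the abstract above, the following is a minimal sketch of a standard Monte Carlo estimator, not the authors' implementation. It assumes two equally sized sets of embedding vectors, uses the 2-Wasserstein variant, and the function name, projection count, and seed are illustrative choices. In a training setting this quantity would be computed with a differentiable framework rather than plain Python.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    xs, ys: lists of equal length holding points of equal dimension.
    Each random direction reduces the problem to 1-D, where optimal
    transport between equal-size samples is a sort and a pairwise match.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random unit direction on the sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project both point sets onto the direction and sort.
        px = sorted(sum(a * b for a, b in zip(p, direction)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, direction)) for p in ys)
        # Mean squared gap between matched order statistics.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)


if __name__ == "__main__":
    gen = random.Random(1)
    source = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(200)]
    shifted = [[x + 3.0, y] for x, y in source]
    print(sliced_wasserstein(source, source))   # identical sets
    print(sliced_wasserstein(source, shifted))  # shifted copy
```

Used as a loss, a smaller value indicates that two embedding distributions (for example, EO and SAR embeddings) are more closely aligned.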

Minute ventilation measurement using Plethysmographic Imaging and lighting parameters

Aug 29, 2022

Daniel Minati, Ludwik Sams, Karen Li, Bo Ji, Krishna Vardhan

Abstract: Breathing disorders such as sleep apnea are critical conditions that affect a large number of individuals because the lungs have insufficient capacity to contain and exchange oxygen and carbon dioxide and so keep the body in a stable state of homeostasis. Respiratory measurements such as minute ventilation can be used in correlation with other physiological measurements such as heart rate and heart rate variability for remote monitoring of health and detecting symptoms of such breathing-related disorders. In this work, we formulate a deep learning based approach to measure remote ventilation on a private dataset. The dataset will be made public upon acceptance of this work. We use two versions of a deep neural network to estimate the minute ventilation from data streams obtained through wearable heart rate and respiratory devices. We demonstrate that the simple design of our pipeline, which includes lightweight deep neural networks, can be easily incorporated into real-time health monitoring systems.

* 6 pages, 4 figures

"Yeah, it does have a Windows '98 Vibe": Usability Study of Security Features in Programmable Logic Controllers

Aug 04, 2022

Karen Li, Kopo M. Ramokapane, Awais Rashid

Abstract: Programmable Logic Controllers (PLCs) drive industrial processes critical to society, e.g., water treatment and distribution, electricity and fuel networks. Search engines (e.g., Shodan) have highlighted that PLCs are often left exposed to the Internet, one of the main reasons being the misconfiguration of security settings. This raises the question of why these misconfigurations occur and, specifically, whether the usability of security controls plays a part. To date, the usability of configuring PLC security mechanisms has not been studied. We present the first investigation through a task-based study and subsequent semi-structured interviews (N=19). We explore the usability of PLC connection configurations and two key security mechanisms (i.e., access levels and user administration). We find that the use of unfamiliar labels, layouts and misleading terminology exacerbates an already complex process of configuring security mechanisms. Our results uncover various (mis)perceptions about the security controls and how design constraints, e.g., safety and lack of regular updates (due to the long-term nature of such systems), pose significant challenges to the realization of modern HCI and usability principles. Based on these findings, we provide design recommendations to bring usable security in industrial settings on par with its IT counterpart.
