Anand Padmanabha Iyer

ReInc: Scaling Training of Dynamic Graph Neural Networks

Jan 25, 2025

Mingyu Guan, Saumia Singhal, Taesoo Kim, Anand Padmanabha Iyer

Abstract: Dynamic Graph Neural Networks (DGNNs) have gained widespread attention due to their applicability in diverse domains such as traffic network prediction, epidemiological forecasting, and social network analysis. In this paper, we present ReInc, a system designed to enable efficient and scalable training of DGNNs on large-scale graphs. ReInc introduces key innovations that capitalize on the unique combination of Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) inherent in DGNNs. By reusing intermediate results and incrementally computing aggregations across consecutive graph snapshots, ReInc significantly improves computational efficiency. To support these optimizations, ReInc incorporates a novel two-level caching mechanism with a specialized caching policy aligned to the DGNN execution workflow. Additionally, ReInc addresses the challenges of managing structural and temporal dependencies in dynamic graphs through a new distributed training strategy, which eliminates the communication overheads associated with accessing remote features and redistributing intermediate results. Experimental results across various DGNN architectures and real-world graph datasets demonstrate that ReInc achieves up to an order-of-magnitude speedup over state-of-the-art frameworks.
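The idea of incrementally computing aggregations across consecutive graph snapshots can be illustrated with a small Python sketch. It is not ReInc's implementation: it assumes a sum aggregator over scalar node features, features that stay fixed between the two snapshots, and a snapshot delta given as sets of added and removed directed edges. The helpers `full_aggregate` and `incremental_aggregate` are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]  # (src, dst): src's feature is aggregated into dst


def full_aggregate(edges: Iterable[Edge], feats: Dict[int, float]) -> Dict[int, float]:
    """Recompute each node's sum of in-neighbor features from scratch."""
    agg = {v: 0.0 for v in feats}
    for src, dst in edges:
        agg[dst] += feats[src]
    return agg


def incremental_aggregate(
    prev_agg: Dict[int, float],
    added: Iterable[Edge],
    removed: Iterable[Edge],
    feats: Dict[int, float],
) -> Dict[int, float]:
    """Derive the next snapshot's aggregation from the previous one.

    Only destinations of changed edges are touched; every other node's
    cached result is reused as-is. Assumes node features did not change
    between snapshots and that every removed edge existed before.
    """
    agg = dict(prev_agg)  # reuse the cached intermediate result
    for src, dst in added:
        agg[dst] += feats[src]
    for src, dst in removed:
        agg[dst] -= feats[src]
    return agg


# Two consecutive snapshots that differ by one added and one removed edge.
feats = {0: 1.0, 1: 2.0, 2: 4.0, 3: 8.0}
snap_t: Set[Edge] = {(0, 1), (1, 2), (2, 3), (3, 0)}
added, removed = {(0, 2)}, {(2, 3)}
snap_t1 = (snap_t | added) - removed

agg_t = full_aggregate(snap_t, feats)
agg_t1 = incremental_aggregate(agg_t, added, removed, feats)
assert agg_t1 == full_aggregate(snap_t1, feats)  # same result, less work
```

The saving grows with the fraction of edges that stay unchanged between snapshots. Aggregators such as max, or features that change over time, need extra bookkeeping that this sketch leaves out.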

Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection

Nov 13, 2024

Vima Gupta, Kartik Sinha, Ada Gavrilovska, Anand Padmanabha Iyer

Abstract: Mixture-of-Experts (MoE) architectures have recently gained popularity for efficiently scaling large language models. However, we uncover a fundamental tension: while MoEs are designed for selective expert activation, production serving requires request batching, which forces the activation of all experts and negates MoE's efficiency benefits during the decode phase. We present Lynx, a system that enables efficient MoE inference through dynamic, batch-aware expert selection. Our key insight is that expert importance varies significantly across tokens and inference phases, creating opportunities for runtime optimization. Lynx leverages this insight through a lightweight framework that dynamically reduces the number of active experts while preserving model accuracy. Our evaluations show that Lynx achieves up to a 1.55x reduction in inference latency with negligible accuracy loss relative to the baseline model across complex code generation and mathematical reasoning tasks.
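The tension described above, where per-token top-k routing still activates most experts once tokens are batched, and the general idea of shrinking the batch's active expert set can be illustrated with a small Python sketch. This is not Lynx's selection policy, which the abstract does not specify. The heuristic here (rank experts by their summed router probability across the batch, keep the top `budget`, then re-route each token to its top-k among the kept experts), the router values, and the names `topk`, `active_experts`, and `batch_aware_select` are all hypothetical.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple


def topk(scores: Sequence[float], k: int,
         allowed: Optional[Iterable[int]] = None) -> List[int]:
    """Indices of the k highest scores, optionally restricted to `allowed`."""
    candidates = range(len(scores)) if allowed is None else sorted(allowed)
    return sorted(candidates, key=lambda e: scores[e], reverse=True)[:k]


def active_experts(router_probs: List[List[float]], k: int) -> Set[int]:
    """Experts the batch must load when every token independently takes its top-k."""
    return {e for probs in router_probs for e in topk(probs, k)}


def batch_aware_select(router_probs: List[List[float]], k: int,
                       budget: int) -> Tuple[Set[int], List[List[int]]]:
    """Hypothetical heuristic: cap the batch's active experts at `budget` (>= k)."""
    num_experts = len(router_probs[0])
    # Batch-level importance of an expert: total routing probability it receives.
    importance = [sum(p[e] for p in router_probs) for e in range(num_experts)]
    kept = set(topk(importance, budget))
    # Re-route every token to its best k experts within the kept set.
    routes = [topk(probs, k, kept) for probs in router_probs]
    return kept, routes


# Made-up router probabilities for a batch of 4 tokens over 6 experts.
probs = [
    [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.35, 0.40, 0.10, 0.05, 0.05],
    [0.30, 0.05, 0.05, 0.40, 0.10, 0.10],
    [0.35, 0.05, 0.05, 0.08, 0.17, 0.30],
]
baseline = active_experts(probs, k=2)  # {0, 1, 2, 3, 5}: 5 of 6 experts loaded
kept, routes = batch_aware_select(probs, k=2, budget=3)
```

Even with only 2 experts per token, four tokens already require 5 of the 6 experts; the capped selection loads 3. A practical system must also decide when dropping an expert is safe for accuracy, which the abstract ties to expert importance varying across tokens and inference phases; this sketch ignores that signal.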

Fast and Accurate Performance Analysis of LTE Radio Access Networks

May 17, 2016

Anand Padmanabha Iyer, Ion Stoica, Mosharaf Chowdhury, Li Erran Li

Abstract: An increasing amount of analytics is performed on data procured in real time to make real-time decisions. Such tasks range from simple reporting on streams to sophisticated model building. However, the practicality of such analyses is impeded in several domains by a fundamental trade-off between data collection latency and analysis accuracy. In this paper, we study this trade-off in the context of a specific domain, Cellular Radio Access Networks (RAN). Our choice of this domain is influenced by its commonalities with several other domains that produce real-time data, our access to a large live dataset, and the data's real-time nature and dimensionality, which make it a natural fit for a popular analysis technique, machine learning (ML). We find that the latency-accuracy trade-off can be resolved using two broad, general techniques: intelligent data grouping and task formulations that leverage domain characteristics. Based on this, we present CellScope, a system that addresses this challenge through a domain-specific formulation and application of Multi-task Learning (MTL) to RAN performance analysis. It achieves this goal using three techniques: feature engineering to transform raw data into effective features, a PCA-inspired similarity metric to group data from geographically nearby base stations that share performance commonalities, and a hybrid online-offline model for efficient model updates. Our evaluation of CellScope shows that its accuracy improvements over direct application of ML range from 2.5x to 4.4x, while it reduces the model update overhead by up to 4.8x. We have also used CellScope to analyze a live LTE network serving over 2 million subscribers for a period of over 10 months, where it uncovered several problems and insights, some of them previously unknown.
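The abstract names a PCA-inspired similarity metric for grouping base stations but does not define it. One plausible reading, used here purely as an illustration, compares the dominant principal component of each station's measurement matrix: stations whose features vary along similar directions score close to 1. The sketch uses plain power iteration so it needs only the standard library, and it ignores the geographic-proximity constraint. The functions `principal_component`, `pc_similarity`, and `group_stations`, the 0.9 threshold, and the station data are hypothetical; CellScope's actual metric may differ.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # rows = measurements, columns = features


def covariance(rows: Matrix) -> Matrix:
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def principal_component(rows: Matrix, iters: int = 200) -> List[float]:
    """Unit-length dominant eigenvector of the covariance, via power iteration."""
    cov = covariance(rows)
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]  # asymmetric start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data with no variance
            break
        v = [x / norm for x in w]
    return v


def pc_similarity(u: List[float], v: List[float]) -> float:
    """|cos| between two unit principal components (eigenvector sign is arbitrary)."""
    return abs(sum(a * b for a, b in zip(u, v)))


def group_stations(data: Dict[str, Matrix], threshold: float = 0.9) -> List[List[str]]:
    """Greedily add each station to the first group whose seed it resembles."""
    pcs = {name: principal_component(rows) for name, rows in data.items()}
    groups: List[List[str]] = []
    for name in data:
        for group in groups:
            if pc_similarity(pcs[group[0]], pcs[name]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Made-up two-feature measurements for three stations.
stations = {
    "A": [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9]],
    "B": [[10, 10.2], [12, 11.8], [14, 14.1], [16, 16.0]],
    "C": [[1, 4.0], [2, 3.1], [3, 1.9], [4, 1.0]],
}
groups = group_stations(stations)  # [["A", "B"], ["C"]]
```

Stations A and B differ in scale but co-vary the same way, so they share a group; C's features move in opposite directions and it lands in its own group. Comparing principal directions rather than raw values is what lets stations with different absolute load levels be grouped together.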
