Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Peiyuan Liao

Shammie

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Jun 24, 2024

Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao(+4 more)

Figure 1 for GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Figure 2 for GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Figure 3 for GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Figure 4 for GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Abstract:Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only consider sequential pipeline stages and thus ignore the topology of a DNN, resulting in missed model-parallel opportunities. This paper presents graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes existing sequential pipeline parallelism and preserves the inherent topology of a DNN to enable concurrent execution of computationally-independent operators, resulting in reduced memory requirement and improved GPU performance. In addition, we develop GraphPipe, a distributed system that exploits GPP strategies to enable performant and scalable DNN training. GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies. Evaluation on a variety of DNNs shows that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6X. GraphPipe also reduces the search time by 9-21X compared to PipeDream and Piper.

Via

Access Paper or Ask Questions

The ArtBench Dataset: Benchmarking Generative Models with Artworks

Jun 22, 2022

Peiyuan Liao, Xiuyu Li, Xihui Liu, Kurt Keutzer

Figure 1 for The ArtBench Dataset: Benchmarking Generative Models with Artworks

Figure 2 for The ArtBench Dataset: Benchmarking Generative Models with Artworks

Figure 3 for The ArtBench Dataset: Benchmarking Generative Models with Artworks

Figure 4 for The ArtBench Dataset: Benchmarking Generative Models with Artworks

Abstract:We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. It comprises 60,000 images of artwork from 10 distinctive artistic styles, with 5,000 training images and 1,000 testing images per style. ArtBench-10 has several advantages over previous artwork datasets. Firstly, it is class-balanced while most previous artwork datasets suffer from the long tail class distributions. Secondly, the images are of high quality with clean annotations. Thirdly, ArtBench-10 is created with standardized data collection, annotation, filtering, and preprocessing procedures. We provide three versions of the dataset with different resolutions ($32\times32$, $256\times256$, and original image size), formatted in a way that is easy to be incorporated by popular machine learning frameworks. We also conduct extensive benchmarking experiments using representative image synthesis models with ArtBench-10 and present in-depth analysis. The dataset is available at https://github.com/liaopeiyuan/artbench under a Fair Use license.

* The first two authors contributed equally to this work. The code and data are available at https://github.com/liaopeiyuan/artbench

Via

Access Paper or Ask Questions

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso(+435 more)

Abstract:Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

* 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Via

Access Paper or Ask Questions

Learning Weakly-Supervised Contrastive Representations

Feb 18, 2022

Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

Figure 1 for Learning Weakly-Supervised Contrastive Representations

Figure 2 for Learning Weakly-Supervised Contrastive Representations

Figure 3 for Learning Weakly-Supervised Contrastive Representations

Figure 4 for Learning Weakly-Supervised Contrastive Representations

Abstract:We argue that a form of the valuable information provided by the auxiliary information is its implied data clustering information. For instance, considering hashtags as auxiliary information, we can hypothesize that an Instagram image will be semantically more similar with the same hashtags. With this intuition, we present a two-stage weakly-supervised contrastive learning approach. The first stage is to cluster data according to its auxiliary information. The second stage is to learn similar representations within the same cluster and dissimilar representations for data from different clusters. Our empirical experiments suggest the following three contributions. First, compared to conventional self-supervised representations, the auxiliary-information-infused representations bring the performance closer to the supervised representations, which use direct downstream labels as supervision signals. Second, our approach performs the best in most cases, when comparing our approach with other baseline representation learning methods that also leverage auxiliary data information. Third, we show that our approach also works well with unsupervised constructed clusters (e.g., no auxiliary information), resulting in a strong unsupervised representation learning approach.

* Published as ICLR 2022. arXiv admin note: substantial text overlap with arXiv:2106.02869

Via

Access Paper or Ask Questions

Collage: Automated Integration of Deep Learning Backends

Nov 01, 2021

Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia

Figure 1 for Collage: Automated Integration of Deep Learning Backends

Figure 2 for Collage: Automated Integration of Deep Learning Backends

Figure 3 for Collage: Automated Integration of Deep Learning Backends

Figure 4 for Collage: Automated Integration of Deep Learning Backends

Abstract:Strong demands for efficient deployment of Deep Learning (DL) applications prompt the rapid development of a rich DL ecosystem. To keep up with its fast advancement, it is crucial for DL frameworks to efficiently integrate a variety of optimized libraries and runtimes as their backends and generate the fastest possible executable by using them properly. However, current DL frameworks require significant manual effort to integrate diverse backends and often fail to deliver high performance. In this paper, we propose Collage, an automatic framework for integrating DL backends. Collage provides a backend registration interface that allows users to precisely specify the capability of various backends. By leveraging the specifications of available backends, Collage searches for an optimized backend placement for a given workload and execution environment. Our evaluation shows that Collage automatically integrates multiple backends together without manual intervention, and outperforms existing frameworks by 1.21x, 1.39x, 1.40x on two different NVIDIA GPUs and an Intel CPU respectively.

Via

Access Paper or Ask Questions

Integrating Auxiliary Information in Self-supervised Learning

Jun 05, 2021

Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

Figure 1 for Integrating Auxiliary Information in Self-supervised Learning

Figure 2 for Integrating Auxiliary Information in Self-supervised Learning

Figure 3 for Integrating Auxiliary Information in Self-supervised Learning

Figure 4 for Integrating Auxiliary Information in Self-supervised Learning

Abstract:This paper presents to integrate the auxiliary information (e.g., additional attributes for data such as the hashtags for Instagram images) in the self-supervised learning process. We first observe that the auxiliary information may bring us useful information about data structures: for instance, the Instagram images with the same hashtags can be semantically similar. Hence, to leverage the structural information from the auxiliary information, we present to construct data clusters according to the auxiliary information. Then, we introduce the Clustering InfoNCE (Cl-InfoNCE) objective that learns similar representations for augmented variants of data from the same cluster and dissimilar representations for data from different clusters. Our approach contributes as follows: 1) Comparing to conventional self-supervised representations, the auxiliary-information-infused self-supervised representations bring the performance closer to the supervised representations; 2) The presented Cl-InfoNCE can also work with unsupervised constructed clusters (e.g., k-means clusters) and outperform strong clustering-based self-supervised learning approaches, such as the Prototypical Contrastive Learning (PCL) method; 3) We show that Cl-InfoNCE may be a better approach to leverage the data clustering information, by comparing it to the baseline approach - learning to predict the clustering assignments with cross-entropy loss. For analysis, we connect the goodness of the learned representations with the statistical relationships: i) the mutual information between the labels and the clusters and ii) the conditional entropy of the clusters given the labels.

Via

Access Paper or Ask Questions

Google Landmark Recognition 2020 Competition Third Place Solution

Oct 11, 2020

Qishen Ha, Bo Liu, Fuxu Liu, Peiyuan Liao

Figure 1 for Google Landmark Recognition 2020 Competition Third Place Solution

Figure 2 for Google Landmark Recognition 2020 Competition Third Place Solution

Figure 3 for Google Landmark Recognition 2020 Competition Third Place Solution

Figure 4 for Google Landmark Recognition 2020 Competition Third Place Solution

Abstract:We present our third place solution to the Google Landmark Recognition 2020 competition. It is an ensemble of global features only Sub-center ArcFace models. We introduce dynamic margins for ArcFace loss, a family of tune-able margin functions of class size, designed to deal with the extreme imbalance in GLDv2 dataset. Progressive finetuning and careful postprocessing are also key to the solution. Our two submissions scored 0.6344 and 0.6289 on private leaderboard, both ranking third place out of 736 teams.

* 5 pages, 2 figures

Via

Access Paper or Ask Questions

Graph Adversarial Networks: Protecting Information against Adversarial Attacks

Oct 07, 2020

Peiyuan Liao, Han Zhao, Keyulu Xu, Tommi Jaakkola, Geoffrey Gordon, Stefanie Jegelka, Ruslan Salakhutdinov

Figure 1 for Graph Adversarial Networks: Protecting Information against Adversarial Attacks

Figure 2 for Graph Adversarial Networks: Protecting Information against Adversarial Attacks

Figure 3 for Graph Adversarial Networks: Protecting Information against Adversarial Attacks

Figure 4 for Graph Adversarial Networks: Protecting Information against Adversarial Attacks

Abstract:We study the problem of protecting information when learning with graph-structured data. While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representational learning in many applications, the neighborhood aggregation paradigm exposes additional vulnerabilities to attackers seeking to extract node-level information about sensitive attributes. To counter this, we propose a minimax game between the desired GNN encoder and the worst-case attacker. The resulting adversarial training creates a strong defense against inference attacks, while only suffering a small loss in task performance. We analyze the effectiveness of our framework against a worst-case adversary, and characterize the trade-off between predictive accuracy and adversarial defense. Experiments across multiple datasets from recommender systems, knowledge graphs and quantum chemistry demonstrate that the proposed approach provides a robust defense across various graph structures and tasks, while producing competitive GNN encoders. Our code is available at https://github.com/liaopeiyuan/GAL.

* Our code is available at https://github.com/liaopeiyuan/GAL

Via

Access Paper or Ask Questions

CAE-ADMM: Implicit Bitrate Optimization via ADMM-based Pruning in Compressive Autoencoders

Jan 30, 2019

Haimeng Zhao, Peiyuan Liao

Figure 1 for CAE-ADMM: Implicit Bitrate Optimization via ADMM-based Pruning in Compressive Autoencoders

Figure 2 for CAE-ADMM: Implicit Bitrate Optimization via ADMM-based Pruning in Compressive Autoencoders

Figure 3 for CAE-ADMM: Implicit Bitrate Optimization via ADMM-based Pruning in Compressive Autoencoders

Figure 4 for CAE-ADMM: Implicit Bitrate Optimization via ADMM-based Pruning in Compressive Autoencoders

Abstract:We introduce CAE-ADMM (ADMM-pruned compressive autoencoder), a lossy image compression model, inspired by researches in neural architecture search (NAS) and is capable of implicitly optimizing the bitrate without the use of an entropy estimator. Our experiments show that by introducing alternating direction method of multipliers (ADMM) to the model pipeline, the pruning paradigm yields more accurate results (SSIM/MS-SSIM-wise) when compared to entropy-based approaches and that of traditional codecs (JPEG, JPEG 2000, etc.) while maintaining acceptable inference speed. We further explore the effectiveness of the pruning method in CAE-ADMM by examining the generated latent codes.

* 5 pages, 4 figures, 2 tables, 1 algorithm

Via

Access Paper or Ask Questions

Full Workspace Generation of Serial-link Manipulators by Deep Learning based Jacobian Estimation

Sep 14, 2018

Peiyuan Liao, Jiajun Mao

Figure 1 for Full Workspace Generation of Serial-link Manipulators by Deep Learning based Jacobian Estimation

Figure 2 for Full Workspace Generation of Serial-link Manipulators by Deep Learning based Jacobian Estimation

Figure 3 for Full Workspace Generation of Serial-link Manipulators by Deep Learning based Jacobian Estimation

Figure 4 for Full Workspace Generation of Serial-link Manipulators by Deep Learning based Jacobian Estimation

Abstract:Apart from solving complicated problems that require a certain level of intelligence, fine-tuned deep neural networks can also create fast algorithms for slow, numerical tasks. In this paper, we introduce an improved version of [1]'s work, a fast, deep-learning framework capable of generating the full workspace of serial-link manipulators. The architecture consists of two neural networks: an estimation net that approximates the manipulator Jacobian, and a confidence net that measures the confidence of the approximation. We also introduce M3 (Manipulability Maps of Manipulators), a MATLAB robotics library based on [2](RTB), the datasets generated by which are used by this work. Results have shown that not only are the neural networks significantly faster than numerical inverse kinematics, it also offers superior accuracy when compared to other machine learning alternatives. Implementations of the algorithm (based on Keras[3]), including benchmark evaluation script, are available at https://github.com/liaopeiyuan/Jacobian-Estimation . The M3 Library APIs and datasets are also available at https://github.com/liaopeiyuan/M3 .

* 10 pages, 12 figures

Via

Access Paper or Ask Questions