Liang Luo

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Nov 07, 2024
Via arXiv

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Jul 31, 2024
Via arXiv

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Mar 08, 2024
Via arXiv

Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

Mar 07, 2024
Via arXiv

Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models

May 03, 2023
Via arXiv

Self-discipline on multiple channels

Apr 27, 2023
Via arXiv

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

Apr 21, 2023
Via arXiv

DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction

Mar 11, 2022
Via arXiv

Characterizing and Taming Resolution in Convolutional Neural Networks

Oct 28, 2021
Via arXiv

Cloud Collectives: Towards Cloud-aware Collectives for ML Workloads with Rank Reordering

May 28, 2021
Via arXiv