Ding Zhou

Distributed Sign Momentum with Local Steps for Training Transformers

Nov 26, 2024

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router

Oct 15, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Feb 23, 2024

Video-CSR: Complex Video Digest Creation for Visual-Language Models

Oct 08, 2023

Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens

Mar 27, 2023

Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE

Nov 09, 2020

A zero-inflated gamma model for deconvolved calcium imaging traces

Jun 05, 2020

Disentangled sticky hierarchical Dirichlet process hidden Markov model

Apr 06, 2020

Penalized matrix decomposition for denoising, compression, and improved demixing of functional imaging data

Jul 17, 2018