Michael Andersch

ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

Apr 17, 2024
Reducing Activation Recomputation in Large Transformer Models

May 10, 2022
Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

Apr 26, 2018