Title: Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization

Feb 19, 2024

James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Jiankang Deng, Ioannis Patras



Abstract: The Mixture of Experts (MoE) paradigm provides a powerful way to decompose inscrutable dense layers into smaller, modular computations often more amenable to human interpretation, debugging, and editability. A major problem, however, lies in the computational cost of scaling the number of experts to achieve sufficiently fine-grained specialization. In this paper, we propose the Multilinear Mixture of Experts (MMoE) layer to address this, focusing on vision models. MMoE layers perform an implicit computation on prohibitively large weight tensors entirely in factorized form. Consequently, MMoEs both (1) avoid the issues incurred through the discrete expert routing in the popular 'sparse' MoE models, and (2) do not incur the restrictively high inference-time costs of 'soft' MoE alternatives. We present both qualitative and quantitative evidence (through visualization and counterfactual interventions, respectively) that scaling MMoE layers when fine-tuning foundation models for vision tasks leads to more specialized experts at the class level whilst remaining competitive with the performance of parameter-matched linear layer counterparts. Finally, we show that learned expert specialism further facilitates manual correction of demographic bias in CelebA attribute classification. Our MMoE model code is available at https://github.com/james-oldfield/MMoE.

* GitHub: https://github.com/james-oldfield/MMoE
* Project page: https://eecs.qmul.ac.uk/~jo001/MMoE/
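To make the abstract's claim about computing "entirely in factorized form" concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the repository above for that). It assumes a rank-R CP factorization of the expert weight tensor and softmax expert coefficients, neither of which the abstract specifies, and the names `dense_moe`, `cp_moe`, `U`, `V`, and `Wo` are hypothetical. Under those assumptions, the sketch checks that a soft mixture over N expert matrices can be evaluated from the factors alone, without ever forming the N x I x O tensor.

```python
import numpy as np

def dense_moe(x, a, W):
    """Reference: y = sum_n a[n] * W[n].T @ x, using the full (N, I, O) tensor."""
    return np.einsum("n,i,nio->o", a, x, W)

def cp_moe(x, a, U, V, Wo):
    """Same output from CP factors only (assumed factorization, not the paper's code)."""
    za = a @ U             # (R,) expert coefficients projected onto the R components
    zx = x @ V             # (R,) input projected onto the R components
    return Wo @ (za * zx)  # (O,) recombine along the output mode

rng = np.random.default_rng(0)
N, I, O, R = 64, 16, 8, 4  # experts, input dim, output dim, CP rank (illustrative)

# Hypothetical CP factors; W[n, i, o] = sum_r U[n, r] * V[i, r] * Wo[o, r].
U = rng.normal(size=(N, R))
V = rng.normal(size=(I, R))
Wo = rng.normal(size=(O, R))
W = np.einsum("nr,ir,or->nio", U, V, Wo)  # materialized here only for the check

x = rng.normal(size=I)
logits = rng.normal(size=N)
a = np.exp(logits - logits.max())
a /= a.sum()               # softmax expert coefficients (an assumption)

y_dense = dense_moe(x, a, W)
y_factored = cp_moe(x, a, U, V, Wo)
print(np.allclose(y_dense, y_factored))
```

In this sketch the factored path costs on the order of R(N + I + O) multiply-adds, against N·I·O for the dense path, which illustrates why growing the number of experts N need not grow the cost proportionally when the weights stay in factored form.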
