Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qitao Shi

MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts

Jan 31, 2024

Zhitian Xie, Yinger Zhang, Chenyi Zhuang, Qitao Shi, Zhining Liu, Jinjie Gu, Guannan Zhang

Abstract:The application of mixture-of-experts (MoE) is gaining popularity due to its ability to improve model's performance. In an MoE structure, the gate layer plays a significant role in distinguishing and routing input features to different experts. This enables each expert to specialize in processing their corresponding sub-tasks. However, the gate's routing mechanism also gives rise to narrow vision: the individual MoE's expert fails to use more samples in learning the allocated sub-task, which in turn limits the MoE to further improve its generalization ability. To effectively address this, we propose a method called Mixture-of-Distilled-Expert (MoDE), which applies moderate mutual distillation among experts to enable each expert to pick up more features learned by other experts and gain more accurate perceptions on their original allocated sub-tasks. We conduct plenty experiments including tabular, NLP and CV datasets, which shows MoDE's effectiveness, universality and robustness. Furthermore, we develop a parallel study through innovatively constructing "expert probing", to experimentally prove why MoDE works: moderate distilling knowledge can improve each individual expert's test performances on their assigned tasks, leading to MoE's overall performance improvement.

* Accepted by AAAI-24

Via

Access Paper or Ask Questions

ALT: An Automatic System for Long Tail Scenario Modeling

May 19, 2023

Ya-Lin Zhang, Jun Zhou, Yankun Ren, Yue Zhang, Xinxing Yang, Meng Li, Qitao Shi, Longfei Li

Figure 1 for ALT: An Automatic System for Long Tail Scenario Modeling

Figure 2 for ALT: An Automatic System for Long Tail Scenario Modeling

Figure 3 for ALT: An Automatic System for Long Tail Scenario Modeling

Figure 4 for ALT: An Automatic System for Long Tail Scenario Modeling

Abstract:In this paper, we consider the problem of long tail scenario modeling with budget limitation, i.e., insufficient human resources for model training stage and limited time and computing resources for model inference stage. This problem is widely encountered in various applications, yet has received deficient attention so far. We present an automatic system named ALT to deal with this problem. Several efforts are taken to improve the algorithms used in our system, such as employing various automatic machine learning related techniques, adopting the meta learning philosophy, and proposing an essential budget-limited neural architecture search method, etc. Moreover, to build the system, many optimizations are performed from a systematic perspective, and essential modules are armed, making the system more feasible and efficient. We perform abundant experiments to validate the effectiveness of our system and demonstrate the usefulness of the critical modules in our system. Moreover, online results are provided, which fully verified the efficacy of our system.

Via

Access Paper or Ask Questions

SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks

Mar 09, 2020

Qitao Shi, Ya-Lin Zhang, Longfei Li, Xinxing Yang, Meng Li, Jun Zhou

Figure 1 for SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks

Figure 2 for SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks

Figure 3 for SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks

Figure 4 for SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks

Abstract:Machine learning techniques have been widely applied in Internet companies for various tasks, acting as an essential driving force, and feature engineering has been generally recognized as a crucial tache when constructing machine learning systems. Recently, a growing effort has been made to the development of automatic feature engineering methods, so that the substantial and tedious manual effort can be liberated. However, for industrial tasks, the efficiency and scalability of these methods are still far from satisfactory. In this paper, we proposed a staged method named SAFE (Scalable Automatic Feature Engineering), which can provide excellent efficiency and scalability, along with requisite interpretability and promising performance. Extensive experiments are conducted and the results show that the proposed method can provide prominent efficiency and competitive effectiveness when comparing with other methods. What's more, the adequate scalability of the proposed method ensures it to be deployed in large scale industrial tasks.

Via

Access Paper or Ask Questions