Mengya Gao

Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

May 11, 2024

Junqin Huang, Zhongjie Hu, Zihao Jing, Mengya Gao, Yichao Wu

Abstract: In this report, we introduce Piccolo2, an embedding model that surpasses other models in the comprehensive evaluation across 6 tasks on the CMTEB benchmark, setting a new state of the art. Piccolo2 primarily leverages an efficient multi-task hybrid loss training approach, effectively harnessing textual data and labels from diverse downstream tasks. In addition, Piccolo2 scales up the embedding dimension and uses MRL training to support more flexible vector dimensions. The latest information on Piccolo models can be accessed via: https://huggingface.co/sensenova/

* tech report
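
The "more flexible vector dimensions" mentioned in the abstract can be illustrated with a small sketch. MRL-trained embeddings (MRL is commonly read as Matryoshka Representation Learning) are typically consumed by keeping a prefix of the vector and re-normalizing it. The code below is only an illustration under that assumption: the toy vectors and the helper names `truncate_and_normalize` and `cosine` are hypothetical and do not come from the Piccolo2 release.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two unit-norm vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dimensional "embeddings" (made-up numbers, for illustration only).
query = [0.9, 0.1, 0.3, 0.0, 0.2, -0.1, 0.05, 0.4]
doc = [0.8, 0.2, 0.25, 0.1, -0.3, 0.0, 0.1, 0.35]

# Compare the same pair at several truncated dimensions.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```

Shorter prefixes trade some retrieval quality for smaller indexes and faster similarity search; how much quality is lost depends on the model and should be measured rather than assumed.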

One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data

Nov 24, 2021

Yujie Wang, Junqin Huang, Mengya Gao, Yichao Wu, Zhenfei Yin, Ding Liang, Junjie Yan

Abstract: The foundation model is not the last chapter of the model production pipeline. Transferring with few data in a general way to thousands of downstream tasks is becoming a trend in the application of foundation models. In this paper, we propose a universal transfer framework, One to Transfer All (OTA), to transfer any Vision Foundation Model (VFM) to any downstream task with few downstream data. We first transfer a VFM to a task-specific model by Image Re-representation Fine-tuning (IRF), and then distill knowledge from the task-specific model to a deployed model with data produced by Downstream Image-Guided Generation (DIGG). OTA has no dependency on upstream data, VFM, and downstream tasks when transferring. It also provides a way for VFM researchers to release their upstream information for better transferring without leaking data, in line with privacy requirements. Massive experiments validate the effectiveness and superiority of our methods in the few-data setting. Our code will be released.

* Technical Report

INTERN: A New Learning Paradigm Towards General Vision

Nov 16, 2021

Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu(+17 more)

Abstract: Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society. However, down the road, a key challenge awaits us, that is, our capability of meeting rapidly-growing scenario-specific demands is severely limited by the cost of acquiring a commensurate amount of training data. This difficult situation is in essence due to limitations of the mainstream learning paradigm: we need to train a new model for each new scenario, based on a large quantity of well-annotated data and commonly from scratch. In tackling this fundamental problem, we move beyond and develop a new learning paradigm named INTERN. By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability. We evaluate our model on 26 well-known datasets that cover four categories of tasks in computer vision. In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data, often by a significant margin. This is an important step towards a promising prospect where such a model with general vision capability can dramatically reduce our reliance on data, thus expediting the adoption of AI technologies. Furthermore, revolving around our new paradigm, we also introduce a new data system, a new architecture, and a new benchmark, which, together, form a general vision ecosystem to support its future development in an open and inclusive manner.

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

Mar 31, 2021

Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang

Abstract: Training a small student network with the guidance of a larger teacher network is an effective way to promote the performance of the student. Despite the different types, the guided knowledge used to distill is always kept unchanged for different teacher and student pairs in previous knowledge distillation methods. However, we find that teacher and student models with different networks or trained from different initializations could have distinct feature representations among different channels (e.g., the highly activated channels for different categories). We name this incongruous representation of channels the teacher-student knowledge discrepancy in the distillation process. Ignoring the knowledge discrepancy between teacher and student models makes it more difficult for the student to learn from the teacher. To solve this problem, in this paper, we propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student and provides the most suitable knowledge to different student networks for distillation. Extensive experiments on different datasets (CIFAR100, ImageNet, COCO) and tasks (image classification, object detection) reveal the widely existing knowledge discrepancy problem between teachers and students and demonstrate the effectiveness of our proposed method. Our method is flexible and can be easily combined with other state-of-the-art approaches.

Residual Knowledge Distillation

Feb 21, 2020

Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy

Abstract: Knowledge distillation (KD) is one of the most potent ways for model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance degradation due to the substantial gap between the learning capacities of S and T. To remedy this problem, this work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A). Specifically, S is trained to mimic the feature maps of T, and A aids this process by learning the residual error between them. In this way, S and A complement each other to get better knowledge from T. Furthermore, we devise an effective method to derive S and A from a given model without increasing the total computational cost. Extensive experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet, surpassing state-of-the-art methods.

* 9 pages, 3 figures, 3 tables
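
The roles of S, T, and A described in the abstract can be sketched with toy feature vectors. This is a minimal illustration, assuming a mean-squared-error mimicking loss and flat lists in place of feature maps; the helper names and numbers are hypothetical and are not taken from the paper's code.

```python
def mse(a, b):
    """Mean squared error between two equally sized feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def residual_target(teacher_feat, student_feat):
    """What the assistant A is asked to predict: the error left by S."""
    return [t - s for t, s in zip(teacher_feat, student_feat)]

def combine(student_feat, assistant_out):
    """Final feature: the student's output corrected by the assistant."""
    return [s + a for s, a in zip(student_feat, assistant_out)]

# Toy features standing in for one position of a feature map.
teacher = [1.0, 2.0, 3.0]
student = [0.5, 2.5, 2.0]          # S only roughly mimics T

target = residual_target(teacher, student)    # [0.5, -0.5, 1.0]

# Pretend the assistant has learned half of the residual so far.
assistant_out = [0.5 * r for r in target]
combined = combine(student, assistant_out)

student_loss = mse(student, teacher)     # error of S alone
combined_loss = mse(combined, teacher)   # error of S + A
print(student_loss, combined_loss)
```

Any non-trivial fit of the residual by A lowers the combined error relative to S alone, which is the sense in which the two networks complement each other.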

P2SGrad: Refined Gradients for Optimizing Deep Face Models

May 07, 2019

Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li

Abstract: Cosine-based softmax losses significantly improve the performance of deep face recognition networks. However, these losses always include sensitive hyper-parameters which can make the training process unstable, and it is very tricky to set suitable hyper-parameters for a specific dataset. This paper addresses this challenge by directly designing the gradients for adaptively training deep neural networks. We first investigate and unify previous cosine softmax losses by analyzing their gradients. This unified view inspires us to propose a novel gradient called P2SGrad (Probability-to-Similarity Gradient), which leverages a cosine similarity instead of classification probability to directly update the testing metrics for updating neural network parameters. P2SGrad is adaptive and hyper-parameter free, which makes the training process more efficient and faster. We evaluate our P2SGrad on three face recognition benchmarks, LFW, MegaFace, and IJB-C. The results show that P2SGrad is stable in training, robust to noise, and achieves state-of-the-art performance on all three benchmarks.

* Accepted by CVPR 2019
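
Going only by the abstract, the core change is in the per-class weight applied to the gradient: a softmax-style loss weights each class by its predicted probability minus the one-hot target, while P2SGrad uses a cosine similarity in place of that probability. The sketch below contrasts the two weightings. The exact form `cos_j - 1[j == label]` is an assumption made for illustration, as are the function names and the `scale` hyper-parameter; the paper should be consulted for the actual formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_coeffs(cosines, label, scale):
    """Per-class gradient weights of a scaled cosine softmax loss:
    probability minus one-hot target. Depends on the `scale` setting."""
    probs = softmax([scale * c for c in cosines])
    return [p - (1.0 if j == label else 0.0) for j, p in enumerate(probs)]

def p2s_coeffs(cosines, label):
    """Assumed P2SGrad-style weights: cosine similarity minus one-hot
    target. No scale or margin hyper-parameter is involved."""
    return [c - (1.0 if j == label else 0.0) for j, c in enumerate(cosines)]

# Cosine similarities between one sample and three class centers (toy values).
cosines = [0.7, 0.2, -0.1]
label = 0

for scale in (1.0, 16.0, 64.0):
    print("softmax, scale", scale, softmax_coeffs(cosines, label, scale))
print("p2s-style", p2s_coeffs(cosines, label))
```

The softmax weights change substantially as `scale` varies, which is one way to see the hyper-parameter sensitivity the abstract describes, while the cosine-based weights are bounded by the range of the cosine and vanish for the target class exactly when the sample aligns with its class center.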

Feature Matters: A Stage-by-Stage Approach for Knowledge Transfer

Dec 05, 2018

Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy, Xiaoou Tang

Abstract: Convolutional Neural Networks (CNNs) have become deeper and deeper in recent years, making the study of model acceleration imperative. It is a common practice to employ a shallow network, called the student, to learn from a deep one, which is termed the teacher. Prior work has made many attempts to transfer different types of knowledge from teacher to student; however, two problems remain unsolved. Firstly, the knowledge used by existing methods is usually manually defined, which may not be consistent with the information learned by the original model. Secondly, an effective training scheme for the transfer process is lacking, leading to degradation of performance. In this work, we argue that features are the most important knowledge from the teacher. It is sufficient for the student to achieve appealing performance by just learning features similar to the teacher's, without any processing. Based on this discovery, we further present an efficient learning strategy, which is to make the student mimic the features of the teacher stage by stage. Extensive experiments suggest that the proposed approach significantly narrows the gap between student and teacher, and shows strong stability on various tasks, i.e., classification and detection, outperforming the state-of-the-art methods.

* 9 pages and 3 figures

A Novel Hybrid Machine Learning Model for Auto-Classification of Retinal Diseases

Jun 17, 2018

C. -H. Huck Yang, Jia-Hong Huang, Fangyu Liu, Fang-Yi Chiu, Mengya Gao, Weifeng Lyu, I-Hung Lin M. D., Jesper Tegner

Abstract: Automatic clinical diagnosis of retinal diseases has emerged as a promising approach to facilitate discovery in areas with limited access to specialists. We propose a novel visual-assisted diagnosis hybrid model based on the support vector machine (SVM) and deep neural networks (DNNs). The model incorporates complementary strengths of DNNs and SVM. Furthermore, we present a new clinical retina label collection for ophthalmology incorporating 32 retinal disease classes. Using EyeNet, our model achieves 89.73% diagnosis accuracy, and the model's performance is comparable to that of professional ophthalmologists.

* ICML-IJCAI Workshop 2018
* Accepted at the Joint ICML and IJCAI Workshop on Computational Biology (ICML-IJCAI WCB), to be held in Stockholm, Sweden, 2018. See https://sites.google.com/view/wcb2018/accepted-papers?authuser=0
