Tong Lin

Peking University

Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model

Dec 02, 2024

Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen

Abstract: Vision-Language Models (VLMs) bring powerful understanding and reasoning capabilities to multimodal tasks. Meanwhile, there is a growing need for capable artificial intelligence on mobile devices, such as AI assistant software. Some efforts try to migrate VLMs to edge devices to expand their application scope. Simplifying the model structure is a common method, but as the model shrinks, the trade-off between performance and size becomes more and more difficult. Knowledge distillation (KD) can help models improve comprehensive capabilities without increasing size or data volume. However, most existing large-model distillation techniques only consider applications on single-modal LLMs, or only use teachers to create new data environments for students. None of these methods takes into account the distillation of the most important cross-modal alignment knowledge in VLMs. We propose a method called Align-KD to guide the student model to learn the cross-modal matching that occurs at the shallow layer. The teacher also helps the student learn the projection of vision tokens into the text embedding space based on the focus of the text. Under the guidance of Align-KD, the 1.7B MobileVLM V2 model can learn rich knowledge from the 7B teacher model with a light design of the training loss, and achieve an average score improvement of 2.0 across 6 benchmarks under two training subsets respectively. Code is available at: https://github.com/fqhank/Align-KD.

Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization

Jul 12, 2024

Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen

Abstract: Weakly-supervised Temporal Action Localization (WSTAL) aims to localize actions in untrimmed videos using only video-level supervision. The latest WSTAL methods introduce a pseudo-label learning framework to bridge the gap between classification-based training and the localization targets at inference, and achieve cutting-edge results. In these frameworks, a classification-based model is used to generate pseudo labels for a regression-based student model to learn from. However, the quality of pseudo labels in the framework, which is a key factor in the final result, has not been carefully studied. In this paper, we propose a set of simple yet efficient pseudo label quality enhancement mechanisms to build our FuSTAL framework. FuSTAL enhances pseudo label quality at three stages: cross-video contrastive learning at the proposal Generation-Stage, prior-based filtering at the proposal Selection-Stage and EMA-based distillation at the Training-Stage. These designs enhance pseudo label quality at different stages in the framework, and help produce more informative, less false and smoother action proposals. With the help of these comprehensive designs at all stages, FuSTAL achieves an average mAP of 50.8% on THUMOS'14, outperforming the previous best method by 1.2%, and becomes the first method to reach the milestone of 50%.
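
The EMA-based distillation named for the Training-Stage builds on an exponential moving average of model weights. The abstract does not give FuSTAL's actual formulation, so the following is only a minimal sketch of the generic EMA update, assuming weights stored as a plain dictionary of floats and a hypothetical decay of 0.9; the function and variable names are illustrative.

```python
def ema_update(teacher, student, decay=0.9):
    """Generic EMA step: new_teacher = decay * teacher + (1 - decay) * student.

    `teacher` and `student` are dicts mapping parameter names to floats.
    The decay value is an assumption, not taken from the paper.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy weights: the teacher drifts slowly toward the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 2.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
```

A decay close to 1 makes the averaged model change slowly, which is why EMA targets tend to be smoother than the raw student outputs.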

VCC-INFUSE: Towards Accurate and Efficient Selection of Unlabeled Examples in Semi-supervised Learning

Apr 18, 2024

Shijie Fang, Qianhan Feng, Tong Lin

Abstract: Despite the progress of Semi-supervised Learning (SSL), existing methods fail to utilize unlabeled data effectively and efficiently. Many pseudo-label-based methods select unlabeled examples based on inaccurate confidence scores from the classifier. Most prior work also uses all available unlabeled data without pruning, making it difficult to handle large amounts of unlabeled data. To address these issues, we propose two methods: Variational Confidence Calibration (VCC) and Influence-Function-based Unlabeled Sample Elimination (INFUSE). VCC is a universal plugin for SSL confidence calibration, using a variational autoencoder to select more accurate pseudo labels based on three types of consistency scores. INFUSE is a data pruning method that constructs a core dataset of unlabeled examples under SSL. Our methods are effective in multiple datasets and settings, reducing classification error rates and saving training time. Together, VCC-INFUSE reduces the error rate of FlexMatch on the CIFAR-100 dataset by 1.08% while saving nearly half of the training time.

* Accepted paper of IJCAI 2024. Shijie Fang and Qianhan Feng contributed equally to this paper
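
For context, the confidence-based selection that the abstract describes as the common baseline can be sketched as follows. This is not VCC or INFUSE; it is a minimal illustration of thresholding on the classifier's maximum probability, with a hypothetical threshold of 0.95 and illustrative names.

```python
def select_pseudo_labels(probabilities, threshold=0.95):
    """Keep unlabeled examples whose top class probability passes the threshold.

    `probabilities` is a list of per-example class-probability lists.
    Returns (example_index, predicted_class) pairs. The threshold value
    is an assumption chosen for illustration.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


predictions = [
    [0.97, 0.02, 0.01],  # confident: kept with label 0
    [0.40, 0.35, 0.25],  # uncertain: dropped
    [0.03, 0.96, 0.01],  # confident: kept with label 1
]
chosen = select_pseudo_labels(predictions, threshold=0.95)
```

If the classifier is poorly calibrated, the raw maximum probability is an unreliable filter, which is the weakness the abstract attributes to this style of selection.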

Target specific peptide design using latent space approximate trajectory collector

Feb 02, 2023

Tong Lin, Sijie Chen, Ruchira Basu, Dehu Pei, Xiaolin Cheng, Levent Burak Kara

Abstract: Despite the prevalence and many successes of deep learning applications in de novo molecular design, the problem of peptide generation targeting specific proteins remains unsolved. A main barrier is the scarcity of high-quality training data. To tackle the issue, we propose a novel machine-learning-based peptide design architecture, called Latent Space Approximate Trajectory Collector (LSATC). It consists of a series of samplers on an optimization trajectory on a highly non-convex energy landscape that approximates the distributions of peptides with desired properties in a latent space. The process involves little human intervention and can be implemented in an end-to-end manner. We demonstrate the model by the design of peptide extensions targeting Beta-catenin, a key nuclear effector protein involved in canonical Wnt signalling. When compared with a random sampler, LSATC can sample peptides with $36\%$ lower binding scores in a $16$ times smaller interquartile range (IQR) and $284\%$ less hydrophobicity with a $1.4$ times smaller IQR. LSATC also largely outperforms other common generative models. Finally, we utilized a clustering algorithm to select 4 peptides from the 100 LSATC-designed peptides for experimental validation. The result confirms that all four peptides extended by LSATC show improved Beta-catenin binding by at least $20.0\%$, and two of the peptides show a $3$-fold increase in binding affinity as compared to the base peptide.

Efficient Meta-Learning for Continual Learning with Taylor Expansion Approximation

Oct 03, 2022

Xiaohan Zou, Tong Lin

Abstract: Continual learning aims to alleviate catastrophic forgetting when handling consecutive tasks under non-stationary distributions. Gradient-based meta-learning algorithms have shown the capability to implicitly solve the transfer-interference trade-off problem between different examples. However, they still suffer from the catastrophic forgetting problem in the setting of continual learning, since the past data of previous tasks are no longer available. In this work, we propose a novel efficient meta-learning algorithm for solving the online continual learning problem, where the regularization terms and learning rates are adapted to the Taylor approximation of the parameter's importance to mitigate forgetting. The proposed method expresses the gradient of the meta-loss in closed form and thus avoids computing second-order derivatives, which is computationally prohibitive. We also use Proximal Gradient Descent to further improve computational efficiency and accuracy. Experiments on diverse benchmarks show that our method achieves better or on-par performance and much higher efficiency compared to the state-of-the-art approaches.

* Accepted by the 2022 International Joint Conference on Neural Networks (IJCNN 2022)

Unsupervised Domain Adaptation with Histogram-gated Image Translation for Delayered IC Image Analysis

Sep 27, 2022

Yee-Yang Tee, Deruo Cheng, Chye-Soon Chee, Tong Lin, Yiqiong Shi, Bah-Hwee Gwee

Abstract: Deep learning has achieved great success in the challenging circuit annotation task by employing Convolutional Neural Networks (CNN) for the segmentation of circuit structures. Deep learning approaches require a large amount of manually annotated training data to achieve good performance, and performance can degrade if a deep learning model trained on a given dataset is applied to a different dataset. This is commonly known as the domain shift problem for circuit annotation, which stems from the possibly large variations in distribution across different image datasets. The different image datasets could be obtained from different devices or different layers within a single device. To address the domain shift problem, we propose Histogram-gated Image Translation (HGIT), an unsupervised domain adaptation framework which transforms images from a given source dataset to the domain of a target dataset, and utilizes the transformed images for training a segmentation network. Specifically, our HGIT performs generative adversarial network (GAN)-based image translation and utilizes histogram statistics for data curation. Experiments were conducted on a single labeled source dataset adapted to three different target datasets (without labels for training) and the segmentation performance was evaluated for each target dataset. We have demonstrated that our method achieves the best performance compared to the reported domain adaptation techniques, and is also reasonably close to the fully supervised benchmark.

* 7 pages, 4 figures, To be presented at IEEE PAINE 2022 (oral)

Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace

Aug 11, 2022

Yucong Liu, Shixing Yu, Tong Lin

Abstract: In this paper we develop a novel regularization method for deep neural networks by penalizing the trace of the Hessian. This regularizer is motivated by a recent guarantee bound of the generalization error. The Hutchinson method is a classical unbiased estimator for the trace of a matrix, but it is very time-consuming on deep learning models. Hence a dropout scheme is proposed to efficiently implement the Hutchinson method. Then we discuss a connection to linear stability of a nonlinear dynamical system and flat/sharp minima. Experiments demonstrate that our method outperforms existing regularizers and data augmentation methods, such as Jacobian, confidence penalty, label smoothing, cutout and mixup.
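
The classical Hutchinson estimator mentioned in the abstract approximates the trace of a matrix A by averaging z^T A z over random probe vectors z with independent ±1 entries. The sketch below shows only that classical estimator on a small explicit matrix; it does not reproduce the paper's dropout scheme, and in a deep learning setting the hypothetical `matvec_fn` would be replaced by a Hessian-vector product.

```python
import random


def matvec(matrix, vector):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def hutchinson_trace(matvec_fn, dim, num_samples=1000, seed=0):
    """Estimate tr(A) as the mean of z^T A z over Rademacher probes z.

    `matvec_fn` maps a vector v to A v, so A never has to be formed
    explicitly. The sample count and seed are illustrative choices.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec_fn(z)
        total += sum(zi * azi for zi, azi in zip(z, az))
    return total / num_samples


# Small symmetric test matrix whose exact trace is known.
A = [[2.0, 0.5, 0.0],
     [0.5, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
estimate = hutchinson_trace(lambda v: matvec(A, v), dim=3, num_samples=2000)
exact = sum(A[i][i] for i in range(3))
```

Each sample costs one matrix-vector product, which is the expense the abstract refers to when the matrix is the Hessian of a large network.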

Boosting Image Outpainting with Semantic Layout Prediction

Oct 18, 2021

Ye Ma, Jin Ma, Min Zhou, Quan Chen, Tiezheng Ge, Yuning Jiang, Tong Lin

Abstract: The objective of image outpainting is to extend an image beyond its current border and generate new regions based on known ones. Previous methods adopt generative adversarial networks (GANs) to synthesize realistic images. However, the lack of explicit semantic representation leads to blurry and abnormal image pixels when the outpainting areas are complex and contain various objects. In this work, we decompose the outpainting task into two stages. First, we train a GAN to extend regions in the semantic segmentation domain instead of the image domain. Second, another GAN model is trained to synthesize real images based on the extended semantic layouts. The first model focuses on low-frequency context such as sizes, classes and other semantic cues, while the second model focuses on high-frequency context like color and texture. By this design, our approach can handle semantic clues more easily and hence works better in complex scenarios. We evaluate our framework on various datasets and perform quantitative and qualitative analysis. Experiments demonstrate that our method generates reasonable extended semantic layouts and images, outperforming state-of-the-art models.

Principal Gradient Direction and Confidence Reservoir Sampling for Continual Learning

Aug 21, 2021

Zhiyi Chen, Tong Lin

Abstract: Task-free online continual learning aims to alleviate catastrophic forgetting of the learner on a non-iid data stream. Experience Replay (ER) is a SOTA continual learning method, which is broadly used as the backbone algorithm for other replay-based methods. However, the training strategy of ER is too simple to take full advantage of replayed examples and its reservoir sampling strategy is also suboptimal. In this work, we propose a general proximal gradient framework so that ER can be viewed as a special case. We further propose two improvements accordingly: Principal Gradient Direction (PGD) and Confidence Reservoir Sampling (CRS). In Principal Gradient Direction, we optimize a target gradient that not only represents the major contribution of past gradients, but also retains the new knowledge of the current gradient. We then present Confidence Reservoir Sampling for maintaining a more informative memory buffer based on a margin-based metric that measures the value of stored examples. Experiments substantiate the effectiveness of both improvements, and our new algorithm consistently boosts the performance of MIR-replay, a SOTA ER-based method: it increases the average accuracy by up to 7.9% and reduces forgetting by up to 15.4% on four datasets.
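
The reservoir sampling strategy that the abstract describes as ER's baseline keeps a fixed-size buffer in which every stream element seen so far is retained with equal probability. The following is a minimal sketch of that standard algorithm only; it is not Confidence Reservoir Sampling, and the capacity, seed and names are illustrative assumptions.

```python
import random


def reservoir_sample(stream, capacity, seed=0):
    """Standard reservoir sampling over a stream of unknown length.

    After n items have been seen, each one is in the buffer with
    probability capacity / n (for n >= capacity).
    """
    rng = random.Random(seed)
    buffer = []
    for seen, item in enumerate(stream, start=1):
        if len(buffer) < capacity:
            # Fill phase: keep everything until the buffer is full.
            buffer.append(item)
        else:
            # Replace phase: draw a slot uniformly from [0, seen).
            slot = rng.randrange(seen)
            if slot < capacity:
                buffer[slot] = item
    return buffer


memory = reservoir_sample(range(1000), capacity=10)
```

Because replacement is uniform, the buffer ignores how informative each stored example is, which is the property a value-aware scheme such as the one in the abstract aims to improve.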

A New Adaptive Gradient Method with Gradient Decomposition

Jul 18, 2021

Zhou Shao, Tong Lin

Abstract: Adaptive gradient methods, especially Adam-type methods (such as Adam, AMSGrad, and AdaBound), have been proposed to speed up the training process with an element-wise scaling term on learning rates. However, they often generalize poorly compared with stochastic gradient descent (SGD) and its accelerated schemes such as SGD with momentum (SGDM). In this paper, we propose a new adaptive method called DecGD, which simultaneously achieves good generalization like SGDM and obtains rapid convergence like Adam-type methods. In particular, DecGD decomposes the current gradient into the product of two terms: a surrogate gradient and a loss-based vector. Our method adjusts the learning rates adaptively according to the current loss-based vector instead of the squared gradients used in Adam-type methods. The intuition for the adaptive learning rates of DecGD is that a good optimizer, in general cases, needs to decrease the learning rates as the loss decreases, which is similar to learning-rate decay scheduling. Therefore, DecGD converges rapidly in the early phases of training and controls the effective learning rates according to the loss-based vectors, which helps lead to better generalization. Convergence analysis is discussed in both convex and non-convex situations. Finally, empirical results on widely used tasks and models demonstrate that DecGD shows better generalization performance than SGDM and rapid convergence like Adam-type methods.
