Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kaihua Tang

Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization

Apr 19, 2025

Huiyi Chen, Jiawei Peng, Kaihua Tang, Xin Geng, Xu Yang

Abstract:In-context learning (ICL) enables Large Vision-Language Models (LVLMs) to adapt to new tasks without parameter updates, using a few demonstrations from a large support set. However, selecting informative demonstrations leads to high computational and memory costs. While some methods explore selecting a small and representative coreset in the text classification, evaluating all support set samples remains costly, and discarded samples lead to unnecessary information loss. These methods may also be less effective for image classification due to differences in feature spaces. Given these limitations, we propose Key-based Coreset Optimization (KeCO), a novel framework that leverages untapped data to construct a compact and informative coreset. We introduce visual features as keys within the coreset, which serve as the anchor for identifying samples to be updated through different selection strategies. By leveraging untapped samples from the support set, we update the keys of selected coreset samples, enabling the randomly initialized coreset to evolve into a more informative coreset under low computational cost. Through extensive experiments on coarse-grained and fine-grained image classification benchmarks, we demonstrate that KeCO effectively enhances ICL performance for image classification task, achieving an average improvement of more than 20\%. Notably, we evaluate KeCO under a simulated online scenario, and the strong performance in this scenario highlights the practical value of our framework for resource-constrained real-world scenarios.

* 11 pages, 5 figures

Via

Access Paper or Ask Questions

Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models

Oct 12, 2023

Beier Zhu, Kaihua Tang, Qianru Sun, Hanwang Zhang

Abstract:Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data. Yet, the zero-shot performance is less competitive than a fully supervised one. Thus, to enhance the performance, fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks. However, we argue that such prior work has overlooked the inherent biases in foundation models. Due to the highly imbalanced Web-scale training set, these foundation models are inevitably skewed toward frequent semantics, and thus the subsequent fine-tuning or ensembling is still biased. In this study, we systematically examine the biases in foundation models and demonstrate the efficacy of our proposed Generalized Logit Adjustment (GLA) method. Note that bias estimation in foundation models is challenging, as most pre-train data cannot be explicitly accessed like in traditional long-tailed classification tasks. To this end, GLA has an optimization-based bias estimation approach for debiasing foundation models. As our work resolves a fundamental flaw in the pre-training, the proposed GLA demonstrates significant improvements across a diverse range of tasks: it achieves 1.5 pp accuracy gains on ImageNet, an large average improvement (1.4-4.6 pp) on 11 few-shot datasets, 2.4 pp gains on long-tailed classification. Codes are in \url{https://github.com/BeierZhu/GLA}.

* Accepted by NeurIPS2023

Via

Access Paper or Ask Questions

Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization

Aug 06, 2022

Jiaxin Qi, Kaihua Tang, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang

Figure 1 for Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization

Figure 2 for Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization

Figure 3 for Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization

Figure 4 for Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization

Abstract:Out-Of-Distribution generalization (OOD) is all about learning invariance against environmental changes. If the context in every class is evenly distributed, OOD would be trivial because the context can be easily removed due to an underlying principle: class is invariant to context. However, collecting such a balanced dataset is impractical. Learning on imbalanced data makes the model bias to context and thus hurts OOD. Therefore, the key to OOD is context balance. We argue that the widely adopted assumption in prior work, the context bias can be directly annotated or estimated from biased class prediction, renders the context incomplete or even incorrect. In contrast, we point out the everoverlooked other side of the above principle: context is also invariant to class, which motivates us to consider the classes (which are already labeled) as the varying environments to resolve context bias (without context labels). We implement this idea by minimizing the contrastive loss of intra-class sample similarity while assuring this similarity to be invariant across all classes. On benchmarks with various context biases and domain gaps, we show that a simple re-weighting based classifier equipped with our context estimation achieves state-of-the-art performance. We provide the theoretical justifications in Appendix and codes on https://github.com/simpleshinobu/IRMCon.

Via

Access Paper or Ask Questions

Identifying Hard Noise in Long-Tailed Sample Distribution

Jul 27, 2022

Xuanyu Yi, Kaihua Tang, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang

Figure 1 for Identifying Hard Noise in Long-Tailed Sample Distribution

Figure 2 for Identifying Hard Noise in Long-Tailed Sample Distribution

Figure 3 for Identifying Hard Noise in Long-Tailed Sample Distribution

Figure 4 for Identifying Hard Noise in Long-Tailed Sample Distribution

Abstract:Conventional de-noising methods rely on the assumption that all samples are independent and identically distributed, so the resultant classifier, though disturbed by noise, can still easily identify the noises as the outliers of training distribution. However, the assumption is unrealistic in large-scale data that is inevitably long-tailed. Such imbalanced training data makes a classifier less discriminative for the tail classes, whose previously "easy" noises are now turned into "hard" ones -- they are almost as outliers as the clean tail samples. We introduce this new challenge as Noisy Long-Tailed Classification (NLT). Not surprisingly, we find that most de-noising methods fail to identify the hard noises, resulting in significant performance drop on the three proposed NLT benchmarks: ImageNet-NLT, Animal10-NLT, and Food101-NLT. To this end, we design an iterative noisy learning framework called Hard-to-Easy (H2E). Our bootstrapping philosophy is to first learn a classifier as noise identifier invariant to the class and context distributional changes, reducing "hard" noises to "easy" ones, whose removal further improves the invariance. Experimental results show that our H2E outperforms state-of-the-art de-noising methods and their ablations on long-tailed settings while maintaining a stable performance on the conventional balanced settings. Datasets and codes are available at https://github.com/yxymessi/H2E-Framework

* Accepted to ECCV2022(Oral) ; Datasets and codes are available at https://github.com/yxymessi/H2E-Framework

Via

Access Paper or Ask Questions

Invariant Feature Learning for Generalized Long-Tailed Classification

Jul 19, 2022

Kaihua Tang, Mingyuan Tao, Jiaxin Qi, Zhenguang Liu, Hanwang Zhang

Figure 1 for Invariant Feature Learning for Generalized Long-Tailed Classification

Figure 2 for Invariant Feature Learning for Generalized Long-Tailed Classification

Figure 3 for Invariant Feature Learning for Generalized Long-Tailed Classification

Figure 4 for Invariant Feature Learning for Generalized Long-Tailed Classification

Abstract:Existing long-tailed classification (LT) methods only focus on tackling the class-wise imbalance that head classes have more samples than tail classes, but overlook the attribute-wise imbalance. In fact, even if the class is balanced, samples within each class may still be long-tailed due to the varying attributes. Note that the latter is fundamentally more ubiquitous and challenging than the former because attributes are not just implicit for most datasets, but also combinatorially complex, thus prohibitively expensive to be balanced. Therefore, we introduce a novel research problem: Generalized Long-Tailed classification (GLT), to jointly consider both kinds of imbalances. By "generalized", we mean that a GLT method should naturally solve the traditional LT, but not vice versa. Not surprisingly, we find that most class-wise LT methods degenerate in our proposed two benchmarks: ImageNet-GLT and MSCOCO-GLT. We argue that it is because they over-emphasize the adjustment of class distribution while neglecting to learn attribute-invariant features. To this end, we propose an Invariant Feature Learning (IFL) method as the first strong baseline for GLT. IFL first discovers environments with divergent intra-class distributions from the imperfect predictions and then learns invariant features across them. Promisingly, as an improved feature backbone, IFL boosts all the LT line-up: one/two-stage re-balance, augmentation, and ensemble. Codes and benchmarks are available on Github: https://github.com/KaihuaTang/Generalized-Long-Tailed-Benchmarks.pytorch

* Accepted to ECCV 2022. Codes and benchmarks are available on Github: https://github.com/KaihuaTang/Generalized-Long-Tailed-Benchmarks.pytorch

Via

Access Paper or Ask Questions

Adversarial Visual Robustness by Causal Intervention

Jun 17, 2021

Kaihua Tang, Mingyuan Tao, Hanwang Zhang

Figure 1 for Adversarial Visual Robustness by Causal Intervention

Figure 2 for Adversarial Visual Robustness by Causal Intervention

Figure 3 for Adversarial Visual Robustness by Causal Intervention

Figure 4 for Adversarial Visual Robustness by Causal Intervention

Abstract:Adversarial training is the de facto most promising defense against adversarial examples. Yet, its passive nature inevitably prevents it from being immune to unknown attackers. To achieve a proactive defense, we need a more fundamental understanding of adversarial examples, beyond the popular bounded threat model. In this paper, we provide a causal viewpoint of adversarial vulnerability: the cause is the confounder ubiquitously existing in learning, where attackers are precisely exploiting the confounding effect. Therefore, a fundamental solution for adversarial robustness is causal intervention. As the confounder is unobserved in general, we propose to use the instrumental variable that achieves intervention without the need for confounder observation. We term our robust training method as Causal intervention by instrumental Variable (CiiV). It has a differentiable retinotopic sampling layer and a consistency loss, which is stable and guaranteed not to suffer from gradient obfuscation. Extensive experiments on a wide spectrum of attackers and settings applied in MNIST, CIFAR-10, and mini-ImageNet datasets empirically demonstrate that CiiV is robust to adaptive attacks.

* Codes are available at https://github.com/KaihuaTang/Adversarial-Robustness-by-Causal-Intervention.pytorch

Via

Access Paper or Ask Questions

Distilling Causal Effect of Data in Class-Incremental Learning

Mar 08, 2021

Xinting Hu, Kaihua Tang, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang

Figure 1 for Distilling Causal Effect of Data in Class-Incremental Learning

Figure 2 for Distilling Causal Effect of Data in Class-Incremental Learning

Figure 3 for Distilling Causal Effect of Data in Class-Incremental Learning

Figure 4 for Distilling Causal Effect of Data in Class-Incremental Learning

Abstract:We propose a causal framework to explain the catastrophic forgetting in Class-Incremental Learning (CIL) and then derive a novel distillation method that is orthogonal to the existing anti-forgetting techniques, such as data replay and feature/label distillation. We first 1) place CIL into the framework, 2) answer why the forgetting happens: the causal effect of the old data is lost in new training, and then 3) explain how the existing techniques mitigate it: they bring the causal effect back. Based on the framework, we find that although the feature/label distillation is storage-efficient, its causal effect is not coherent with the end-to-end feature learning merit, which is however preserved by data replay. To this end, we propose to distill the Colliding Effect between the old and the new data, which is fundamentally equivalent to the causal effect of data replay, but without any cost of replay storage. Thanks to the causal effect analysis, we can further capture the Incremental Momentum Effect of the data stream, removing which can help to retain the old effect overwhelmed by the new data effect, and thus alleviate the forgetting of the old class in testing. Extensive experiments on three CIL benchmarks: CIFAR-100, ImageNet-Sub&Full, show that the proposed causal effect distillation can improve various state-of-the-art CIL methods by a large margin (0.72%--9.06%).

Via

Access Paper or Ask Questions

Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

Oct 03, 2020

Kaihua Tang, Jianqiang Huang, Hanwang Zhang

Figure 1 for Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

Figure 2 for Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

Figure 3 for Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

Figure 4 for Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

Abstract:As the class size grows, maintaining a balanced dataset across many classes is challenging because the data are long-tailed in nature; it is even impossible when the sample-of-interest co-exists with each other in one collectable unit, e.g., multiple visual instances in one image. Therefore, long-tailed classification is the key to deep learning at scale. However, existing methods are mainly based on re-weighting/re-sampling heuristics that lack a fundamental theory. In this paper, we establish a causal inference framework, which not only unravels the whys of previous methods, but also derives a new principled solution. Specifically, our theory shows that the SGD momentum is essentially a confounder in long-tailed classification. On one hand, it has a harmful causal effect that misleads the tail prediction biased towards the head. On the other hand, its induced mediation also benefits the representation learning and head prediction. Our framework elegantly disentangles the paradoxical effects of the momentum, by pursuing the direct causal effect caused by an input sample. In particular, we use causal intervention in training, and counterfactual reasoning in inference, to remove the "bad" while keep the "good". We achieve new state-of-the-arts on three long-tailed visual recognition benchmarks: Long-tailed CIFAR-10/-100, ImageNet-LT for image classification and LVIS for instance segmentation.

* This paper is accepted by NeurIPS 2020. The code is available on GitHub: https://github.com/KaihuaTang/Long-Tailed-Recognition.pytorch

Via

Access Paper or Ask Questions

Counterfactual VQA: A Cause-Effect Look at Language Bias

Jun 15, 2020

Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen

Figure 1 for Counterfactual VQA: A Cause-Effect Look at Language Bias

Figure 2 for Counterfactual VQA: A Cause-Effect Look at Language Bias

Figure 3 for Counterfactual VQA: A Cause-Effect Look at Language Bias

Figure 4 for Counterfactual VQA: A Cause-Effect Look at Language Bias

Abstract:Visual Question Answering (VQA) models tend to rely on the language bias and thus fail to learn the reasoning from visual knowledge, which is however the original intention of VQA. In this paper, we propose a novel cause-effect look at the language bias, where the bias is formulated as the direct effect of question on answer from the view of causal inference. The effect can be captured by counterfactual VQA, where the image had not existed in an imagined scenario. Our proposed cause-effect look 1) is general to any baseline VQA architecture, 2) achieves significant improvement on the language-bias sensitive VQA-CP dataset, and 3) fills the theoretical gap in recent language prior based works.

Via

Access Paper or Ask Questions

Learning to Segment the Tail

Apr 03, 2020

Xinting Hu, Yi Jiang, Kaihua Tang, Jingyuan Chen, Chunyan Miao, Hanwang Zhang

Figure 1 for Learning to Segment the Tail

Figure 2 for Learning to Segment the Tail

Figure 3 for Learning to Segment the Tail

Figure 4 for Learning to Segment the Tail

Abstract:Real-world visual recognition requires handling the extreme sample imbalance in large-scale long-tailed data. We propose a "divide&conquer" strategy for the challenging LVIS task: divide the whole data into balanced parts and then apply incremental learning to conquer each one. This derives a novel learning paradigm: class-incremental few-shot learning, which is especially effective for the challenge evolving over time: 1) the class imbalance among the old-class knowledge review and 2) the few-shot data in new-class learning. We call our approach Learning to Segment the Tail (LST). In particular, we design an instance-level balanced replay scheme, which is a memory-efficient approximation to balance the instance-level samples from the old-class images. We also propose to use a meta-module for new-class learning, where the module parameters are shared across incremental phases, gaining the learning-to-learn knowledge incrementally, from the data-rich head to the data-poor tail. We empirically show that: at the expense of a little sacrifice of head-class forgetting, we can gain a significant 8.3% AP improvement for the tail classes with less than 10 instances, achieving an overall 2.0% AP boost for the whole 1,230 classes.

Via

Access Paper or Ask Questions