Ting Wu

Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection

Aug 21, 2024

Liang Yao, Fan Liu, Chuanyi Zhang, Zhiquan Ou, Ting Wu

Abstract:Knowledge distillation (KD) is an effective method for compressing models in object detection tasks. Due to limited computational capability, UAV-based object detection (UAV-OD) widely adopts the KD technique to obtain lightweight detectors. Existing methods often overlook the significant differences in feature space caused by the large gap in scale between the teacher and student models. This limitation hampers the efficiency of knowledge transfer during the distillation process. Furthermore, the complex backgrounds in UAV images make it challenging for the student model to efficiently learn the object features. In this paper, we propose a novel knowledge distillation framework for UAV-OD. Specifically, a progressive distillation approach is designed to alleviate the feature gap between teacher and student models. Then a new feature alignment method is provided to extract object-related features, enhancing the student model's knowledge reception efficiency. Finally, extensive experiments are conducted to validate the effectiveness of our proposed approach. The results demonstrate that our proposed method achieves state-of-the-art (SoTA) performance on two UAV-OD datasets.
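
For readers new to KD, the sketch below shows the classic soft-label distillation loss (temperature-scaled KL divergence between teacher and student predictions). It is background only: it is not the progressive, feature-level method proposed in this paper, and the function names and the default temperature are assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably by subtracting the max."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is the generic soft-label KD loss, not the paper's method.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl

# Hypothetical logits for a 3-class prediction.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]
loss = kd_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two softened distributions diverge.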

Fine-grained Metrics for Point Cloud Semantic Segmentation

Jul 31, 2024

Zhuheng Lu, Ting Wu, Yuewei Dai, Weiqing Li, Zhiyong Su

Abstract:Two forms of imbalances are commonly observed in point cloud semantic segmentation datasets: (1) category imbalances, where certain objects are more prevalent than others; and (2) size imbalances, where certain objects occupy more points than others. Because of this, the majority of categories and large objects are favored in the existing evaluation metrics. This paper suggests fine-grained mIoU and mAcc for a more thorough assessment of point cloud segmentation algorithms in order to address these issues. Richer statistical information is provided for models and datasets by these fine-grained metrics, which also lessen the bias of current semantic segmentation metrics towards large objects. The proposed metrics are used to train and assess various semantic segmentation algorithms on three distinct indoor and outdoor semantic segmentation datasets.

* PRCV 2024
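
As a point of reference, the sketch below computes the conventional dataset-level mIoU and mAcc that the abstract takes as its starting point. The fine-grained variants are not defined in the abstract, so they are not attempted here; the function name and the toy labels are assumptions for illustration.

```python
def miou_macc(pred, gt, num_classes):
    """Conventional mean IoU and mean per-class accuracy over point labels."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equally long")
    tp = [0] * num_classes  # points correctly assigned to each class
    fp = [0] * num_classes  # points wrongly assigned to each class
    fn = [0] * num_classes  # points of each class that were missed
    for p, g in zip(pred, gt):
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    ious, accs = [], []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp[c] / union)
        support = tp[c] + fn[c]
        if support > 0:
            accs.append(tp[c] / support)
    return sum(ious) / len(ious), sum(accs) / len(accs)

# Toy scene: class 0 has four points, class 1 has two.
gt = [0, 0, 0, 0, 1, 1]
pred = [0, 0, 0, 0, 1, 0]
miou, macc = miou_macc(pred, gt, num_classes=2)
```

In this toy scene, a single mislabeled point halves the IoU of the smaller class while the larger class still scores 0.8.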

Progress or Regress? Self-Improvement Reversal in Post-training

Jul 06, 2024

Ting Wu, Xuefeng Li, Pengfei Liu

Abstract:Self-improvement through post-training methods such as iterative preference learning has been acclaimed for enhancing the problem-solving capabilities (e.g., mathematical reasoning) of Large Language Models (LLMs) without human intervention. However, as exploration deepens, it becomes crucial to assess whether these improvements genuinely signify progress in solving more challenging problems or if they could lead to unintended regressions. To address this, we propose a comprehensive evaluative framework that goes beyond the superficial pass@1 metric to scrutinize the underlying enhancements of post-training paradigms for self-improvement. Through rigorous experimentation and analysis across diverse problem-solving tasks, the empirical results point out the phenomenon of self-improvement reversal, where models showing improved performance across benchmarks will paradoxically exhibit declines in broader, essential capabilities, like output diversity and out-of-distribution (OOD) generalization. These findings indicate that current self-improvement practices through post-training are inadequate for equipping models to tackle more complex problems. Furthermore, they underscore the necessity of our critical evaluation metrics in discerning the "progress or regress" dichotomy for self-improving LLMs.
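
The pass@1 metric mentioned above is the k = 1 case of pass@k. A minimal sketch of the widely used unbiased pass@k estimator follows, assuming n sampled answers per problem of which c are correct; the function name is an assumption, and the paper's own additional metrics are not reproduced here.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of the chance that at least one of k samples,
    drawn without replacement from n generated samples, is correct,
    given that c of the n samples are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 the estimator reduces to the fraction of correct samples, c / n.
score = pass_at_k(n=10, c=3, k=1)
```

Because pass@1 only measures the fraction of correct samples, it cannot show whether the set of distinct solutions is shrinking, which is one reason the abstract calls for complementary metrics.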

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Jun 18, 2024

Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye (+18 more)

Abstract:The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoning abilities, we introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. We argue that the challenges in Olympic competition problems are ideal for evaluating AI's cognitive reasoning due to their complexity and interdisciplinary nature, which are essential for tackling complex scientific challenges and facilitating discoveries. Beyond evaluating performance across various disciplines using answer-only criteria, we conduct detailed experiments and analyses from multiple perspectives. We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions. Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration. Through the OlympicArena, we aim to advance AI towards superintelligence, equipping it to address more complex challenges in science and beyond. We also provide a comprehensive set of resources to support AI research, including a benchmark dataset, an open-source annotation platform, a detailed evaluation tool, and a leaderboard with automatic submission features.

* 44 pages

UEMM-Air: A Synthetic Multi-modal Dataset for Unmanned Aerial Vehicle Object Detection

Jun 10, 2024

Fan Liu, Liang Yao, Shengxiang Xu, Chuanyi Zhang, Xinlei Zhang, Ting Wu

Abstract:The development of multi-modal object detection for Unmanned Aerial Vehicles (UAVs) typically relies on a large amount of pixel-aligned multi-modal image data. However, existing datasets face challenges such as limited modalities, high construction costs, and imprecise annotations. To this end, we propose a synthetic multi-modal UAV-based object detection dataset, UEMM-Air. Specifically, we simulate various UAV flight scenarios and object types using the Unreal Engine (UE). Then we design the UAV's flight logic to automatically collect data from different scenarios, perspectives, and altitudes. Finally, we propose a novel heuristic automatic annotation algorithm to generate accurate object detection labels. In total, our UEMM-Air consists of 20k pairs of images with 5 modalities and precise annotations. Moreover, we conduct numerous experiments and establish new benchmark results on our dataset. We found that models pre-trained on UEMM-Air exhibit better performance on downstream tasks than models pre-trained on other similar datasets. The dataset is publicly available (https://github.com/1e12Leon/UEMM-Air) to support the research of multi-modal UAV object detection models.

Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection

May 24, 2024

Fan Liu, Liang Yao, Chuanyi Zhang, Ting Wu, Xinlei Zhang, Jun Zhou, Xiruo Jiang

Abstract:Detecting objects from Unmanned Aerial Vehicles (UAVs) is often hindered by a large number of small objects, resulting in low detection accuracy. To address this issue, mainstream approaches typically utilize multi-stage inference. Despite their remarkable detection accuracy, they sacrifice real-time efficiency, making them less practical for real applications. To this end, we propose to improve the single-stage inference accuracy through learning scale-invariant features. Specifically, a Scale-Invariant Feature Disentangling module is designed to disentangle scale-related and scale-invariant features. Then an Adversarial Feature Learning scheme is employed to enhance disentanglement. Finally, scale-invariant features are leveraged for robust UAV-based object detection. Furthermore, we construct a multi-modal UAV object detection dataset, State-Air, which incorporates annotated UAV state parameters. We apply our approach to three state-of-the-art lightweight detection frameworks on three benchmark datasets, including State-Air. Extensive experiments demonstrate that our approach can effectively improve model accuracy. Our code and dataset are provided in Supplementary Materials and will be publicly available once the paper is accepted.

Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation

Mar 11, 2024

Peng Zhang, Ting Wu, Jinsheng Sun, Weiqing Li, Zhiyong Su

Abstract:Existing interactive point cloud segmentation approaches primarily focus on object segmentation, which aims to determine which points belong to the object of interest guided by user interactions. This paper concentrates on an unexplored yet meaningful task, i.e., interactive point cloud semantic segmentation, which assigns high-quality semantic labels to all points in a scene with user corrective clicks. Concretely, we present the first interactive framework for point cloud semantic segmentation, named InterPCSeg, which seamlessly integrates with off-the-shelf semantic segmentation networks without offline re-training, enabling it to run in an on-the-fly manner. To achieve online refinement, we treat user interactions as sparse training examples at test time. To address the instability caused by the sparse supervision, we design a stabilization energy to regulate the test-time training process. For objective and reproducible evaluation, we develop an interaction simulation scheme tailored for the interactive point cloud semantic segmentation task. We evaluate our framework on the S3DIS and ScanNet datasets with off-the-shelf segmentation networks, incorporating interactions from both the proposed interaction simulator and real users. Quantitative and qualitative experimental results demonstrate the efficacy of our framework in refining the semantic segmentation results with user interactions. The source code will be publicly available.

Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization

May 20, 2023

Ting Wu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract:Models trained with empirical risk minimization (ERM) have been shown to rely easily on spurious correlations, resulting in poor generalization. Group distributionally robust optimization (group DRO) can alleviate this problem by minimizing the worst-case loss over pre-defined groups. While promising, in practice factors like expensive annotations and privacy preclude the availability of group labels. More crucially, when taking a closer look at the failure modes of out-of-distribution generalization, the typical procedure of reweighting in group DRO loses efficiency. Motivated by these limitations, we reformulate the group DRO framework by proposing Q-Diversity. Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization. Furthermore, a novel mixing strategy across groups is presented to diversify the under-represented groups. In a series of experiments on both synthetic and real-world text classification tasks, results demonstrate that Q-Diversity can consistently improve worst-case accuracy under different distributional shifts, outperforming state-of-the-art alternatives.

* Findings of ACL 2023
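
To make the contrast between ERM and group DRO concrete, the sketch below compares the two objectives on per-example losses with known group labels. It illustrates only the standard worst-group objective described in the abstract, not Q-Diversity itself; the function names, loss values, and group names are hypothetical.

```python
from collections import defaultdict

def group_mean_losses(losses, groups):
    """Average loss within each group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        sums[group] += loss
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}

def erm_objective(losses):
    """ERM: average loss over all examples, regardless of group."""
    return sum(losses) / len(losses)

def group_dro_objective(losses, groups):
    """Group DRO: loss of the worst-performing group."""
    return max(group_mean_losses(losses, groups).values())

# Hypothetical batch: group "b" is small and poorly fit.
losses = [0.1, 0.1, 0.1, 0.9]
groups = ["a", "a", "a", "b"]
erm = erm_objective(losses)                  # dominated by majority group "a"
dro = group_dro_objective(losses, groups)    # determined by minority group "b"
```

The ERM average hides the poorly fit minority group, whereas the worst-group objective is set entirely by it. Note that this computation requires the group labels that the abstract says are often unavailable in practice.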

Enhancing Contrastive Learning with Noise-Guided Attack: Towards Continual Relation Extraction in the Wild

May 11, 2023

Ting Wu, Jingyi Liu, Rui Zheng, Qi Zhang, Tao Gui, Xuanjing Huang

Abstract:The principle of continual relation extraction (CRE) involves adapting to emerging novel relations while preserving old knowledge. While current endeavors in CRE succeed in preserving old knowledge, they tend to fail when exposed to contaminated data streams. We assume this is attributed to their reliance on an artificial hypothesis that the data stream has no annotation errors, which hinders real-world applications for CRE. Considering the ubiquity of noisy labels in real-world datasets, in this paper, we formalize a more practical learning scenario, termed noisy-CRE. Building upon this challenging setting, we develop a noise-resistant contrastive framework named Noise-guided attack in Contrastive Learning (NaCL) to learn incremental corrupted relations. Compared to direct noise discarding or inaccessible noise relabeling, we show that modifying the feature space to match the given noisy labels via attacking can better enrich contrastive representations. Extensive empirical validations highlight that NaCL can achieve consistent performance improvements with increasing noise rates, outperforming state-of-the-art baselines.

Less is Better: Recovering Intended-Feature Subspace to Robustify NLU Models

Sep 16, 2022

Ting Wu, Tao Gui

Abstract:Datasets with significant proportions of bias pose threats to training a trustworthy model on NLU tasks. Despite yielding great progress, current debiasing methods rely excessively on knowledge of bias attributes. The definition of these attributes, however, is elusive and varies across different datasets. Furthermore, leveraging these attributes at the input level for bias mitigation may leave a gap between intrinsic properties and the underlying decision rule. To narrow down this gap and liberate the supervision on bias, we suggest extending bias mitigation into feature space. Therefore, a novel model, Recovering Intended-Feature Subspace with Knowledge-Free (RISK) is developed. Assuming that shortcut features caused by various biases are unintended for prediction, RISK views them as redundant features. When delving into a lower manifold to remove redundancies, RISK reveals that an extremely low-dimensional subspace with intended features can robustly represent the highly biased dataset. Empirical results demonstrate that our model can consistently improve model generalization to out-of-distribution sets, and achieves new state-of-the-art performance.

* Accepted by COLING 2022
