Hui Li Tan

A Survey and Evaluation of Adversarial Attacks for Object Detection

Aug 06, 2024

Khoi Nguyen Tiet Nguyen, Wenyu Zhang, Kangkang Lu, Yuhuan Wu, Xingjian Zheng, Hui Li Tan, Liangli Zhen

Abstract: Deep learning models excel in various computer vision tasks but are susceptible to adversarial examples: subtle perturbations in input data that lead to incorrect predictions. This vulnerability poses significant risks in safety-critical applications such as autonomous vehicles, security surveillance, and aircraft health monitoring. While numerous surveys focus on adversarial attacks in image classification, the literature on such attacks in object detection is limited. This paper offers a comprehensive taxonomy of adversarial attacks specific to object detection, reviews existing adversarial robustness evaluation metrics, and systematically assesses open-source attack methods and model robustness. Key observations are provided to enhance the understanding of attack effectiveness and corresponding countermeasures. Additionally, we identify crucial research challenges to guide future efforts in securing automated object detection systems.

* 14 pages
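
To make "subtle perturbations that lead to incorrect predictions" concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a classic attack from the image classification literature. It is not taken from the paper: the toy logistic-regression model, the weights, and the input values are all assumptions chosen for illustration, and real object detectors are far more complex.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """One FGSM step: move each input feature by eps in the direction
    that increases the binary cross-entropy loss."""
    p = predict(x, w, b)
    # Gradient of the loss with respect to the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and input, chosen only for illustration.
w = [2.0, -1.0, 0.5]
b = 0.0
x = [0.3, 0.1, 0.2]
y = 1.0  # true label

x_adv = fgsm(x, y, w, b, eps=0.25)
print("clean probability of class 1:", predict(x, w, b))
print("adversarial probability of class 1:", predict(x_adv, w, b))
```

Each feature moves by at most `eps`, yet the predicted class flips from 1 to 0 in this toy setting.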

Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition

Aug 03, 2022

Mei Chee Leong, Haosong Zhang, Hui Li Tan, Liyuan Li, Joo Hwee Lim

Abstract: Fine-grained action recognition is a challenging task in computer vision. As fine-grained datasets have small inter-class variations in spatial and temporal space, a fine-grained action recognition model requires good temporal reasoning and discrimination of attribute action semantics. Leveraging the CNN's ability to capture high-level spatial-temporal feature representations and the Transformer's efficiency in modeling latent semantics and global dependencies, we investigate two frameworks that combine a CNN vision backbone and a Transformer encoder to enhance fine-grained action recognition: 1) a vision-based encoder to learn latent temporal semantics, and 2) a multi-modal video-text cross encoder to exploit additional text input and learn cross association between visual and text semantics. Our experimental results show that both of our Transformer encoder frameworks effectively learn latent temporal semantics and cross-modality association, with improved recognition performance over the CNN vision model. We achieve new state-of-the-art performance on the FineGym benchmark dataset for both proposed architectures.

* The Ninth Workshop on Fine-Grained Visual Categorization (FGVC9) @ CVPR2022

TAILOR: Teaching with Active and Incremental Learning for Object Registration

May 24, 2022

Qianli Xu, Nicolas Gauthier, Wenyu Liang, Fen Fang, Hui Li Tan, Ying Sun, Yan Wu, Liyuan Li, Joo-Hwee Lim

Abstract:When deploying a robot to a new task, one often has to train it to detect novel objects, which is time-consuming and labor-intensive. We present TAILOR -- a method and system for object registration with active and incremental learning. When instructed by a human teacher to register an object, TAILOR is able to automatically select viewpoints to capture informative images by actively exploring viewpoints, and employs a fast incremental learning algorithm to learn new objects without potential forgetting of previously learned objects. We demonstrate the effectiveness of our method with a KUKA robot to learn novel objects used in a real-world gearbox assembly task through natural interactions.

* 5 pages, 4 figures, AAAI conference

Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition

Oct 12, 2021

Mei Chee Leong, Hui Li Tan, Haosong Zhang, Liyuan Li, Feng Lin, Joo Hwee Lim

Abstract: Fine-grained human action recognition is a core research topic in computer vision. Inspired by the recently proposed hierarchy representation of fine-grained actions in FineGym and the SlowFast network for action recognition, we propose a novel multi-task network which exploits the FineGym hierarchy representation to achieve effective joint learning and prediction for fine-grained human action recognition. The multi-task network consists of three pathways of SlowOnly networks with gradually increased frame rates for events, sets and elements of fine-grained actions, followed by our proposed integration layers for joint learning and prediction. It is a two-stage approach: it first learns a deep feature representation at each hierarchical level, then performs feature encoding and fusion for multi-task learning. Our empirical results on the FineGym dataset achieve a new state-of-the-art performance, with 91.80% Top-1 accuracy and 88.46% mean accuracy for element actions, which are 3.40% and 7.26% higher than the previous best results.

* 2021 IEEE International Conference on Image Processing (ICIP)
* Camera ready for IEEE ICIP 2021
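
The abstract reports both Top-1 accuracy and mean accuracy. As a minimal sketch of how the two can differ, the code below assumes "mean accuracy" means the average of per-class accuracies, which is the usual convention but is not spelled out in the abstract. The labels are made-up toy data, not FineGym results.

```python
from collections import defaultdict

def top1_accuracy(y_true, y_pred):
    """Fraction of all samples whose top prediction matches the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def mean_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies, so rare classes count equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Toy, imbalanced labels (hypothetical): class "a" dominates.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "a", "b", "a"]

print(top1_accuracy(y_true, y_pred))        # 5 of 6 samples correct
print(mean_class_accuracy(y_true, y_pred))  # (1.0 + 0.5) / 2 = 0.75
```

On imbalanced data, mean accuracy is typically lower than Top-1 accuracy when the model does worse on rare classes, which is one possible reading of the gap between the two figures reported above.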

A Survey of Embodied AI: From Simulators to Research Tasks

Mar 14, 2021

Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan

Abstract:There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", whereby AI algorithms and agents no longer simply learn from datasets of images, videos or text curated primarily from the internet. Instead, they learn through embodied physical interactions with their environments, whether real or simulated. Consequently, there has been substantial growth in the demand for embodied AI simulators to support a diversity of embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of artificial general intelligence, but there is no contemporary and comprehensive survey of this field. This paper comprehensively surveys state-of-the-art embodied AI simulators and research, mapping connections between these. By benchmarking nine state-of-the-art embodied AI simulators in terms of seven features, this paper aims to understand the simulators in their provision for use in embodied AI research. Finally, based upon the simulators and a pyramidal hierarchy of embodied AI research tasks, this paper surveys the main research tasks in embodied AI -- visual exploration, visual navigation and embodied question answering (QA), covering the state-of-the-art approaches, evaluation and datasets.

* Submitted for CVIU review

Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment

Oct 03, 2020

Jiafei Duan, Samson Yu, Hui Li Tan, Cheston Tan

Abstract: The problem of task planning for artificial agents remains largely unsolved. While there has been increasing interest in data-driven approaches for the study of task planning for artificial agents, a significant remaining bottleneck is the dearth of large-scale comprehensive task-based datasets. In this paper, we present ActioNet, an interactive end-to-end platform for data collection and augmentation of task-based datasets in 3D environments. Using ActioNet, we collected a large-scale comprehensive task-based dataset, comprising over 3000 hierarchical task structures and videos. Using the hierarchical task structures, the videos are further augmented across 50 different scenes to give over 150,000 videos. To our knowledge, ActioNet is the first interactive end-to-end platform for such task-based dataset generation, and the accompanying dataset is the largest task-based dataset of such comprehensive nature. The ActioNet platform and dataset will be made available to facilitate research in hierarchical task planning.

* https://github.com/SamsonYuBaiJian/actionet
