Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei-Chao Chen

Inventec Corp, Skywatch Innovation Inc

Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis

Dec 04, 2024

Po-Hsuan Huang, Jeng-Lin Li, Chin-Po Chen, Ming-Ching Chang, Wei-Chao Chen

Abstract:Recent advancements in large vision-language models (LVLM) have significantly enhanced their ability to comprehend visual inputs alongside natural language. However, a major challenge in their real-world application is hallucination, where LVLMs generate non-existent visual elements, eroding user trust. The underlying mechanism driving this multimodal hallucination is poorly understood. Minimal research has illuminated whether contexts such as sky, tree, or grass field involve the LVLM in hallucinating a frisbee. We hypothesize that hidden factors, such as objects, contexts, and semantic foreground-background structures, induce hallucination. This study proposes a novel causal approach: a hallucination probing system to identify these hidden factors. By analyzing the causality between images, text prompts, and network saliency, we systematically explore interventions to block these factors. Our experimental findings show that a straightforward technique based on our analysis can significantly reduce hallucinations. Additionally, our analyses indicate the potential to edit network internals to minimize hallucinated outputs.

* Accepted by WACV2025

Via

Access Paper or Ask Questions

Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies

Oct 22, 2024

Guilherme Christmann, Ying-Sheng Luo, Hanjaya Mandala, Wei-Chao Chen

Abstract:Reinforcement learning (RL) policies are prone to high-frequency oscillations, especially undesirable when deploying to hardware in the real-world. In this paper, we identify, categorize, and compare methods from the literature that aim to mitigate high-frequency oscillations in deep RL. We define two broad classes: loss regularization and architectural methods. At their core, these methods incentivize learning a smooth mapping, such that nearby states in the input space produce nearby actions in the output space. We present benchmarks in terms of policy performance and control smoothness on traditional RL environments from the Gymnasium and a complex manipulation task, as well as three robotics locomotion tasks that include deployment and evaluation with real-world hardware. Finally, we also propose hybrid methods that combine elements from both loss regularization and architectural methods. We find that the best-performing hybrid outperforms other methods, and improves control smoothness by 26.8% over the baseline, with a worst-case performance degradation of just 2.8%.

* Presented in IROS 2024

Via

Access Paper or Ask Questions

Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

Jul 10, 2024

Po-Hsuan Huang, Chia-Ching Lin, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

Figure 1 for Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

Figure 2 for Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

Figure 3 for Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

Figure 4 for Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

Abstract:Learning from noisy-labeled data is crucial for real-world applications. Traditional Noisy-Label Learning (NLL) methods categorize training data into clean and noisy sets based on the loss distribution of training samples. However, they often neglect that clean samples, especially those with intricate visual patterns, may also yield substantial losses. This oversight is particularly significant in datasets with Instance-Dependent Noise (IDN), where mislabeling probabilities correlate with visual appearance. Our approach explicitly distinguishes between clean vs.noisy and easy vs. hard samples. We identify training samples with small losses, assuming they have simple patterns and correct labels. Utilizing these easy samples, we hallucinate multiple anchors to select hard samples for label correction. Corrected hard samples, along with the easy samples, are used as labeled data in subsequent semi-supervised training. Experiments on synthetic and real-world IDN datasets demonstrate the superior performance of our method over other state-of-the-art NLL methods.

* ICIP 2024

Via

Access Paper or Ask Questions

Expert Composer Policy: Scalable Skill Repertoire for Quadruped Robots

Mar 18, 2024

Guilherme Christmann, Ying-Sheng Luo, Wei-Chao Chen

Figure 1 for Expert Composer Policy: Scalable Skill Repertoire for Quadruped Robots

Figure 2 for Expert Composer Policy: Scalable Skill Repertoire for Quadruped Robots

Figure 3 for Expert Composer Policy: Scalable Skill Repertoire for Quadruped Robots

Figure 4 for Expert Composer Policy: Scalable Skill Repertoire for Quadruped Robots

Abstract:We propose the expert composer policy, a framework to reliably expand the skill repertoire of quadruped agents. The composer policy links pair of experts via transitions to a sampled target state, allowing experts to be composed sequentially. Each expert specializes in a single skill, such as a locomotion gait or a jumping motion. Instead of a hierarchical or mixture-of-experts architecture, we train a single composer policy in an independent process that is not conditioned on the other expert policies. By reusing the same composer policy, our approach enables adding new experts without affecting existing ones, enabling incremental repertoire expansion and preserving original motion quality. We measured the transition success rate of 72 transition pairs and achieved an average success rate of 99.99\%, which is over 10\% higher than the baseline random approach, and outperforms other state-of-the-art methods. Using domain randomization during training we ensure a successful transfer to the real world, where we achieve an average transition success rate of 97.22\% (N=360) in our experiments.

* ICRA 2024

Via

Access Paper or Ask Questions

A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective

Feb 20, 2024

Jeng-Lin Li, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

Abstract:Recent artificial intelligence (AI) technologies show remarkable evolution in various academic fields and industries. However, in the real world, dynamic data lead to principal challenges for deploying AI models. An unexpected data change brings about severe performance degradation in AI models. We identify two major related research fields, domain shift and concept drift according to the setting of the data change. Although these two popular research fields aim to solve distribution shift and non-stationary data stream problems, the underlying properties remain similar which also encourages similar technical approaches. In this review, we regroup domain shift and concept drift into a single research problem, namely the data change problem, with a systematic overview of state-of-the-art methods in the two research fields. We propose a three-phase problem categorization scheme to link the key ideas in the two technical fields. We thus provide a novel scope for researchers to explore contemporary technical strategies, learn industrial applications, and identify future directions for addressing data change challenges.

Via

Access Paper or Ask Questions

Expanding Versatility of Agile Locomotion through Policy Transitions Using Latent State Representation

Jun 14, 2023

Guilherme Christmann, Ying-Sheng Luo, Jonathan Hans Soeseno, Wei-Chao Chen

Abstract:This paper proposes the transition-net, a robust transition strategy that expands the versatility of robot locomotion in the real-world setting. To this end, we start by distributing the complexity of different gaits into dedicated locomotion policies applicable to real-world robots. Next, we expand the versatility of the robot by unifying the policies with robust transitions into a single coherent meta-controller by examining the latent state representations. Our approach enables the robot to iteratively expand its skill repertoire and robustly transition between any policy pair in a library. In our framework, adding new skills does not introduce any process that alters the previously learned skills. Moreover, training of a locomotion policy takes less than an hour with a single consumer GPU. Our approach is effective in the real-world and achieves a 19% higher average success rate for the most challenging transition pairs in our experiments compared to existing approaches.

* Presented at ICRA 2023

Via

Access Paper or Ask Questions

FedDig: Robust Federated Learning Using Data Digest to Represent Absent Clients

Oct 05, 2022

Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

Figure 1 for FedDig: Robust Federated Learning Using Data Digest to Represent Absent Clients

Figure 2 for FedDig: Robust Federated Learning Using Data Digest to Represent Absent Clients

Figure 3 for FedDig: Robust Federated Learning Using Data Digest to Represent Absent Clients

Figure 4 for FedDig: Robust Federated Learning Using Data Digest to Represent Absent Clients

Abstract:Federated Learning (FL) effectively protects client data privacy. However, client absence or leaving during training can seriously degrade model performances, particularly for unbalanced and non-IID client data. We address this issue by generating data digests from the raw data and using them to guide training at the FL moderator. The proposed FL framework, called FedDig, can tolerate unexpected client absence in cross-silo scenarios while preserving client data privacy because the digests de-identify the raw data by mixing encoded features in the features space. We evaluate FedDig using EMNIST, CIFAR-10, and CIFAR-100; the results consistently outperform against three baseline algorithms (FedAvg, FedProx, and FedNova) by large margins in various client absence scenarios.

Via

Access Paper or Ask Questions

Transition Motion Tensor: A Data-Driven Approach for Versatile and Controllable Agents in Physically Simulated Environments

Nov 30, 2021

Jonathan Hans Soeseno, Ying-Sheng Luo, Trista Pei-Chun Chen, Wei-Chao Chen

Figure 1 for Transition Motion Tensor: A Data-Driven Approach for Versatile and Controllable Agents in Physically Simulated Environments

Figure 2 for Transition Motion Tensor: A Data-Driven Approach for Versatile and Controllable Agents in Physically Simulated Environments

Figure 3 for Transition Motion Tensor: A Data-Driven Approach for Versatile and Controllable Agents in Physically Simulated Environments

Figure 4 for Transition Motion Tensor: A Data-Driven Approach for Versatile and Controllable Agents in Physically Simulated Environments

Abstract:This paper proposes the Transition Motion Tensor, a data-driven framework that creates novel and physically accurate transitions outside of the motion dataset. It enables simulated characters to adopt new motion skills efficiently and robustly without modifying existing ones. Given several physically simulated controllers specializing in different motions, the tensor serves as a temporal guideline to transition between them. Through querying the tensor for transitions that best fit user-defined preferences, we can create a unified controller capable of producing novel transitions and solving complex tasks that may require multiple motions to work coherently. We apply our framework on both quadrupeds and bipeds, perform quantitative and qualitative evaluations on transition quality, and demonstrate its capability of tackling complex motion planning problems while following user control directives.

* 4 pages, 6 figures

Via

Access Paper or Ask Questions

TrustMAE: A Noise-Resilient Defect Classification Framework using Memory-Augmented Auto-Encoders with Trust Regions

Dec 29, 2020

Daniel Stanley Tan, Yi-Chun Chen, Trista Pei-Chun Chen, Wei-Chao Chen

Figure 1 for TrustMAE: A Noise-Resilient Defect Classification Framework using Memory-Augmented Auto-Encoders with Trust Regions

Figure 2 for TrustMAE: A Noise-Resilient Defect Classification Framework using Memory-Augmented Auto-Encoders with Trust Regions

Figure 3 for TrustMAE: A Noise-Resilient Defect Classification Framework using Memory-Augmented Auto-Encoders with Trust Regions

Figure 4 for TrustMAE: A Noise-Resilient Defect Classification Framework using Memory-Augmented Auto-Encoders with Trust Regions

Abstract:In this paper, we propose a framework called TrustMAE to address the problem of product defect classification. Instead of relying on defective images that are difficult to collect and laborious to label, our framework can accept datasets with unlabeled images. Moreover, unlike most anomaly detection methods, our approach is robust against noises, or defective images, in the training dataset. Our framework uses a memory-augmented auto-encoder with a sparse memory addressing scheme to avoid over-generalizing the auto-encoder, and a novel trust-region memory updating scheme to keep the noises away from the memory slots. The result is a framework that can reconstruct defect-free images and identify the defective regions using a perceptual distance network. When compared against various state-of-the-art baselines, our approach performs competitively under noise-free MVTec datasets. More importantly, it remains effective at a noise level up to 40% while significantly outperforming other baselines.

* Accepted at WACV 2021

Via

Access Paper or Ask Questions

GrateTile: Efficient Sparse Tensor Tiling for CNN Processing

Sep 18, 2020

Yu-Sheng Lin, Hung Chang Lu, Yang-Bin Tsao, Yi-Min Chih, Wei-Chao Chen, Shao-Yi Chien

Figure 1 for GrateTile: Efficient Sparse Tensor Tiling for CNN Processing

Figure 2 for GrateTile: Efficient Sparse Tensor Tiling for CNN Processing

Figure 3 for GrateTile: Efficient Sparse Tensor Tiling for CNN Processing

Figure 4 for GrateTile: Efficient Sparse Tensor Tiling for CNN Processing

Abstract:We propose GrateTile, an efficient, hardwarefriendly data storage scheme for sparse CNN feature maps (activations). It divides data into uneven-sized subtensors and, with small indexing overhead, stores them in a compressed yet randomly accessible format. This design enables modern CNN accelerators to fetch and decompressed sub-tensors on-the-fly in a tiled processing manner. GrateTile is suitable for architectures that favor aligned, coalesced data access, and only requires minimal changes to the overall architectural design. We simulate GrateTile with state-of-the-art CNNs and show an average of 55% DRAM bandwidth reduction while using only 0.6% of feature map size for indexing storage.

* To be published at IEEE Workshop on Signal Processing System (SiPS 2020)

Via

Access Paper or Ask Questions