Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ching-Chun Huang

RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network

May 28, 2025

Van-Tin Luu, Yon-Lin Cai, Vu-Hoang Tran, Wei-Chen Chiu, Yi-Ting Chen, Ching-Chun Huang

Abstract:This paper presents a groundbreaking approach - the first online automatic geometric calibration method for radar and camera systems. Given the significant data sparsity and measurement uncertainty in radar height data, achieving automatic calibration during system operation has long been a challenge. To address the sparsity issue, we propose a Dual-Perspective representation that gathers features from both frontal and bird's-eye views. The frontal view contains rich but sensitive height information, whereas the bird's-eye view provides robust features against height uncertainty. We thereby propose a novel Selective Fusion Mechanism to identify and fuse reliable features from both perspectives, reducing the effect of height uncertainty. Moreover, for each view, we incorporate a Multi-Modal Cross-Attention Mechanism to explicitly find location correspondences through cross-modal matching. During the training phase, we also design a Noise-Resistant Matcher to provide better supervision and enhance the robustness of the matching mechanism against sparsity and height uncertainty. Our experimental results, tested on the nuScenes dataset, demonstrate that our method significantly outperforms previous radar-camera auto-calibration methods, as well as existing state-of-the-art LiDAR-camera calibration techniques, establishing a new benchmark for future research. The code is available at https://github.com/nycu-acm/RC-AutoCalib.

* Accepted to CVPR 2025

Via

Access Paper or Ask Questions

Swapped Logit Distillation via Bi-level Teacher Alignment

Apr 27, 2025

Stephen Ekaputra Limantoro, Jhe-Hao Lin, Chih-Yu Wang, Yi-Lung Tsai, Hong-Han Shuai, Ching-Chun Huang, Wen-Huang Cheng

Abstract:Knowledge distillation (KD) compresses the network capacity by transferring knowledge from a large (teacher) network to a smaller one (student). It has been mainstream that the teacher directly transfers knowledge to the student with its original distribution, which can possibly lead to incorrect predictions. In this article, we propose a logit-based distillation via swapped logit processing, namely Swapped Logit Distillation (SLD). SLD is proposed under two assumptions: (1) the wrong prediction occurs when the prediction label confidence is not the maximum; (2) the "natural" limit of probability remains uncertain as the best value addition to the target cannot be determined. To address these issues, we propose a swapped logit processing scheme. Through this approach, we find that the swap method can be effectively extended to teacher and student outputs, transforming into two teachers. We further introduce loss scheduling to boost the performance of two teachers' alignment. Extensive experiments on image classification tasks demonstrate that SLD consistently performs best among previous state-of-the-art methods.

* Accepted to Multimedia Systems 2025

Via

Access Paper or Ask Questions

Boosting Diffusion Guidance via Learning Degradation-Aware Models for Blind Super Resolution

Jan 15, 2025

Shao-Hao Lu, Ren Wang, Ching-Chun Huang, Wei-Chen Chiu

Abstract:Recently, diffusion-based blind super-resolution (SR) methods have shown great ability to generate high-resolution images with abundant high-frequency detail, but the detail is often achieved at the expense of fidelity. Meanwhile, another line of research focusing on rectifying the reverse process of diffusion models (i.e., diffusion guidance), has demonstrated the power to generate high-fidelity results for non-blind SR. However, these methods rely on known degradation kernels, making them difficult to apply to blind SR. To address these issues, we introduce degradation-aware models that can be integrated into the diffusion guidance framework, eliminating the need to know degradation kernels. Additionally, we propose two novel techniques input perturbation and guidance scalar to further improve our performance. Extensive experimental results show that our proposed method has superior performance over state-of-the-art methods on blind SR benchmarks

* To appear in WACV 2025. Code is available at: https://github.com/ryanlu2240/Boosting-Diffusion-Guidance-via-Learning-Degradation-Aware-Models-for-Blind-Super-Resolution

Via

Access Paper or Ask Questions

MENTOR: Multilingual tExt detectioN TOward leaRning by analogy

Mar 12, 2024

Hsin-Ju Lin, Tsu-Chun Chung, Ching-Chun Hsiao, Pin-Yu Chen, Wei-Chen Chiu, Ching-Chun Huang

Figure 1 for MENTOR: Multilingual tExt detectioN TOward leaRning by analogy

Figure 2 for MENTOR: Multilingual tExt detectioN TOward leaRning by analogy

Figure 3 for MENTOR: Multilingual tExt detectioN TOward leaRning by analogy

Figure 4 for MENTOR: Multilingual tExt detectioN TOward leaRning by analogy

Abstract:Text detection is frequently used in vision-based mobile robots when they need to interpret texts in their surroundings to perform a given task. For instance, delivery robots in multilingual cities need to be capable of doing multilingual text detection so that the robots can read traffic signs and road markings. Moreover, the target languages change from region to region, implying the need of efficiently re-training the models to recognize the novel/new languages. However, collecting and labeling training data for novel languages are cumbersome, and the efforts to re-train an existing/trained text detector are considerable. Even worse, such a routine would repeat whenever a novel language appears. This motivates us to propose a new problem setting for tackling the aforementioned challenges in a more efficient way: "We ask for a generalizable multilingual text detection framework to detect and identify both seen and unseen language regions inside scene images without the requirement of collecting supervised training data for unseen languages as well as model re-training". To this end, we propose "MENTOR", the first work to realize a learning strategy between zero-shot learning and few-shot learning for multilingual scene text detection.

* 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 3248-3255
* 8 pages, 4 figures, published to IROS 2023

Via

Access Paper or Ask Questions

Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where

Sep 22, 2023

Zhi-Yi Chin, Chieh-Ming Jiang, Ching-Chun Huang, Pin-Yu Chen, Wei-Chen Chiu

Abstract:While image data starts to enjoy the simple-but-effective self-supervised learning scheme built upon masking and self-reconstruction objective thanks to the introduction of tokenization procedure and vision transformer backbone, convolutional neural networks as another important and widely-adopted architecture for image data, though having contrastive-learning techniques to drive the self-supervised learning, still face the difficulty of leveraging such straightforward and general masking operation to benefit their learning process significantly. In this work, we aim to alleviate the burden of including masking operation into the contrastive-learning framework for convolutional neural networks as an extra augmentation method. In addition to the additive but unwanted edges (between masked and unmasked regions) as well as other adverse effects caused by the masking operations for ConvNets, which have been discussed by prior works, we particularly identify the potential problem where for one view in a contrastive sample-pair the randomly-sampled masking regions could be overly concentrated on important/salient objects thus resulting in misleading contrastiveness to the other view. To this end, we propose to explicitly take the saliency constraint into consideration in which the masked regions are more evenly distributed among the foreground and background for realizing the masking-based augmentation. Moreover, we introduce hard negative samples by masking larger regions of salient patches in an input image. Extensive experiments conducted on various datasets, contrastive learning mechanisms, and downstream tasks well verify the efficacy as well as the superior performance of our proposed method with respect to several state-of-the-art baselines.

Via

Access Paper or Ask Questions

Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Sep 12, 2023

Zhi-Yi Chin, Chieh-Ming Jiang, Ching-Chun Huang, Pin-Yu Chen, Wei-Chen Chiu

Figure 1 for Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Figure 2 for Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Figure 3 for Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Figure 4 for Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Abstract:Text-to-image diffusion models, e.g. Stable Diffusion (SD), lately have shown remarkable ability in high-quality content generation, and become one of the representatives for the recent wave of transformative AI. Nevertheless, such advance comes with an intensifying concern about the misuse of this generative technology, especially for producing copyrighted or NSFW (i.e. not safe for work) images. Although efforts have been made to filter inappropriate images/prompts or remove undesirable concepts/styles via model fine-tuning, the reliability of these safety mechanisms against diversified problematic prompts remains largely unexplored. In this work, we propose Prompting4Debugging (P4D) as a debugging and red-teaming tool that automatically finds problematic prompts for diffusion models to test the reliability of a deployed safety mechanism. We demonstrate the efficacy of our P4D tool in uncovering new vulnerabilities of SD models with safety mechanisms. Particularly, our result shows that around half of prompts in existing safe prompting benchmarks which were originally considered "safe" can actually be manipulated to bypass many deployed safety mechanisms, including concept removal, negative prompt, and safety guidance. Our findings suggest that, without comprehensive testing, the evaluations on limited safe prompting benchmarks can lead to a false sense of safety for text-to-image models.

Via

Access Paper or Ask Questions

ExReg: Wide-range Photo Exposure Correction via a Multi-dimensional Regressor with Attention

Dec 14, 2022

Tzu-Hao Chiang, Hao-Chien Hsueh, Ching-Chun Hsiao, Ching-Chun Huang

Figure 1 for ExReg: Wide-range Photo Exposure Correction via a Multi-dimensional Regressor with Attention

Figure 2 for ExReg: Wide-range Photo Exposure Correction via a Multi-dimensional Regressor with Attention

Figure 3 for ExReg: Wide-range Photo Exposure Correction via a Multi-dimensional Regressor with Attention

Figure 4 for ExReg: Wide-range Photo Exposure Correction via a Multi-dimensional Regressor with Attention

Abstract:Photo exposure correction is widely investigated, but fewer studies focus on correcting under and over-exposed images simultaneously. Three issues remain open to handle and correct under and over-exposed images in a unified way. First, a locally-adaptive exposure adjustment may be more flexible instead of learning a global mapping. Second, it is an ill-posed problem to determine the suitable exposure values locally. Third, photos with the same content but different exposures may not reach consistent adjustment results. To this end, we proposed a novel exposure correction network, ExReg, to address the challenges by formulating exposure correction as a multi-dimensional regression process. Given an input image, a compact multi-exposure generation network is introduced to generate images with different exposure conditions for multi-dimensional regression and exposure correction in the next stage. An auxiliary module is designed to predict the region-wise exposure values, guiding the mainly proposed Encoder-Decoder ANP (Attentive Neural Processes) to regress the final corrected image. The experimental results show that ExReg can generate well-exposed results and outperform the SOTA method by 1.3dB in PSNR for extensive exposure problems. In addition, given the same image but under various exposure for testing, the corrected results are more visually consistent and physically accurate.

* 12 pages, 8 figures

Via

Access Paper or Ask Questions

Feature-enhanced Adversarial Semi-supervised Semantic Segmentation Network for Pulmonary Embolism Annotation

Apr 08, 2022

Ting-Wei Cheng, Jerry Chang, Ching-Chun Huang, Chin Kuo, Yun-Chien Cheng

Figure 1 for Feature-enhanced Adversarial Semi-supervised Semantic Segmentation Network for Pulmonary Embolism Annotation

Figure 2 for Feature-enhanced Adversarial Semi-supervised Semantic Segmentation Network for Pulmonary Embolism Annotation

Figure 3 for Feature-enhanced Adversarial Semi-supervised Semantic Segmentation Network for Pulmonary Embolism Annotation

Figure 4 for Feature-enhanced Adversarial Semi-supervised Semantic Segmentation Network for Pulmonary Embolism Annotation

Abstract:This study established a feature-enhanced adversarial semi-supervised semantic segmentation model to automatically annotate pulmonary embolism lesion areas in computed tomography pulmonary angiogram (CTPA) images. In current studies, all of the PE CTPA image segmentation methods are trained by supervised learning. However, the supervised learning models need to be retrained and the images need to be relabeled when the CTPA images come from different hospitals. This study proposed a semi-supervised learning method to make the model applicable to different datasets by adding a small amount of unlabeled images. By training the model with both labeled and unlabeled images, the accuracy of unlabeled images can be improved and the labeling cost can be reduced. Our semi-supervised segmentation model includes a segmentation network and a discriminator network. We added feature information generated from the encoder of segmentation network to the discriminator so that it can learn the similarity between predicted mask and ground truth mask. This HRNet-based architecture can maintain a higher resolution for convolutional operations so the prediction of small PE lesion areas can be improved. We used the labeled open-source dataset and the unlabeled National Cheng Kung University Hospital (NCKUH) (IRB number: B-ER-108-380) dataset to train the semi-supervised learning model, and the resulting mean intersection over union (mIOU), dice score, and sensitivity achieved 0.3510, 0.4854, and 0.4253, respectively on the NCKUH dataset. Then, we fine-tuned and tested the model with a small amount of unlabeled PE CTPA images from China Medical University Hospital (CMUH) (IRB number: CMUH110-REC3-173) dataset. Comparing the results of our semi-supervised model with the supervised model, the mIOU, dice score, and sensitivity improved from 0.2344, 0.3325, and 0.3151 to 0.3721, 0.5113, and 0.4967, respectively.

Via

Access Paper or Ask Questions

Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis

Nov 28, 2021

Yu Hsuan Li, Tzu-Yin Chao, Ching-Chun Huang, Pin-Yu Chen, Wei-Chen Chiu

Figure 1 for Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis

Figure 2 for Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis

Figure 3 for Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis

Figure 4 for Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis

Abstract:Most of the existing algorithms for zero-shot classification problems typically rely on the attribute-based semantic relations among categories to realize the classification of novel categories without observing any of their instances. However, training the zero-shot classification models still requires attribute labeling for each class (or even instance) in the training dataset, which is also expensive. To this end, in this paper, we bring up a new problem scenario: "Are we able to derive zero-shot learning for novel attribute detectors/classifiers and use them to automatically annotate the dataset for labeling efficiency?" Basically, given only a small set of detectors that are learned to recognize some manually annotated attributes (i.e., the seen attributes), we aim to synthesize the detectors of novel attributes in a zero-shot learning manner. Our proposed method, Zero Shot Learning for Attributes (ZSLA), which is the first of its kind to the best of our knowledge, tackles this new research problem by applying the set operations to first decompose the seen attributes into their basic attributes and then recombine these basic attributes into the novel ones. Extensive experiments are conducted to verify the capacity of our synthesized detectors for accurately capturing the semantics of the novel attributes and show their superior performance in terms of detection and localization compared to other baseline approaches. Moreover, with using only 32 seen attributes on the Caltech-UCSD Birds-200-2011 dataset, our proposed method is able to synthesize other 207 novel attributes, while various generalized zero-shot classification algorithms trained upon the dataset re-annotated by our synthesized attribute detectors are able to provide comparable performance with those trained with the manual ground-truth annotations.

* 18 pages, 14 figures

Via

Access Paper or Ask Questions

Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

Mar 27, 2021

Yan-Cheng Huang, Yi-Hsin Chen, Cheng-You Lu, Hui-Po Wang, Wen-Hsiao Peng, Ching-Chun Huang

Figure 1 for Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

Figure 2 for Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

Figure 3 for Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

Figure 4 for Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

Abstract:This paper addresses the video rescaling task, which arises from the needs of adapting the video spatial resolution to suit individual viewing devices. We aim to jointly optimize video downscaling and upscaling as a combined task. Most recent studies focus on image-based solutions, which do not consider temporal information. We present two joint optimization approaches based on invertible neural networks with coupling layers. Our Long Short-Term Memory Video Rescaling Network (LSTM-VRN) leverages temporal information in the low-resolution video to form an explicit prediction of the missing high-frequency information for upscaling. Our Multi-input Multi-output Video Rescaling Network (MIMO-VRN) proposes a new strategy for downscaling and upscaling a group of video frames simultaneously. Not only do they outperform the image-based invertible model in terms of quantitative and qualitative results, but also show much improved upscaling quality than the video rescaling methods without joint optimization. To our best knowledge, this work is the first attempt at the joint optimization of video downscaling and upscaling.

* Accepted by CVPR 2021

Via

Access Paper or Ask Questions