Young-Ju Choi

ssFPN: Scale Sequence (S^2) Feature Based-Feature Pyramid Network for Object Detection

Aug 25, 2022

Hye-Jin Park, Young-Ju Choi, Young-Woon Lee, Byung-Gyu Kim

Abstract: Feature Pyramid Network (FPN) has been an essential module for object detection models to handle objects at various scales. However, average precision (AP) on small objects is still lower than AP on medium and large objects. The reason is that deeper CNN layers lose information as feature extraction proceeds. We propose a new scale sequence (S^2) feature extraction for FPN to strengthen the feature information of small objects. We consider the FPN structure as a scale space and extract the scale sequence (S^2) feature by 3D convolution along the level axis of the FPN. It is essentially a scale-invariant feature and is built on the high-resolution pyramid feature map for small objects. Furthermore, the proposed S^2 feature can be extended to most object detection models based on FPN. We demonstrate that the proposed S^2 feature improves the performance of both one-stage and two-stage detectors on the MS COCO dataset. Based on the proposed S^2 feature, we achieve AP improvements of up to 1.3% and 1.1% for YOLOv4-P5 and YOLOv4-P6, respectively. For Faster R-CNN and Mask R-CNN, we observe AP improvements of up to 2.0% and 1.6%, respectively, with the proposed S^2 feature.
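
The idea of treating the pyramid levels as a scale axis and convolving along it can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the nearest-neighbour resizing, the zero padding, and the final max over the level axis are all assumptions made for illustration, and each level is a single-channel square map.

```python
from typing import List

Map2D = List[List[float]]
Volume = List[Map2D]  # indexed as [level][row][column]


def upsample_nearest(fmap: Map2D, factor: int) -> Map2D:
    """Repeat every value factor x factor times (nearest-neighbour upsampling)."""
    out: Map2D = []
    for row in fmap:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def stack_levels(pyramid: List[Map2D]) -> Volume:
    """Resize each level to the finest level's size and stack along a level axis.

    Assumes pyramid[0] is the finest level and every size divides it evenly.
    """
    target = len(pyramid[0])
    return [upsample_nearest(fmap, target // len(fmap)) for fmap in pyramid]


def conv3d_same(volume: Volume, kernel: Volume) -> Volume:
    """3D cross-correlation with zero padding; kernel sizes must be odd."""
    levels, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kl, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    pl, ph, pw = kl // 2, kh // 2, kw // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(levels)]
    for l in range(levels):
        for y in range(height):
            for x in range(width):
                acc = 0.0
                for dl in range(kl):
                    for dy in range(kh):
                        for dx in range(kw):
                            zl, zy, zx = l + dl - pl, y + dy - ph, x + dx - pw
                            if 0 <= zl < levels and 0 <= zy < height and 0 <= zx < width:
                                acc += kernel[dl][dy][dx] * volume[zl][zy][zx]
                out[l][y][x] = acc
    return out


def scale_sequence_feature(pyramid: List[Map2D], kernel: Volume) -> Map2D:
    """Convolve across levels, then collapse the level axis with a max (assumption)."""
    filtered = conv3d_same(stack_levels(pyramid), kernel)
    height, width = len(filtered[0]), len(filtered[0][0])
    return [[max(level[y][x] for level in filtered) for x in range(width)]
            for y in range(height)]
```

Calling `scale_sequence_feature` with a three-level pyramid and a 3x1x1 kernel mixes each position with the same position in neighbouring levels and returns one map at the finest resolution. A real detector would use learned multi-channel kernels from a deep learning framework rather than these explicit loops.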

Group-based Bi-Directional Recurrent Wavelet Neural Networks for Video Super-Resolution

Jun 14, 2021

Young-Ju Choi, Young-Woon Lee, Byung-Gyu Kim

Abstract: Video super-resolution (VSR) aims to estimate a high-resolution (HR) frame from low-resolution (LR) frames. The key challenge for VSR lies in effectively exploiting the spatial correlation within a frame and the temporal dependency between consecutive frames. However, most previous methods treat different types of spatial features identically and extract spatial and temporal features in separate modules. As a result, they fail to obtain meaningful information and to enhance fine details. In VSR, there are three types of temporal modeling frameworks: 2D convolutional neural networks (CNN), 3D CNN, and recurrent neural networks (RNN). Among them, the RNN-based approach is suitable for sequential data, so SR performance can be greatly improved by using the hidden states of adjacent frames. However, at each time step in a recurrent structure, previous RNN-based works use the neighboring features only in a restricted way. Since the range of accessible motion per time step is narrow, they are still limited in restoring the missing details for dynamic or large motion. In this paper, we propose group-based bi-directional recurrent wavelet neural networks (GBR-WNN) to exploit the sequential data and spatio-temporal information effectively for VSR. The proposed group-based bi-directional RNN (GBR) temporal modeling framework is built on a well-structured process with the group of pictures (GOP). We propose a temporal wavelet attention (TWA) module, in which attention is adopted for both spatial and temporal features. Experimental results demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods in both quantitative and qualitative evaluations.
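
The bi-directional recurrence mentioned above can be sketched in a few lines. This is a toy illustration only, not GBR-WNN: each frame is reduced to a single number, the update rule `decay * hidden + value` and the names `recurrent_pass` and `bidirectional_states` are hypothetical, and neither the GOP grouping nor the wavelet attention is modeled.

```python
from typing import List, Tuple


def recurrent_pass(frames: List[float], decay: float) -> List[float]:
    """Run a simple recurrence over the frames and return every hidden state."""
    hidden = 0.0
    states = []
    for value in frames:
        # Hypothetical update: blend the previous hidden state with the new frame.
        hidden = decay * hidden + value
        states.append(hidden)
    return states


def bidirectional_states(frames: List[float], decay: float = 0.5) -> List[Tuple[float, float]]:
    """Pair each frame's forward state (past context) with its backward state (future context)."""
    forward = recurrent_pass(frames, decay)
    backward = recurrent_pass(frames[::-1], decay)[::-1]
    return list(zip(forward, backward))
```

Each frame ends up with one state summarising the frames before it and one summarising the frames after it, which is the property that lets a bi-directional model draw on both temporal directions when restoring a frame.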

* 10 pages, 5 figures

Deepfake Detection Scheme Based on Vision Transformer and Distillation

Apr 03, 2021

Young-Jin Heo, Young-Ju Choi, Young-Woon Lee, Byung-Gyu Kim

Abstract: A deepfake is a manipulated video made with a generative deep learning technique, such as Generative Adversarial Networks (GANs) or autoencoders, that anyone can use. Recently, with the increase in deepfake videos, both deepfake datasets and convolutional neural network (CNN) classifiers that can distinguish fake videos have been actively created. However, previous studies based on the CNN structure suffer not only from overfitting but also from frequently misjudging fake videos as real ones. In this paper, we propose a Vision Transformer model with a distillation methodology for detecting fake videos. We design a model that combines CNN features with patch-based positioning and learns to interact with all positions to find the artifact region, addressing the false negative problem. Through comparative analysis on the Deepfake Detection Challenge (DFDC) dataset, we verify that the proposed scheme, with patch embedding as input, outperforms the state-of-the-art model that uses combined CNN features. Without an ensemble technique, our model obtains an AUC of 0.978 and an F1 score of 91.9, while the previous SOTA model yields an AUC of 0.972 and an F1 score of 90.6 under the same conditions.
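
As a rough illustration of the patch-based input a Vision Transformer works on, the sketch below splits a single-channel image into non-overlapping square patches and flattens each one. It is a generic example rather than the model described in the abstract: the function name `split_into_patches` is hypothetical, and the learned projection, position embeddings, CNN features, and distillation are all left out.

```python
from typing import List


def split_into_patches(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split an H x W image into flattened patch x patch blocks, in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one block row by row into a single vector.
            flat = [image[y][x]
                    for y in range(top, top + patch)
                    for x in range(left, left + patch)]
            patches.append(flat)
    return patches
```

The index of each patch in the returned list is its position in the image grid, which is the information a transformer's position embedding would encode before attention relates every patch to every other one.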

* 7 pages, 5 figures
