Youngjoon Yoo

HyperCLOVA X Technical Report

Apr 13, 2024

Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim(+386 more)

Abstract:We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

* 44 pages; updated authors list and fixed author names

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Jan 17, 2024

Jonghyun Lee, Hansam Cho, Youngjoon Yoo, Seoung Bum Kim, Yonghyun Jeong

Abstract:Addressing the limitations of text as a source of accurate layout representation in text-conditional diffusion models, many works incorporate additional signals to condition certain attributes within a generated image. Although successful, previous works do not account for the specific localization of these attributes in three-dimensional space. In this context, we present a conditional diffusion model that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. Specifically, we first introduce depth disentanglement training, which leverages the relative depth of objects as an estimator and allows the model to identify the absolute positions of unseen objects through the use of synthetic image triplets. We also introduce soft guidance, a method for imposing global semantics onto targeted regions without any additional localization cues. Our integrated framework, Compose and Conquer (CnC), unifies these techniques to localize multiple conditions in a disentangled manner. We demonstrate that our approach allows perception of objects at varying depths while offering a versatile framework for composing localized objects with different global semantics. Code: https://github.com/tomtom1103/compose-and-conquer/

* ICLR 2024

Rediscovery of the Effectiveness of Standard Convolution for Lightweight Face Detection

Apr 04, 2022

Joonhyun Jeong, Beomyoung Kim, Joonsang Yu, Youngjoon Yoo

Abstract:This paper analyses the design choices of face detection architectures that improve the trade-off between computation cost and accuracy. Specifically, we re-examine the effectiveness of the standard convolutional block as a lightweight backbone architecture for face detection. Unlike the current trend in lightweight architecture design, which relies heavily on depthwise separable convolution layers, we show that heavily channel-pruned standard convolution layers can achieve better accuracy and inference speed at a similar parameter size. This observation is supported by analyses of the characteristics of the target data domain, faces. Based on this observation, we propose to employ a ResNet with highly reduced channel widths, which is surprisingly efficient compared to other mobile-friendly networks (e.g., MobileNet-V1, -V2, -V3). Through extensive experiments, we show that the proposed backbone can replace that of the state-of-the-art face detector while providing faster inference. We further propose a new feature aggregation method that maximizes detection performance. Our proposed detector, EResFD, obtains 80.4% mAP on the WIDER FACE Hard subset and takes only 37.7 ms for VGA image inference on a CPU. Code will be available at https://github.com/clovaai/EResFD.
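The parameter-size comparison above can be checked with simple arithmetic. The sketch below counts weights (biases omitted) for one 3x3 standard convolution and for one depthwise separable block (a 3x3 depthwise convolution followed by a 1x1 pointwise convolution). The channel widths 24 and 64 are hypothetical values chosen only to show that a channel-pruned standard layer can sit at a parameter budget similar to a wider depthwise separable layer; they are not EResFD's actual configuration.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    # A standard k x k convolution: every output channel has a k x k filter
    # over every input channel (bias omitted).
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    # Depthwise k x k convolution (one filter per input channel) followed by
    # a 1 x 1 pointwise convolution that mixes channels (biases omitted).
    return k * k * c_in + c_in * c_out


# Hypothetical widths, for illustration only.
print(standard_conv_params(24, 24))        # 5184
print(depthwise_separable_params(64, 64))  # 4672
print(standard_conv_params(64, 64))        # 36864
```

At equal width the standard convolution is far larger, so matching a depthwise separable layer's budget requires cutting its channels substantially, which is the regime the abstract describes.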

Observations on K-image Expansion of Image-Mixing Augmentation for Classification

Oct 08, 2021

Joonhyun Jeong, Sungmin Cha, Youngjoon Yoo, Sangdoo Yun, Taesup Moon, Jongwon Choi

Abstract:Image-mixing augmentations (e.g., Mixup or CutMix), which typically mix two images, have become de facto training tricks for image classification. Despite their huge success on image classification, the number of images to mix has not been thoroughly investigated by previous works, which only show that naive K-image expansion degrades performance. This paper derives a new K-image mixing augmentation based on the stick-breaking process under a Dirichlet prior. Through extensive experiments and analysis of classification accuracy, loss-landscape shape, and adversarial robustness, we show that our method trains more robust and better-generalized classifiers than the usual two-image methods. Furthermore, we show that our probabilistic model can measure sample-wise uncertainty and can boost the efficiency of Network Architecture Search (NAS), reducing search time by 7x.

* Preprint
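A minimal sketch of K-image mixing with Dirichlet weights drawn by stick-breaking, assuming a Mixup-style pixel-wise blend. The abstract does not say whether the paper blends pixels or pastes regions (CutMix-style), and the function names and alpha value are hypothetical. The construction itself is standard: for a symmetric Dirichlet(alpha) over k weights, breaking off a fraction v_i ~ Beta(alpha, (k - i) * alpha) of the remaining stick at step i (1-based) and giving the last weight the remainder yields weights with exactly that distribution.

```python
import numpy as np


def stick_breaking_weights(k: int, alpha: float, rng: np.random.Generator) -> np.ndarray:
    """Draw k mixing weights distributed as Dirichlet(alpha, ..., alpha).

    At 0-based step i < k - 1, break off a fraction
    v_i ~ Beta(alpha, (k - 1 - i) * alpha) of the remaining stick; the last
    weight takes what is left, so the weights always sum to 1.
    """
    weights = np.empty(k)
    remaining = 1.0
    for i in range(k - 1):
        v = rng.beta(alpha, (k - 1 - i) * alpha)
        weights[i] = v * remaining
        remaining *= 1.0 - v
    weights[-1] = remaining
    return weights


def k_image_mixup(images: np.ndarray, onehot_labels: np.ndarray, alpha: float,
                  rng: np.random.Generator):
    """Blend k images pixel-wise with stick-breaking weights (hypothetical helper).

    images: (k, H, W, C) float array; onehot_labels: (k, num_classes).
    """
    w = stick_breaking_weights(images.shape[0], alpha, rng)
    mixed_image = np.tensordot(w, images, axes=1)  # weighted sum over k -> (H, W, C)
    mixed_label = w @ onehot_labels                # soft label -> (num_classes,)
    return mixed_image, mixed_label, w


rng = np.random.default_rng(0)
imgs = rng.random((3, 32, 32, 3))    # three random "images"
labels = np.eye(10)[[1, 4, 7]]       # their one-hot labels
x, y, w = k_image_mixup(imgs, labels, alpha=1.0, rng=rng)
```

The soft label puts weight w[j] on the class of image j, so the loss is computed against the same mixture used for the pixels.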

Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement

Sep 20, 2021

Beomyoung Kim, Youngjoon Yoo, Chaeeun Rhee, Junmo Kim

Abstract:Recent weakly-supervised semantic segmentation (WSSS) methods have made remarkable progress thanks to class-wise localization techniques using image-level labels. Meanwhile, weakly-supervised instance segmentation (WSIS) is a more challenging task because instance-wise localization using only image-level labels is quite difficult. Consequently, most WSIS approaches exploit off-the-shelf proposal techniques that require pre-training with high-level labels, deviating from a fully image-level supervised setting. Moreover, we focus on the semantic drift problem, i.e., missing instances in pseudo instance labels are categorized as the background class, causing confusion between background and instances during training. To this end, we propose a novel approach that consists of two innovative components. First, we design a semantic knowledge transfer to obtain pseudo instance labels by transferring the knowledge of WSSS to WSIS while eliminating the need for off-the-shelf proposals. Second, we propose a self-refinement method that refines the pseudo instance labels in a self-supervised scheme and uses them for training in an online manner while resolving the semantic drift problem. Extensive experiments demonstrate the effectiveness of our approach, and we outperform existing works on PASCAL VOC2012 without any off-the-shelf proposal techniques. Furthermore, our approach can easily be applied to the point-supervised setting, boosting performance at an economical annotation cost. The code will be available soon.

* 14 pages, 14 figures

SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning

Jul 01, 2021

Sungmin Cha, Beomyoung Kim, Youngjoon Yoo, Taesup Moon

Abstract:We consider the class-incremental semantic segmentation (CISS) problem. While some recently proposed algorithms utilize variants of the knowledge distillation (KD) technique to tackle the problem, they only partially address the key additional challenges in CISS that cause catastrophic forgetting, i.e., the semantic drift of the background class and the multi-label prediction issue. To better address these challenges, we propose a new method, dubbed SSUL-M (Semantic Segmentation with Unknown Label with Memory), by carefully combining several techniques tailored for semantic segmentation. More specifically, we make three main contributions: (1) modeling an unknown class within the background class to help learn future classes (helping plasticity), (2) freezing the backbone network and past classifiers, with binary cross-entropy loss and pseudo-labeling, to overcome catastrophic forgetting (helping stability), and (3) utilizing a tiny exemplar memory for the first time in CISS to improve both plasticity and stability. As a result, we show that our method achieves significantly better performance than recent state-of-the-art baselines on standard benchmark datasets. Furthermore, we justify our contributions with thorough and extensive ablation analyses and discuss how the CISS problem differs in nature from standard class-incremental learning for classification.
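The pseudo-labeling in contribution (2), combined with the unknown class of contribution (1), can be pictured as a per-pixel relabeling of the background in the current step's ground truth. The sketch below is an illustration under stated assumptions, not the paper's implementation. The confidence threshold, the unknown label id, the function name, and especially the saliency mask used to decide which background pixels count as "unknown" objects are all hypothetical inputs.

```python
import numpy as np

BACKGROUND = 0  # label id of the background class


def ssul_style_pseudo_labels(gt: np.ndarray, old_probs: np.ndarray,
                             saliency: np.ndarray, unknown_id: int,
                             threshold: float = 0.7) -> np.ndarray:
    """Relabel background pixels of the current step's ground truth (hypothetical helper).

    gt:        (H, W) ints; only the current step's classes are annotated and
               everything else (including past classes) is BACKGROUND.
    old_probs: (H, W, C_old) softmax output of the frozen previous model,
               with channel 0 being background.
    saliency:  (H, W) bool mask of object-like regions (assumed input).
    """
    labels = gt.copy()
    is_bg = gt == BACKGROUND
    old_cls = old_probs.argmax(axis=-1)
    old_conf = old_probs.max(axis=-1)

    # Past classes: the frozen model confidently predicts a non-background class.
    past = is_bg & (old_cls != BACKGROUND) & (old_conf >= threshold)
    labels[past] = old_cls[past]

    # Unknown: remaining background pixels that still look like objects.
    unknown = is_bg & ~past & saliency
    labels[unknown] = unknown_id
    return labels
```

Separating "unknown" from true background is what lets classes introduced later start from pixels that were not forced into the background class.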

Self-Supervised Iterative Contextual Smoothing for Efficient Adversarial Defense against Gray- and Black-Box Attack

Jun 22, 2021

Sungmin Cha, Naeun Ko, Youngjoon Yoo, Taesup Moon

Abstract:We propose a novel and effective input-transformation-based adversarial defense method against gray- and black-box attacks, which is computationally efficient and does not require any adversarial training or retraining of a classification model. We first show that very simple iterative Gaussian smoothing can effectively wash out adversarial noise and achieve substantially high robust accuracy. Based on this observation, we propose Self-Supervised Iterative Contextual Smoothing (SSICS), which aims to reconstruct the original discriminative features from the Gaussian-smoothed image in a context-adaptive manner while still smoothing out the adversarial noise. Through experiments on ImageNet, we show that SSICS achieves both high standard accuracy and very competitive robust accuracy against gray- and black-box attacks, e.g., the transfer-based PGD attack and score-based attacks. Notably, our defense is free of computationally expensive adversarial training, yet it can approach the robust accuracy of adversarial training via input transformation alone.

* Preprint version
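The "very simple iterative Gaussian smoothing" baseline that motivates SSICS can be sketched in NumPy as repeated application of a small separable Gaussian blur. The sigma, iteration count, and function names below are hypothetical. The sketch omits the self-supervised contextual reconstruction that distinguishes SSICS itself.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over the first two axes (H, W) with reflect padding."""
    kernel = gaussian_kernel1d(sigma)
    r = len(kernel) // 2
    out = img.astype(float)
    for axis in (0, 1):
        n = out.shape[axis]
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="reflect")
        # Weighted sum of shifted copies = 1-D convolution along `axis`
        # (the kernel is symmetric, so correlation and convolution coincide).
        out = sum(kernel[i] * np.take(padded, np.arange(i, i + n), axis=axis)
                  for i in range(len(kernel)))
    return out


def iterative_gaussian_smoothing(img: np.ndarray, sigma: float = 1.0,
                                 iterations: int = 3) -> np.ndarray:
    """Apply the same small Gaussian blur several times in a row."""
    for _ in range(iterations):
        img = gaussian_blur(img, sigma)
    return img
```

Each pass suppresses high-frequency content, where much adversarial noise lives, but it also blurs away discriminative detail; recovering that detail is the gap SSICS is designed to close.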

More than just an auxiliary loss: Anti-spoofing Backbone Training via Adversarial Pseudo-depth Generation

Jan 01, 2021

Chang Keun Paik, Naeun Ko, Youngjoon Yoo

Abstract:In this paper, we present a new training pipeline that achieves strong performance on the task of anti-spoofing with RGB images. We explore and highlight the impact of using pseudo-depth to pre-train a network that is then used as the backbone of the final classifier. While the use of pseudo-depth for anti-spoofing is not a new idea on its own, previous endeavours use pseudo-depth simply as another medium from which to extract features for prediction, or as one of many auxiliary losses aiding the training of the main classifier, reducing pseudo-depth to just another piece of semantic information. In this work, we argue that a significant advantage in training the final classifier can be gained by pre-training a generator, within a Generative Adversarial Network framework, to predict the pseudo-depth of a given facial image. Our experimental results indicate that our method yields a much more adaptable system that generalizes beyond intra-dataset samples to inter-dataset samples it has never seen during training. Quantitatively, our method approaches the baseline performance of current state-of-the-art anti-spoofing models with 15.8x fewer parameters. Moreover, experiments show that the introduced methodology performs well using only basic binary labels, without additional semantic information, which indicates potential benefits in industrial and application-based environments where the trade-off between additional labelling and resources must be considered.

StatAssist & GradBoost: A Study on Optimal INT8 Quantization-aware Training from Scratch

Jun 17, 2020

Taehoon Kim, Youngjoon Yoo, Jihoon Yang

Abstract:This paper studies training from scratch with quantization-aware training (QAT), which has been applied to lossless conversion to lower bit widths, especially INT8 quantization. Because of its training instability, QAT has required full-precision (FP) pre-trained weights for fine-tuning, and its performance is bounded by the original FP model with floating-point computations. Here, we propose critical but straightforward optimization methods that enable scratch training: floating-point statistic assisting (StatAssist) and stochastic-gradient boosting (GradBoost). We discovered, first, that scratch QAT achieves performance comparable to, and often better than, its floating-point counterpart without any help from a pre-trained model, especially as the model becomes more complicated. We also show that our method can even train with the minimax generation loss, which is very unstable and hence difficult to fine-tune with QAT. Through extensive experiments, we show that our method successfully enables QAT to train various deep models from scratch (classification, object detection, semantic segmentation, and style transfer) with performance comparable to or often better than their FP baselines.
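For readers unfamiliar with QAT, the sketch below shows the generic building block it relies on: INT8 fake quantization in the forward pass and a straight-through estimator (STE) in the backward pass. It assumes symmetric per-tensor scaling with a zero point of 0, and the function names are hypothetical. It does not implement StatAssist or GradBoost, whose details are not given in the abstract.

```python
import numpy as np

QMIN, QMAX = -128, 127  # signed INT8 range


def symmetric_scale(x: np.ndarray) -> float:
    """Per-tensor scale mapping max |x| onto 127 (zero point fixed at 0)."""
    m = float(np.max(np.abs(x)))
    return m / QMAX if m > 0 else 1.0


def fake_quant_int8(x: np.ndarray, scale: float) -> np.ndarray:
    """Forward pass: round to the INT8 grid, clamp, then dequantize back to float."""
    q = np.clip(np.round(x / scale), QMIN, QMAX)
    return q * scale


def fake_quant_grad(x: np.ndarray, scale: float, upstream: np.ndarray) -> np.ndarray:
    """Backward pass with the straight-through estimator.

    round() is treated as the identity, so the upstream gradient passes
    through unchanged wherever x lies inside the clamping range and is
    zeroed where the clamp is active.
    """
    inside = (x >= QMIN * scale) & (x <= QMAX * scale)
    return upstream * inside


x = np.array([-1.0, -0.3, 0.0, 0.5, 1.0])
s = symmetric_scale(x)
xq = fake_quant_int8(x, s)  # each entry within s / 2 of x
```

Because the network is trained through this rounding, it learns weights that tolerate the INT8 grid; the instability the abstract mentions arises when training such a network from random initialization rather than from FP weights.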

An Empirical Evaluation on Robustness and Uncertainty of Regularization Methods

Mar 09, 2020

Sanghyuk Chun, Seong Joon Oh, Sangdoo Yun, Dongyoon Han, Junsuk Choe, Youngjoon Yoo

Abstract:Despite the apparent human-level performance of deep neural networks (DNNs), they behave fundamentally differently from humans. They easily change predictions when small corruptions such as blur and noise are applied to the input (lack of robustness), and they often produce confident predictions on out-of-distribution samples (improper uncertainty measure). While a number of studies have aimed to address these issues, the proposed solutions are typically expensive and complicated (e.g., Bayesian inference and adversarial training). Meanwhile, many simple and cheap regularization methods have been developed to enhance the generalization of classifiers. Such regularization methods have largely been overlooked as baselines for addressing robustness and uncertainty issues, as they are not specifically designed for that purpose. In this paper, we provide extensive empirical evaluations of the robustness and uncertainty estimates of image classifiers (CIFAR-100 and ImageNet) trained with state-of-the-art regularization methods. Our experimental results show that certain regularization methods can serve as strong baselines for robustness and uncertainty estimation of DNNs.

* Accepted at ICML 2019 Workshop on Uncertainty and Robustness in Deep Learning. 7 pages, 1 figure
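One widely used way to quantify the "improper uncertainty measure" described above is the expected calibration error (ECE). ECE bins predictions by confidence and averages the gap between confidence and accuracy in each bin, weighted by the fraction of samples that fall in that bin. The sketch below assumes 15 equal-width bins. The abstract does not state whether the paper uses exactly this metric or binning.

```python
import numpy as np


def expected_calibration_error(confidences, predictions, labels, n_bins: int = 15) -> float:
    """ECE with equal-width confidence bins (lo, hi] over (0, 1].

    confidences: max softmax probability per sample.
    predictions: predicted class per sample; labels: true class per sample.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| in this bin, weighted by bin size.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```

A classifier that reports 90% confidence but is always wrong scores an ECE of 0.9, while one whose confidence matches its accuracy in every bin scores 0.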
