Abstract:Diffusion models have achieved impressive results in generating diverse and realistic data by employing multi-step denoising processes. However, the need to accommodate significant variations in input noise across time-steps has led diffusion models to require a large number of parameters for their denoisers. We observe that diffusion models effectively act as filters over different frequency ranges at each noise level. While some previous works have introduced multi-expert strategies that assign denoisers to different noise intervals, they overlook the importance of specialized operations for high and low frequencies. For instance, self-attention operations are effective at handling low-frequency components (low-pass filters), while convolutions excel at capturing high-frequency features (high-pass filters). In other words, existing diffusion models employ denoisers with the same architecture, without considering the optimal operations for each noise level. To address this limitation, we propose a novel approach called Multi-architecturE Multi-Expert (MEME), which consists of multiple experts with specialized architectures tailored to the operations required at each time-step interval. Through extensive experiments, we demonstrate that MEME outperforms larger competitors in terms of both generation performance and computational efficiency.
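The interval-wise expert dispatch can be illustrated with a short sketch. The module interface, the per-interval routing, and the boundary values below are illustrative assumptions rather than the paper's exact MEME configuration:

```python
# Minimal sketch of routing each timestep to a dedicated expert denoiser.
# The expert modules themselves (e.g., a convolution-heavy vs. an
# attention-heavy U-Net) are assumed to be provided by the caller.
import torch
import torch.nn as nn

class MultiExpertDenoiser(nn.Module):
    def __init__(self, experts, boundaries):
        """experts[i] handles timesteps t with boundaries[i] <= t < boundaries[i+1],
        e.g. boundaries = [0, 500, 1000] for two experts over 1000 steps."""
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.boundaries = boundaries

    def forward(self, x_t, t):
        # Route each sample in the batch to the expert for its timestep interval.
        out = torch.empty_like(x_t)
        for i, expert in enumerate(self.experts):
            mask = (t >= self.boundaries[i]) & (t < self.boundaries[i + 1])
            if mask.any():
                out[mask] = expert(x_t[mask], t[mask])
        return out
```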
Abstract:Diffusion-based generative models have achieved remarkable success in various domains. They are trained on denoising tasks that encompass different noise levels simultaneously, which constitutes a form of multi-task learning (MTL). However, analyzing and improving diffusion models from an MTL perspective remains under-explored. In particular, MTL can sometimes lead to the well-known phenomenon of $\textit{negative transfer}$, in which conflicts between tasks degrade the performance of certain tasks. In this paper, we analyze diffusion training from an MTL standpoint and present two key observations: $\textbf{(O1)}$ the task affinity between denoising tasks diminishes as the gap between noise levels widens, and $\textbf{(O2)}$ negative transfer can arise even in diffusion training. Building upon these observations, we aim to enhance diffusion training by mitigating negative transfer. To achieve this, we propose leveraging existing MTL methods, but the huge number of denoising tasks makes computing the necessary per-task loss or gradient prohibitively expensive. To address this challenge, we propose clustering the denoising tasks into small task clusters and applying MTL methods to them. Specifically, based on $\textbf{(O2)}$, we employ interval clustering to enforce temporal proximity among denoising tasks within each cluster. We show that interval clustering can be solved with dynamic programming, using signal-to-noise ratio, timestep, or task affinity as the clustering objective. In this way, our approach mitigates negative transfer in diffusion models while allowing efficient computation of MTL methods. We validate the proposed clustering and its integration with MTL methods through various experiments, demonstrating improved sample quality of diffusion models.
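The interval clustering step can be sketched as a standard 1D dynamic program over contiguous timestep segments. The within-interval variance cost over a per-timestep key (e.g., log-SNR) is an illustrative assumption; the paper also considers timestep- and task-affinity-based objectives:

```python
# Sketch: partition timesteps 0..T-1 into k contiguous intervals by dynamic
# programming, minimizing total within-interval variance of a per-timestep key.
import numpy as np

def interval_cluster(keys, k):
    """keys: array of per-timestep values (e.g., log-SNR). Returns the start
    indices of clusters 2..k (cluster 1 always starts at timestep 0)."""
    T = len(keys)
    prefix = np.concatenate([[0.0], np.cumsum(keys)])
    prefix_sq = np.concatenate([[0.0], np.cumsum(np.square(keys))])

    def cost(i, j):
        # Within-interval scatter (variance * length) for keys[i..j].
        n = j - i + 1
        s = prefix[j + 1] - prefix[i]
        sq = prefix_sq[j + 1] - prefix_sq[i]
        return sq - s * s / n

    dp = np.full((T, k + 1), np.inf)
    split = np.zeros((T, k + 1), dtype=int)
    for i in range(T):
        dp[i][1] = cost(0, i)
    for c in range(2, k + 1):
        for i in range(c - 1, T):
            for j in range(c - 2, i):
                cand = dp[j][c - 1] + cost(j + 1, i)
                if cand < dp[i][c]:
                    dp[i][c], split[i][c] = cand, j + 1
    # Backtrack to recover interval start indices.
    bounds, i = [], T - 1
    for c in range(k, 1, -1):
        bounds.append(split[i][c])
        i = split[i][c] - 1
    return sorted(bounds)

# Example: cluster 1000 timesteps into 5 intervals by a (hypothetical) log-SNR curve.
log_snr = np.linspace(6.0, -6.0, 1000)
print(interval_cluster(log_snr, k=5))
```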
Abstract:Text classification in education, usually called auto-tagging, is the automated process of assigning relevant tags to educational content such as questions and textbooks. However, auto-tagging suffers from a data scarcity problem, which stems from two major challenges: 1) it possesses a large tag space and 2) it is multi-label. Although retrieval approaches are reportedly effective in low-resource scenarios, there have been few efforts to directly address the data scarcity problem. To mitigate these issues, we propose CEAA, a novel retrieval approach that provides effective learning for educational text classification. Our main contributions are as follows: 1) we leverage transfer learning from question-answering datasets, and 2) we propose a simple but effective data augmentation method that introduces cross-encoder style texts to a bi-encoder architecture for more efficient inference. An extensive set of experiments shows that our proposed method is effective in multi-label scenarios and on low-resource tags compared to state-of-the-art models.
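For context, a minimal bi-encoder retrieval sketch for multi-label auto-tagging is shown below. The encoder checkpoint and top-k thresholding are illustrative assumptions, and the cross-encoder-style augmentation itself is not shown:

```python
# Sketch: treat multi-label tagging as retrieval over tag texts with a bi-encoder,
# so tag embeddings can be pre-computed for efficient inference.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the actual encoder

def predict_tags(question, tag_texts, top_k=3):
    """Embed the question and all tag descriptions separately, then return
    the highest-scoring tags (multi-label prediction as retrieval)."""
    q_emb = encoder.encode(question, convert_to_tensor=True)
    t_emb = encoder.encode(tag_texts, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, t_emb)[0]
    best = scores.topk(k=min(top_k, len(tag_texts)))
    return [(tag_texts[int(i)], float(s)) for s, i in zip(best.values, best.indices)]
```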
Abstract:Question generation (QG) is the task of generating a valid and fluent question based on a given context and a target answer. Depending on the purpose, instructors can ask questions about different concepts even given the same context, and even the same concept can be phrased in different ways. However, QG evaluation usually relies on single reference-based similarity metrics, such as n-gram-based or learned metrics, which is not sufficient to fully evaluate the potential of QG methods. To this end, we propose paraphrasing the reference question for a more robust QG evaluation. Using large language models such as GPT-3, we create semantically and syntactically diverse questions and then adopt a simple aggregation of the popular evaluation metrics as the final score. Through our experiments, we found that using multiple (pseudo) references is more effective for QG evaluation, showing a higher correlation with human evaluations than evaluation with a single reference.
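A minimal sketch of the multi-reference scoring idea: score the generated question against the original reference and its paraphrases, then aggregate. Sentence-level BLEU via sacrebleu and max-aggregation are illustrative assumptions; the paper aggregates several popular metrics:

```python
# Sketch: evaluate a generated question against multiple (pseudo) references.
from sacrebleu.metrics import BLEU

bleu = BLEU(effective_order=True)

def multi_ref_score(generated, references):
    """references = [original reference] + LLM-paraphrased pseudo-references."""
    per_ref = [bleu.sentence_score(generated, [ref]).score for ref in references]
    return max(per_ref)  # could also be the mean, depending on the aggregation choice

# Hypothetical example with paraphrased references:
refs = [
    "What causes the seasons to change?",
    "Why do the seasons change?",
    "What is the reason for seasonal change?",
]
print(multi_ref_score("Why do seasons change on Earth?", refs))
```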
Abstract:With the advent of deep learning, a huge number of text-to-speech (TTS) models that produce human-like speech have emerged. Recently, various approaches have been proposed to enrich the naturalness and expressiveness of TTS models by introducing syntactic and semantic information w.r.t. the input text. Although these strategies have shown impressive results, they still have limitations in utilizing language information. First, most approaches use only graph networks to utilize syntactic and semantic information, without considering linguistic features. Second, most previous works do not explicitly consider adjacent words when encoding syntactic and semantic information, even though adjacent words are usually meaningful for encoding the current word. To address these issues, we propose the Relation-aware Word Encoding Network (RWEN), which effectively incorporates syntactic and semantic information through two modules (i.e., Semantic-level Relation Encoding and Adjacent Word Relation Encoding). Experimental results show substantial improvements compared to previous works.
Abstract:This paper describes Netmarble's submission to the WMT21 Automatic Post-Editing (APE) Shared Task for the English-German language pair. First, we propose a Curriculum Training Strategy across training stages. Facebook FAIR's WMT19 news translation model was chosen as a large and powerful pre-trained neural network. We then post-train the translation model with different levels of data at each training stage. As the training stages progress, the system gradually learns to solve multiple tasks through extra information added at each stage. We also show a way to utilize large volumes of additional data for the APE task. For further improvement, we apply a Multi-Task Learning Strategy with Dynamic Weight Average during the fine-tuning stage. To fine-tune on the limited APE corpus, we add related subtasks to learn a unified representation. Finally, for better performance, we leverage external translations as augmented machine translation (MT) during post-training and fine-tuning. Experimental results show that our APE system significantly improves the provided MT results by -2.848 TER and +3.74 BLEU on the development dataset. It also demonstrates its effectiveness on the test dataset, with higher quality than on the development dataset.
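Dynamic Weight Average (DWA), used here during fine-tuning, re-weights task losses by the ratio of each task's loss over the previous two epochs. A minimal sketch, with the temperature and example losses as illustrative assumptions:

```python
# Sketch of Dynamic Weight Average: tasks whose loss is decreasing more slowly
# receive larger weights; weights sum to the number of tasks.
import math

def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """prev_losses / prev_prev_losses: per-task losses from the last two epochs."""
    ratios = [l1 / l2 for l1, l2 in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    z = sum(exps)
    return [len(ratios) * e / z for e in exps]

# e.g. the APE task plus two auxiliary subtasks (hypothetical loss values):
print(dwa_weights([0.8, 1.2, 0.5], [1.0, 1.1, 0.6]))
```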
Abstract:Existing works on aspect-based sentiment analysis (ABSA) have adopted a unified approach that allows interactive relations among subtasks. However, we observe that these methods tend to predict polarities based on the literal meaning of aspect and opinion terms and mainly consider relations among subtasks only implicitly at the word level. In addition, identifying multiple aspect-opinion pairs with their polarities is much more challenging. Therefore, a comprehensive understanding of contextual information w.r.t. the aspect and opinion is further required in ABSA. In this paper, we propose the Deep Contextualized Relation-Aware Network (DCRAN), which allows interactive relations among subtasks with deep contextual information based on two modules (i.e., Aspect and Opinion Propagation and Explicit Self-Supervised Strategies). In particular, we design novel self-supervised strategies for ABSA that are effective in dealing with multiple aspects. Experimental results show that DCRAN significantly outperforms previous state-of-the-art methods by large margins on three widely used benchmarks.
Abstract:Previous researchers have treated sentiment analysis as a document classification task in which input documents are classified into predefined sentiment classes. Although some sentences in a document provide important evidence for sentiment analysis and others do not, prior work has treated the document as a bag of sentences; in other words, it has not considered the importance of each sentence in the document. To effectively determine the polarity of a document, each sentence should be treated with a different degree of importance. To address this problem, we propose a document-level sentiment classification model based on deep neural networks, in which the importance degree of each sentence in a document is automatically determined through gate mechanisms. To verify the proposed model, we conducted experiments using sentiment datasets from four different domains: movie reviews, hotel reviews, restaurant reviews, and music reviews. In the experiments, the proposed model outperformed previous state-of-the-art models that do not consider differences in sentence importance. The experimental results show that the importance of sentences should be considered in document-level sentiment classification.
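The gate mechanism for sentence importance can be sketched as follows; the dimensions, the sigmoid gate, and the normalized weighted pooling are illustrative assumptions rather than the paper's exact architecture:

```python
# Sketch: each sentence vector receives an importance gate, and the gated sentence
# vectors are pooled into a document representation for sentiment classification.
import torch
import torch.nn as nn

class GatedDocumentClassifier(nn.Module):
    def __init__(self, sent_dim=256, num_classes=2):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(sent_dim, 1), nn.Sigmoid())
        self.classifier = nn.Linear(sent_dim, num_classes)

    def forward(self, sent_vecs):
        """sent_vecs: (batch, num_sentences, sent_dim) sentence encodings."""
        g = self.gate(sent_vecs)                                   # importance per sentence
        doc = (g * sent_vecs).sum(dim=1) / g.sum(dim=1).clamp(min=1e-6)
        return self.classifier(doc)                                # document-level logits
```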