Jongyoon Song

Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP

Jan 19, 2025

Junsung Park, Jungbeom Lee, Jongyoon Song, Sangwon Yu, Dahuin Jung, Sungroh Yoon

Abstract: While CLIP has significantly advanced multimodal understanding by bridging vision and language, its inability to grasp negation, such as failing to differentiate concepts like "parking" from "no parking", poses substantial challenges. By analyzing the data used in the public CLIP model's pre-training, we posit this limitation stems from a lack of negation-inclusive data. To address this, we introduce data generation pipelines that employ a large language model (LLM) and a multimodal LLM to produce negation-inclusive captions. Fine-tuning CLIP with data generated from our pipelines, we develop NegationCLIP, which enhances negation awareness while preserving generality. Moreover, to enable a comprehensive evaluation of negation understanding, we propose NegRefCOCOg, a benchmark tailored to test VLMs' ability to interpret negation across diverse expressions and positions within a sentence. Experiments on various CLIP architectures validate the effectiveness of our data generation pipelines in enhancing CLIP's ability to perceive negation accurately. Additionally, NegationCLIP's enhanced negation awareness has practical applications across various multimodal tasks, demonstrated by performance gains in text-to-image generation and referring image segmentation.

Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context

Oct 09, 2024

Sangwon Yu, Ik-hwan Kim, Jongyoon Song, Saehyung Lee, Junsung Park, Sungroh Yoon

Abstract: Multi-hop reasoning, which requires multi-step reasoning based on the supporting documents within a given context, remains challenging for large language models (LLMs). LLMs often struggle to filter out irrelevant documents within the context, and their performance is sensitive to the position of supporting documents within that context. In this paper, we identify an additional challenge: LLMs' performance is also sensitive to the order in which the supporting documents are presented. We refer to this as the misordered context problem. To address this issue, we propose a simple yet effective method called context repetition (CoRe), which involves prompting the model by repeatedly presenting the context to ensure the supporting documents are presented in the optimal order for the model. Using CoRe, we improve the F1 score by up to 30%p on multi-hop QA tasks and increase accuracy by up to 70%p on a synthetic task. Additionally, CoRe helps mitigate the well-known "lost-in-the-middle" problem in LLMs and can be effectively combined with retrieval-based approaches utilizing Chain-of-Thought (CoT) reasoning.
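
The context-repetition idea in this abstract can be illustrated with a minimal prompt-building sketch. The abstract does not give a prompt template, so the function name `build_core_prompt`, the `repeats` parameter, and the document/question layout are all hypothetical; the sketch only shows the same context being presented more than once before the question.

```python
def build_core_prompt(documents, question, repeats=2):
    """Build a prompt that presents the same context `repeats` times.

    Hypothetical layout for illustration; not the authors' template.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Number the documents so each copy of the context is identical.
    context = "\n".join(
        f"Document {i}: {doc}" for i, doc in enumerate(documents, start=1)
    )
    # With repeats >= 2, every document also appears after every other
    # document somewhere in the prompt, whatever the original order was.
    repeated = "\n\n".join([context] * repeats)
    return f"{repeated}\n\nQuestion: {question}\nAnswer:"


prompt = build_core_prompt(
    ["B is the father of C.", "A is the father of B."],
    "Who is the grandfather of C?",
    repeats=2,
)
```

In this toy input the supporting documents are deliberately listed in the reverse of the reasoning order; after repetition, the second copy of "Document 1" follows the first copy of "Document 2".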

Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment

Jul 31, 2024

Sangwon Yu, Jongyoon Song, Bongkyu Hwang, Hoyoung Kang, Sooah Cho, Junhwa Choi, Seongho Joe, Taehee Lee, Youngjune L. Gwon, Sungroh Yoon

Abstract: A binary decision task, like yes-no questions or answer verification, reflects a significant real-world scenario, such as one where users look for confirmation about the correctness of their decisions on specific issues. In this work, we observe that language models exhibit a negative bias in the binary decisions of complex reasoning tasks. Based on our observations and the rationale about attention-based model dynamics, we propose a negative attention score (NAS) to systematically and quantitatively formulate negative bias. Based on NAS, we identify attention heads that attend to negative tokens provided in the instructions as answer candidates for binary decisions, regardless of the question in the prompt, and validate their association with the negative bias. Additionally, we propose the negative attention score alignment (NASA) method, which is a parameter-efficient fine-tuning technique to address the extracted negatively biased attention heads. Experimental results from various domains of reasoning tasks and a large model search space demonstrate that NASA significantly reduces the gap between precision and recall caused by negative bias while preserving the models' generalization abilities. Our code is available at https://github.com/ysw1021/NASA.
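
The gap between precision and recall mentioned in this abstract can be made concrete with a small sketch. The labels and predictions below are made-up toy values, not data from the paper, and `precision_recall` is a hypothetical helper. The sketch assumes the usual definitions with "yes" as the positive class: a model that leans toward answering "no" misses true positives, so recall drops below precision.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) with True as the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)        # correct "yes"
    fp = sum(1 for y, p in pairs if not y and p)    # wrong "yes"
    fn = sum(1 for y, p in pairs if y and not p)    # wrong "no"
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: half the gold answers are "yes", but the model
# answers "yes" only once, i.e. it is biased toward "no".
labels = [True, True, True, True, False, False, False, False]
predictions = [True, False, False, False, False, False, False, False]

precision, recall = precision_recall(labels, predictions)
gap = precision - recall  # positive when the model under-predicts "yes"
```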

Large Language Models are Skeptics: False Negative Problem of Input-conflicting Hallucination

Jun 20, 2024

Jongyoon Song, Sangwon Yu, Sungroh Yoon

Abstract: In this paper, we identify a new category of bias that induces input-conflicting hallucinations, where large language models (LLMs) generate responses inconsistent with the content of the input context. This issue, which we term the false negative problem, refers to the phenomenon where LLMs are predisposed to return negative judgments when assessing the correctness of a statement given the context. In experiments involving pairs of statements that contain the same information but have contradictory factual directions, we observe that LLMs exhibit a bias toward false negatives. Specifically, the model presents greater overconfidence when responding with False. Furthermore, we analyze how the false negative problem relates to context rewriting and query rewriting, and observe that both effectively tackle false negatives in LLMs.

* 12 pages, 9 figures

Entity-level Factual Adaptiveness of Fine-tuning based Abstractive Summarization Models

Feb 23, 2024

Jongyoon Song, Nohil Park, Bongkyu Hwang, Jaewoong Yun, Seongho Joe, Youngjune L. Gwon, Sungroh Yoon

Abstract: Abstractive summarization models often generate factually inconsistent content, particularly when the parametric knowledge of the model conflicts with the knowledge in the input document. In this paper, we analyze the robustness of fine-tuning based summarization models to this knowledge conflict, which we call factual adaptiveness. We utilize pre-trained language models to construct evaluation sets and find that factual adaptiveness is not strongly correlated with factual consistency on original datasets. Furthermore, we introduce a controllable counterfactual data augmentation method in which the degree of knowledge conflict within the augmented data is adjustable. Our experimental results on two pre-trained language models (PEGASUS and BART) and two fine-tuning datasets (XSum and CNN/DailyMail) demonstrate that our method enhances factual adaptiveness while achieving factual consistency on original datasets on par with the contrastive learning baseline.

* EACL 2024

AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate

Sep 14, 2021

Jongyoon Song, Sungwon Kim, Sungroh Yoon

Abstract: Non-autoregressive neural machine translation (NART) models suffer from the multi-modality problem, which causes translation inconsistency such as token repetition. Most recent approaches have attempted to solve this problem by implicitly modeling dependencies between outputs. In this paper, we introduce AligNART, which leverages full alignment information to explicitly reduce the modality of the target distribution. AligNART divides the machine translation task into (i) alignment estimation and (ii) translation with aligned decoder inputs, guiding the decoder to focus on simplified one-to-one translation. To alleviate the alignment estimation problem, we further propose a novel alignment decomposition method. Our experiments show that AligNART outperforms previous non-iterative NART models that focus on explicit modality reduction on WMT14 En↔De and WMT16 Ro→En. Furthermore, AligNART achieves BLEU scores comparable to those of the state-of-the-art connectionist temporal classification based models on WMT14 En↔De. We also observe that AligNART effectively addresses the token repetition problem even without sequence-level knowledge distillation.

* Accepted by EMNLP 2021

Rare Words Degenerate All Words

Sep 07, 2021

Sangwon Yu, Jongyoon Song, Heeseung Kim, Seong-min Lee, Woo-Jong Ryu, Sungroh Yoon

Abstract: Despite advances in neural network language models, the representation degeneration problem of embeddings remains challenging. Recent studies have found that the learned output embeddings degenerate into a narrow-cone distribution, which makes the similarity between any pair of embeddings positive. Their analyses attributed the degeneration problem to a cause common to most embeddings. However, we found that the degeneration problem originates especially from the training of rare word embeddings. In this study, we analyze the intrinsic mechanism of the degeneration of rare word embeddings by examining their gradients under the negative log-likelihood loss function. Furthermore, we theoretically and empirically demonstrate that the degeneration of rare word embeddings causes the degeneration of non-rare word embeddings, and that the overall degeneration problem can be alleviated by preventing the degeneration of rare word embeddings. Based on our analyses, we propose a novel method, Adaptive Gradient Partial Scaling (AGPS), to address the degeneration problem. Experimental results demonstrate the effectiveness of the proposed method qualitatively and quantitatively.
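
The narrow-cone observation in this abstract can be illustrated by measuring pairwise cosine similarity. This is a minimal sketch on hand-picked 2-D toy vectors, not on real embeddings and not the paper's analysis; `cosine` and `mean_pairwise_cosine` are hypothetical helper names.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy vectors spread over all directions: similarities are zero or negative.
spread = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
# Toy vectors squeezed into a narrow cone: every pair has positive similarity.
cone = [(1.0, 0.1), (1.0, -0.1), (0.9, 0.2), (1.1, 0.0)]

spread_score = mean_pairwise_cosine(spread)
cone_score = mean_pairwise_cosine(cone)
```

A mean pairwise cosine close to 1 for `cone` versus a non-positive value for `spread` is the kind of signal that distinguishes a degenerate, narrow-cone embedding set from a well-spread one.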

Deep Learning on Key Performance Indicators for Predictive Maintenance in SAP HANA

Apr 16, 2018

Jaekoo Lee, Byunghan Lee, Jongyoon Song, Jaesik Yoon, Yongsik Lee, Donghun Lee, Sungroh Yoon

Abstract: In the new era of cloud and big data, Database Management Systems (DBMSs) have become more crucial in numerous enterprise business applications across all industries. Accordingly, the importance of their proactive and preventive maintenance has also increased. However, detecting problems by predefined rules or stochastic modeling has limitations, particularly when analyzing the data on high-dimensional Key Performance Indicators (KPIs) from a DBMS. In recent years, Deep Learning (DL) has opened new opportunities for this complex analysis. In this paper, we present two complementary DL approaches to detect anomalies in SAP HANA. A temporal learning approach is used to detect abnormal patterns based on unlabeled historical data, whereas a spatial learning approach is used to classify known anomalies based on labeled data. We implement a system in SAP HANA integrated with Google TensorFlow. The experimental results with real-world data confirm the effectiveness of the system and models.

* This version withdrawn by arXiv administrators because the author did not have the right to agree to our license at the time of submission

Building a Neural Machine Translation System Using Only Synthetic Parallel Data

Sep 17, 2017

Jaehong Park, Jongyoon Song, Sungroh Yoon

Abstract: Recent work has shown that synthetic parallel data automatically generated by translation models can be effective for various neural machine translation (NMT) issues. In this study, we build NMT systems using only synthetic parallel data. As an efficient alternative to real parallel data, we also present a new type of synthetic parallel corpus. The proposed pseudo parallel data are distinct from those in previous work in that ground truth and synthetic examples are mixed on both sides of sentence pairs. Experiments on Czech-German and French-German translations demonstrate the efficacy of the proposed pseudo parallel corpus, which shows not only enhanced results for bidirectional translation tasks but also substantial improvement with the aid of a ground truth real parallel corpus.
