Abstract:Camouflaged object detection (COD) aims to generate a fine-grained segmentation map of camouflaged objects hidden in their background. Due to the hidden nature of camouflaged objects, it is essential for the decoder to be tailored to effectively extract proper features of camouflaged objects and extra-carefully generate their complex boundaries. In this paper, we propose a novel architecture that augments the prevalent decoding strategy in COD with Enrich Decoder and Retouch Decoder, which help to generate a fine-grained segmentation map. Specifically, the Enrich Decoder amplifies the channels of features that are important for COD using channel-wise attention. Retouch Decoder further refines the segmentation maps by spatially attending to important pixels, such as the boundary regions. With extensive experiments, we demonstrate that ENTO shows superior performance using various encoders, with the two novel components playing their unique roles that are mutually complementary.
Abstract:Understanding user satisfaction with conversational systems, known as User Satisfaction Estimation (USE), is essential for assessing dialogue quality and enhancing user experiences. However, existing methods for USE face challenges due to limited understanding of underlying reasons for user dissatisfaction and the high costs of annotating user intentions. To address these challenges, we propose PRAISE (Plan and Retrieval Alignment for Interpretable Satisfaction Estimation), an interpretable framework for effective user satisfaction prediction. PRAISE operates through three key modules. The Strategy Planner develops strategies, which are natural language criteria for classifying user satisfaction. The Feature Retriever then incorporates knowledge on user satisfaction from Large Language Models (LLMs) and retrieves relevance features from utterances. Finally, the Score Analyzer evaluates strategy predictions and classifies user satisfaction. Experimental results demonstrate that PRAISE achieves state-of-the-art performance on three benchmarks for the USE task. Beyond its superior performance, PRAISE offers additional benefits. It enhances interpretability by providing instance-level explanations through effective alignment of utterances with strategies. Moreover, PRAISE operates more efficiently than existing approaches by eliminating the need for LLMs during the inference phase.
Abstract:The increasing demand for mental health services has led to the rise of AI-driven mental health chatbots, though challenges related to privacy, data collection, and expertise persist. Motivational Interviewing (MI) is gaining attention as a theoretical basis for boosting expertise in the development of these chatbots. However, existing datasets are showing limitations for training chatbots, leading to a substantial demand for publicly available resources in the field of MI and psychotherapy. These challenges are even more pronounced in non-English languages, where they receive less attention. In this paper, we propose a novel framework that simulates MI sessions enriched with the expertise of professional therapists. We train an MI forecaster model that mimics the behavioral choices of professional therapists and employ Large Language Models (LLMs) to generate utterances through prompt engineering. Then, we present KMI, the first synthetic dataset theoretically grounded in MI, containing 1,000 high-quality Korean Motivational Interviewing dialogues. Through an extensive expert evaluation of the generated dataset and the dialogue model trained on it, we demonstrate the quality, expertise, and practicality of KMI. We also introduce novel metrics derived from MI theory in order to evaluate dialogues from the perspective of MI.
Abstract:Recency bias in a sequential recommendation system refers to the overly high emphasis placed on recent items within a user session. This bias can diminish the serendipity of recommendations and hinder the system's ability to capture users' long-term interests, leading to user disengagement. We propose a simple yet effective novel metric specifically designed to quantify recency bias. Our findings also demonstrate that high recency bias measured in our proposed metric adversely impacts recommendation performance too, and mitigating it results in improved recommendation performances across all models evaluated in our experiments, thus highlighting the importance of measuring recency bias.
Abstract:Despite the impressive capabilities of Large Language Models (LLMs) in various tasks, their vulnerability to unsafe prompts remains a critical issue. These prompts can lead LLMs to generate responses on illegal or sensitive topics, posing a significant threat to their safe and ethical use. Existing approaches attempt to address this issue using classification models, but they have several drawbacks. With the increasing complexity of unsafe prompts, similarity search-based techniques that identify specific features of unsafe prompts provide a more robust and effective solution to this evolving problem. This paper investigates the potential of sentence encoders to distinguish safe from unsafe prompts, and the ability to classify various unsafe prompts according to a safety taxonomy. We introduce new pairwise datasets and the Categorical Purity (CP) metric to measure this capability. Our findings reveal both the effectiveness and limitations of existing sentence encoders, proposing directions to improve sentence encoders to operate as more robust safety detectors. Our code is available at https://github.com/JwdanielJung/Safe-Embed.
Abstract:Difference visual question answering (diff-VQA) is a challenging task that requires answering complex questions based on differences between a pair of images. This task is particularly important in reading chest X-ray images because radiologists often compare multiple images of the same patient taken at different times to track disease progression and changes in its severity in their clinical practice. However, previous works focused on designing specific network architectures for the diff-VQA task, missing opportunities to enhance the model's performance using a pretrained vision-language model (VLM). Here, we introduce a novel VLM called PLURAL, which is pretrained on natural and longitudinal chest X-ray data for the diff-VQA task. The model is developed using a step-by-step approach, starting with being pretrained on natural images and texts, followed by being trained using longitudinal chest X-ray data. The longitudinal data consist of pairs of X-ray images, along with question-answer sets and radiologist's reports that describe the changes in lung abnormalities and diseases over time. Our experimental results show that the PLURAL model outperforms state-of-the-art methods not only in diff-VQA for longitudinal X-rays but also in conventional VQA for a single X-ray image. Through extensive experiments, we demonstrate the effectiveness of the proposed VLM architecture and pretraining method in improving the model's performance.
Abstract:Detecting anomalies in real-world multivariate time series data is challenging due to complex temporal dependencies and inter-variable correlations. Recently, reconstruction-based deep models have been widely used to solve the problem. However, these methods still suffer from an over-generalization issue and fail to deliver consistently high performance. To address this issue, we propose the MEMTO, a memory-guided Transformer using a reconstruction-based approach. It is designed to incorporate a novel memory module that can learn the degree to which each memory item should be updated in response to the input data. To stabilize the training procedure, we use a two-phase training paradigm which involves using K-means clustering for initializing memory items. Additionally, we introduce a bi-dimensional deviation-based detection criterion that calculates anomaly scores considering both input space and latent space. We evaluate our proposed method on five real-world datasets from diverse domains, and it achieves an average anomaly detection F1-score of 95.74%, significantly outperforming the previous state-of-the-art methods. We also conduct extensive experiments to empirically validate the effectiveness of our proposed model's key components.
Abstract:Discriminating the matched named entity pairs or identifying the entities' canonical forms are critical in text mining tasks. More precise named entity normalization in text mining will benefit other subsequent text analytic applications. We built the named entity normalization model with a novel Edge Weight Updating Neural Network. Our proposed model when tested on four different datasets achieved state-of-the-art results. We, next, verify our model's performance on NCBI Disease, BC5CDR Disease, and BC5CDR Chemical databases, which are widely used named entity normalization datasets in the bioinformatics field. We also tested our model with our own financial named entity normalization dataset to validate the efficacy for more general applications. Using the constructed dataset, we differentiate named entity pairs. Our model achieved the highest named entity normalization performances in terms of various evaluation metrics.
Abstract:Increasing attention has been drawn to the sentiment analysis of financial documents. The most popular examples of such documents include analyst reports and economic news, the analysis of which is frequently used to capture the trends in market sentiments. On the other hand, the significance of the role sentiment analysis plays in the financial domain has given rise to the efforts to construct a financial domain-specific sentiment lexicon. Sentiment lexicons lend a hand for solving various text mining tasks, such as unsupervised classification of text data, while alleviating the arduous human labor required for manual labeling. One of the challenges in the construction of an effective sentiment lexicon is that the semantic orientation of a word may change depending on the context in which it appears. For instance, the word ``profit" usually conveys positive sentiments; however, when the word is juxtaposed with another word ``decrease," the sentiment associated with the phrase ``profit decreases" now becomes negative. Hence, the sentiment of a given word may shift as one begins to consider the context surrounding the word. In this paper, we address this issue by incorporating context when building sentiment lexicon from a given corpus. Specifically, we construct a lexicon named Senti-DD for the Sentiment lexicon composed of Direction-Dependent words, which expresses each term a pair of a directional word and a direction-dependent word. Experiment results show that higher classification performance is achieved with Senti-DD, proving the effectiveness of our method for automatically constructing a context-aware sentiment lexicon in the financial domain.
Abstract:This paper proposes an attention module augmented relational network called SARN(Sequential Attention Relational Network) that can carry out relational reasoning by extracting reference objects and making efficient pairing between objects. SARN greatly reduces the computational and memory requirements of the relational network, which computes all object pairs. It also shows high accuracy on the Sort-of-CLEVR dataset compared to other models, especially on relational questions.