Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sungchul Choi

Multi-LLM Collaborative Caption Generation in Scientific Documents

Jan 05, 2025

Jaeyoung Kim, Jongho Lee, Hong-Jun Choi, Ting-Yao Hsu, Chieh-Yang Huang, Sungchul Kim, Ryan Rossi, Tong Yu, Clyde Lee Giles, Ting-Hao 'Kenneth' Huang(+1 more)

Figure 1 for Multi-LLM Collaborative Caption Generation in Scientific Documents

Figure 2 for Multi-LLM Collaborative Caption Generation in Scientific Documents

Figure 3 for Multi-LLM Collaborative Caption Generation in Scientific Documents

Figure 4 for Multi-LLM Collaborative Caption Generation in Scientific Documents

Abstract:Scientific figure captioning is a complex task that requires generating contextually appropriate descriptions of visual content. However, existing methods often fall short by utilizing incomplete information, treating the task solely as either an image-to-text or text summarization problem. This limitation hinders the generation of high-quality captions that fully capture the necessary details. Moreover, existing data sourced from arXiv papers contain low-quality captions, posing significant challenges for training large language models (LLMs). In this paper, we introduce a framework called Multi-LLM Collaborative Figure Caption Generation (MLBCAP) to address these challenges by leveraging specialized LLMs for distinct sub-tasks. Our approach unfolds in three key modules: (Quality Assessment) We utilize multimodal LLMs to assess the quality of training data, enabling the filtration of low-quality captions. (Diverse Caption Generation) We then employ a strategy of fine-tuning/prompting multiple LLMs on the captioning task to generate candidate captions. (Judgment) Lastly, we prompt a prominent LLM to select the highest quality caption from the candidates, followed by refining any remaining inaccuracies. Human evaluations demonstrate that informative captions produced by our approach rank better than human-written captions, highlighting its effectiveness. Our code is available at https://github.com/teamreboott/MLBCAP

* Accepted to AAAI 2025 AI4Research Workshop

Via

Access Paper or Ask Questions

DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

Sep 25, 2024

Kyuheon Jung, Yongdeuk Seo, Seongwoo Cho, Jaeyoung Kim, Hyun-seok Min, Sungchul Choi

Figure 1 for DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

Figure 2 for DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

Figure 3 for DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

Figure 4 for DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

Abstract:In this paper, we present an effective data augmentation framework leveraging the Large Language Model (LLM) and Diffusion Model (DM) to tackle the challenges inherent in data-scarce scenarios. Recently, DMs have opened up the possibility of generating synthetic images to complement a few training images. However, increasing the diversity of synthetic images also raises the risk of generating samples outside the target distribution. Our approach addresses this issue by embedding novel semantic information into text prompts via LLM and utilizing real images as visual prompts, thus generating semantically rich images. To ensure that the generated images remain within the target distribution, we dynamically adjust the guidance weight based on each image's CLIPScore to control the diversity. Experimental results show that our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution. Consequently, our approach proves to be more efficient in the few-shot setting on several benchmarks. Our code is available at https://github.com/kkyuhun94/dalda .

* Accepted to ECCV Synthetic Data for Computer Vision Workshop (Oral)

Via

Access Paper or Ask Questions

Pseudo Outlier Exposure for Out-of-Distribution Detection using Pretrained Transformers

Jul 19, 2023

Jaeyoung Kim, Kyuheon Jung, Dongbin Na, Sion Jang, Eunbin Park, Sungchul Choi

Abstract:For real-world language applications, detecting an out-of-distribution (OOD) sample is helpful to alert users or reject such unreliable samples. However, modern over-parameterized language models often produce overconfident predictions for both in-distribution (ID) and OOD samples. In particular, language models suffer from OOD samples with a similar semantic representation to ID samples since these OOD samples lie near the ID manifold. A rejection network can be trained with ID and diverse outlier samples to detect test OOD samples, but explicitly collecting auxiliary OOD datasets brings an additional burden for data collection. In this paper, we propose a simple but effective method called Pseudo Outlier Exposure (POE) that constructs a surrogate OOD dataset by sequentially masking tokens related to ID classes. The surrogate OOD sample introduced by POE shows a similar representation to ID data, which is most effective in training a rejection network. Our method does not require any external OOD data and can be easily implemented within off-the-shelf Transformers. A comprehensive comparison with state-of-the-art algorithms demonstrates POE's competitiveness on several text classification benchmarks.

* Findings of the Association for Computational Linguistics: ACL 2023 (2023) 1469-1482
* 12 pages, 2 figures

Via

Access Paper or Ask Questions

Bag of Tricks for In-Distribution Calibration of Pretrained Transformers

Feb 13, 2023

Jaeyoung Kim, Dongbin Na, Sungchul Choi, Sungbin Lim

Figure 1 for Bag of Tricks for In-Distribution Calibration of Pretrained Transformers

Figure 2 for Bag of Tricks for In-Distribution Calibration of Pretrained Transformers

Figure 3 for Bag of Tricks for In-Distribution Calibration of Pretrained Transformers

Figure 4 for Bag of Tricks for In-Distribution Calibration of Pretrained Transformers

Abstract:While pre-trained language models (PLMs) have become a de-facto standard promoting the accuracy of text classification tasks, recent studies find that PLMs often predict over-confidently. Although various calibration methods have been proposed, such as ensemble learning and data augmentation, most of the methods have been verified in computer vision benchmarks rather than in PLM-based text classification tasks. In this paper, we present an empirical study on confidence calibration for PLMs, addressing three categories, including confidence penalty losses, data augmentations, and ensemble methods. We find that the ensemble model overfitted to the training set shows sub-par calibration performance and also observe that PLMs trained with confidence penalty loss have a trade-off between calibration and accuracy. Building on these observations, we propose the Calibrated PLM (CALL), a combination of calibration techniques. The CALL complements the drawbacks that may occur when utilizing a calibration method individually and boosts both classification and calibration accuracy. Design choices in CALL's training procedures are extensively studied, and we provide a detailed analysis of how calibration techniques affect the calibration performance of PLMs.

Via

Access Paper or Ask Questions

Deep learning-based citation recommendation system for patents

Oct 21, 2020

Jaewoong Choi, Sion Jang, Jaeyoung Kim, Jiho Lee, Janghyeok Yoona, Sungchul Choi

Figure 1 for Deep learning-based citation recommendation system for patents

Figure 2 for Deep learning-based citation recommendation system for patents

Figure 3 for Deep learning-based citation recommendation system for patents

Figure 4 for Deep learning-based citation recommendation system for patents

Abstract:In this study, we address the challenges in developing a deep learning-based automatic patent citation recommendation system. Although deep learning-based recommendation systems have exhibited outstanding performance in various domains (such as movies, products, and paper citations), their validity in patent citations has not been investigated, owing to the lack of a freely available high-quality dataset and relevant benchmark model. To solve these problems, we present a novel dataset called PatentNet that includes textual information and metadata for approximately 110,000 patents from the Google Big Query service. Further, we propose strong benchmark models considering the similarity of textual information and metadata (such as cooperative patent classification code). Compared with existing recommendation methods, the proposed benchmark method achieved a mean reciprocal rank of 0.2377 on the test set, whereas the existing state-of-the-art recommendation method achieved 0.2073.

Via

Access Paper or Ask Questions

Machine-Learning Approach to Analyze the Status of Forklift Vehicles with Irregular Movement in a Shipyard

Oct 12, 2020

Hyeonju Lee, Jongho Lee, Minji An, Gunil Park, Sungchul Choi

Figure 1 for Machine-Learning Approach to Analyze the Status of Forklift Vehicles with Irregular Movement in a Shipyard

Figure 2 for Machine-Learning Approach to Analyze the Status of Forklift Vehicles with Irregular Movement in a Shipyard

Figure 3 for Machine-Learning Approach to Analyze the Status of Forklift Vehicles with Irregular Movement in a Shipyard

Figure 4 for Machine-Learning Approach to Analyze the Status of Forklift Vehicles with Irregular Movement in a Shipyard

Abstract:In large shipyards, the management of equipment, which are used for building a variety of ships, is critical. Because orders vary year to year, shipyard managers are required to determine methods to make the most of their limited resources. A particular difficulty that arises because of the nature and size of shipyards is the management of moving vehicles. In recent years, shipbuilding companies have attempted to manage and track the locations and movements of vehicles using Global Positioning System (GPS) modules. However, because certain vehicles, such as forklifts, roam irregularly around a yard, identifying their working status without being onsite is difficult. Location information alone is not sufficient to determine whether a vehicle is working, moving, waiting, or resting. This study proposes an approach based on machine learning to identify the work status of each forklift. We use the DBSCAN and k-means algorithms to identify the area in which a particular forklift is operating and the type of work it is performing. We developed a business intelligence system to collect information from forklifts equipped with GPS and Internet of Things (IoT) devices. The system provides visual information on the status of individual forklifts and helps in the efficient management of their movements within large shipyards.

* I withdraw this paper because an error in the experiment has been found

Via

Access Paper or Ask Questions

A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks

Mar 15, 2019

Chanwoo Jeong, Sion Jang, Hyuna Shin, Eunjeong Park, Sungchul Choi

Figure 1 for A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks

Figure 2 for A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks

Figure 3 for A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks

Figure 4 for A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks

Abstract:With the tremendous growth in the number of scientific papers being published, searching for references while writing a scientific paper is a time-consuming process. A technique that could add a reference citation at the appropriate place in a sentence will be beneficial. In this perspective, context-aware citation recommendation has been researched upon for around two decades. Many researchers have utilized the text data called the context sentence, which surrounds the citation tag, and the metadata of the target paper to find the appropriate cited research. However, the lack of well-organized benchmarking datasets and no model that can attain high performance has made the research difficult. In this paper, we propose a deep learning based model and well-organized dataset for context-aware paper citation recommendation. Our model comprises a document encoder and a context encoder, which uses Graph Convolutional Networks (GCN) layer and Bidirectional Encoder Representations from Transformers (BERT), which is a pre-trained model of textual data. By modifying the related PeerRead dataset, we propose a new dataset called FullTextPeerRead containing context sentences to cited references and paper metadata. To the best of our knowledge, This dataset is the first well-organized dataset for context-aware paper recommendation. The results indicate that the proposed model with the proposed datasets can attain state-of-the-art performance and achieve a more than 28% improvement in mean average precision (MAP) and recall@k.

* 7 pages, 5 figures

Via

Access Paper or Ask Questions

A Deep Patent Landscaping Model using Transformer and Graph Convolutional Network

Mar 14, 2019

Seokkyu Choi, Hyeonju Lee, Eunjeong Lucy Park, Sungchul Choi

Figure 1 for A Deep Patent Landscaping Model using Transformer and Graph Convolutional Network

Figure 2 for A Deep Patent Landscaping Model using Transformer and Graph Convolutional Network

Figure 3 for A Deep Patent Landscaping Model using Transformer and Graph Convolutional Network

Figure 4 for A Deep Patent Landscaping Model using Transformer and Graph Convolutional Network

Abstract:Patent landscaping is a method that is employed for searching related patents during the process of a research and development (R&D) project. To avoid the risk of patent infringement and to follow the current trends of technology development, patent landscaping is a crucial task that needs to be conducted during the early stages of an R&D project. Generally, the process of patent landscaping requires several advanced resources and can be tedious. Furthermore, the patent landscaping process has to be repeated throughout the duration of an R&D project. Owing to such reasons, the demand for automated patent landscaping is gradually increasing. However, the shortage of well-defined benchmarking datasets and comparable models makes it difficult to find related research studies. In this paper, an automated patent landscaping model based on deep learning is proposed. The proposed model comprises a modified transformer structure for analyzing textual data present in patent documents and a graph convolutional network for analyzing patent metadata. Twelve patent landscaping benchmarking datasets, which were processed by the Korean patent attorney, are proposed for determining the resources required for comparing related research studies. Obtained results indicate that the proposed model with the proposed datasets can attain state-of-the-art performance , and mean classification accuracy of 98% can be achieved.

* 7 pages, 6 figures

Via

Access Paper or Ask Questions

Hybrid Machine Learning Approach to Popularity Prediction of Newly Released Contents for Online Video Streaming Service

Jan 28, 2019

Hongjun Jeon, Wonchul Seo, Eunjeong Lucy Park, Sungchul Choi

Figure 1 for Hybrid Machine Learning Approach to Popularity Prediction of Newly Released Contents for Online Video Streaming Service

Figure 2 for Hybrid Machine Learning Approach to Popularity Prediction of Newly Released Contents for Online Video Streaming Service

Figure 3 for Hybrid Machine Learning Approach to Popularity Prediction of Newly Released Contents for Online Video Streaming Service

Figure 4 for Hybrid Machine Learning Approach to Popularity Prediction of Newly Released Contents for Online Video Streaming Service

Abstract:In the industry of video content providers such as VOD and IPTV, predicting the popularity of video contents in advance is critical not only from a marketing perspective but also from a network optimization perspective. By predicting whether the content will be successful or not in advance, the content file, which is large, is efficiently deployed in the proper service providing server, leading to network cost optimization. Many previous studies have done view count prediction research to do this. However, the studies have been making predictions based on historical view count data from users. In this case, the contents had been published to the users and already deployed on a service server. These approaches make possible to efficiently deploy a content already published but are impossible to use for a content that is not be published. To address the problems, this research proposes a hybrid machine learning approach to the classification model for the popularity prediction of newly video contents which is not published. In this paper, we create a new variable based on the related content of the specific content and divide entire dataset by the characteristics of the contents. Next, the prediction is performed using XGBoosting and deep neural net based model according to the data characteristics of the cluster. Our model uses metadata for contents for prediction, so we use categorical embedding techniques to solve the sparsity of categorical variables and make them learn efficiently for the deep neural net model. As well, we use the FTRL-proximal algorithm to solve the problem of the view-count volatility of video content. We achieve overall better performance than the previous standalone method with a dataset from one of the top streaming service company.

Via

Access Paper or Ask Questions

Text Classification using Capsules

Aug 14, 2018

Jaeyoung Kim, Sion Jang, Sungchul Choi, Eunjeong Park

Figure 1 for Text Classification using Capsules

Figure 2 for Text Classification using Capsules

Figure 3 for Text Classification using Capsules

Figure 4 for Text Classification using Capsules

Abstract:This paper presents an empirical exploration of the use of capsule networks for text classification. While it has been shown that capsule networks are effective for image classification, their validity in the domain of text has not been explored. In this paper, we show that capsule networks indeed have the potential for text classification and that they have several advantages over convolutional neural networks. We further suggest a simple routing method that effectively reduces the computational complexity of dynamic routing. We utilized seven benchmark datasets to demonstrate that capsule networks, along with the proposed routing method provide comparable results.

Via

Access Paper or Ask Questions