Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jaeyoon Jung

Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion

Apr 19, 2025

Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park

Abstract:Query expansion methods powered by large language models (LLMs) have demonstrated effectiveness in zero-shot retrieval tasks. These methods assume that LLMs can generate hypothetical documents that, when incorporated into a query vector, enhance the retrieval of real evidence. However, we challenge this assumption by investigating whether knowledge leakage in benchmarks contributes to the observed performance gains. Using fact verification as a testbed, we analyzed whether the generated documents contained information entailed by ground truth evidence and assessed their impact on performance. Our findings indicate that performance improvements occurred consistently only for claims whose generated documents included sentences entailed by ground truth evidence. This suggests that knowledge leakage may be present in these benchmarks, inflating the perceived performance of LLM-based query expansion methods, particularly in real-world scenarios that require retrieving niche or novel knowledge.

* preprint

Via

Access Paper or Ask Questions

KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language

Mar 31, 2025

Yoonshik Kim, Jaeyoon Jung

Abstract:The recent emergence of Large Vision-Language Models(VLMs) has resulted in a variety of different benchmarks for evaluating such models. Despite this, we observe that most existing evaluation methods suffer from the fact that they either require the model to choose from pre-determined responses, sacrificing open-endedness, or evaluate responses using a judge model, resulting in subjective and unreliable evaluation. In addition, we observe a lack of benchmarks for VLMs in the Korean language, which are necessary as a separate metric from more common English language benchmarks, as the performance of generative language models can differ significantly based on the language being used. Therefore, we present KOFFVQA, a general-purpose free-form visual question answering benchmark in the Korean language for the evaluation of VLMs. Our benchmark consists of 275 carefully crafted questions each paired with an image and grading criteria covering 10 different aspects of VLM performance. The grading criteria eliminate the problem of unreliability by allowing the judge model to grade each response based on a pre-determined set of rules. By defining the evaluation criteria in an objective manner, even a small open-source model can be used to evaluate models on our benchmark reliably. In addition to evaluating a large number of existing VLMs on our benchmark, we also experimentally verify that our method of using pre-existing grading criteria for evaluation is much more reliable than existing methods. Our evaluation code is available at https://github.com/maum-ai/KOFFVQA

* Accepted to CVPRW 2025, Workshop on Benchmarking and Expanding AI Multimodal Approaches

Via

Access Paper or Ask Questions

HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims

Oct 16, 2024

Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park

Abstract:To tackle the AVeriTeC shared task hosted by the FEVER-24, we introduce a system that only employs publicly available large language models (LLMs) for each step of automated fact-checking, dubbed the Herd of Open LLMs for verifying real-world claims (HerO). HerO employs multiple LLMs for each step of automated fact-checking. For evidence retrieval, a language model is used to enhance a query by generating hypothetical fact-checking documents. We prompt pretrained and fine-tuned LLMs for question generation and veracity prediction by crafting prompts with retrieved in-context samples. HerO achieved 2nd place on the leaderboard with the AVeriTeC score of 0.57, suggesting the potential of open LLMs for verifying real-world claims. For future research, we make our code publicly available at https://github.com/ssu-humane/HerO.

* A system description paper for the AVeriTeC shared task, hosted by the seventh FEVER workshop (co-located with EMNLP 2024)

Via

Access Paper or Ask Questions

CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Oct 02, 2024

Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Yubeen Park, Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park(+2 more)

Figure 1 for CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Figure 2 for CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Figure 3 for CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Figure 4 for CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Abstract:Real-life robot navigation involves more than just reaching a destination; it requires optimizing movements while addressing scenario-specific goals. An intuitive way for humans to express these goals is through abstract cues like verbal commands or rough sketches. Such human guidance may lack details or be noisy. Nonetheless, we expect robots to navigate as intended. For robots to interpret and execute these abstract instructions in line with human expectations, they must share a common understanding of basic navigation concepts with humans. To this end, we introduce CANVAS, a novel framework that combines visual and linguistic instructions for commonsense-aware navigation. Its success is driven by imitation learning, enabling the robot to learn from human navigation behavior. We present COMMAND, a comprehensive dataset with human-annotated navigation results, spanning over 48 hours and 219 km, designed to train commonsense-aware navigation systems in simulated environments. Our experiments show that CANVAS outperforms the strong rule-based system ROS NavStack across all environments, demonstrating superior performance with noisy instructions. Notably, in the orchard environment, where ROS NavStack records a 0% total success rate, CANVAS achieves a total success rate of 67%. CANVAS also closely aligns with human demonstrations and commonsense constraints, even in unseen environments. Furthermore, real-world deployment of CANVAS showcases impressive Sim2Real transfer with a total success rate of 69%, highlighting the potential of learning from human demonstrations in simulated environments for real-world applications.

* project page https://worv-ai.github.io/canvas

Via

Access Paper or Ask Questions

EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

Sep 02, 2024

Jaeyeon Kim, Minjeon Jeon, Jaeyoon Jung, Sang Hoon Woo, Jinjoo Lee

Figure 1 for EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

Figure 2 for EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

Figure 3 for EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

Figure 4 for EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

Abstract:In this work, we aim to analyze and optimize the EnCLAP framework, a state-of-the-art model in automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with different dataset scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of generated captions, we develop EnCLAP++, an enhanced version that significantly surpasses the original.

* Accepted to DCASE2024 Workshop

Via

Access Paper or Ask Questions

Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning

Sep 02, 2024

Jaeyeon Kim, Jaeyoon Jung, Minjeong Jeon, Sang Hoon Woo, Jinjoo Lee

Abstract:In this technical report, we describe our submission to DCASE2024 Challenge Task6 (Automated Audio Captioning) and Task8 (Language-based Audio Retrieval). We develop our approach building upon the EnCLAP audio captioning framework and optimizing it for Task6 of the challenge. Notably, we outline the changes in the underlying components and the incorporation of the reranking process. Additionally, we submit a supplementary retriever model, a byproduct of our modified framework, to Task8. Our proposed systems achieve FENSE score of 0.542 on Task6 and mAP@10 score of 0.386 on Task8, significantly outperforming the baseline models.

* DCASE2024 Challenge Technical Report. Ranked 2nd in Task 6 Automated Audio Captioning

Via

Access Paper or Ask Questions

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

Jan 31, 2024

Jaeyeon Kim, Jaeyoon Jung, Jinjoo Lee, Sang Hoon Woo

Abstract:We propose EnCLAP, a novel framework for automated audio captioning. EnCLAP employs two acoustic representation models, EnCodec and CLAP, along with a pretrained language model, BART. We also introduce a new training objective called masked codec modeling that improves acoustic awareness of the pretrained language model. Experimental results on AudioCaps and Clotho demonstrate that our model surpasses the performance of baseline models. Source code will be available at https://github.com/jaeyeonkim99/EnCLAP . An online demo is available at https://huggingface.co/spaces/enclap-team/enclap .

* Accepted to ICASSP 2024

Via

Access Paper or Ask Questions