Yuan Xia

Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework

Apr 28, 2025

Yuan Xia, Akanksha Atrey, Fadoua Khmaissia, Kedar S. Namjoshi

Abstract: This paper investigates the logical reasoning capabilities of large language models (LLMs). For a precisely defined yet tractable formulation, we choose the conceptually simple but technically complex task of constructing proofs in Boolean logic. A trained LLM receives as input a set of assumptions and a goal, and produces as output a proof that formally derives the goal from the assumptions. Incorrect proofs are caught by an automated proof checker. A critical obstacle for training is the scarcity of real-world proofs. We propose an efficient, randomized procedure for synthesizing valid proofs and introduce Template Transformation, a data augmentation technique that enhances the model's ability to handle complex logical expressions. The central evaluation question is whether an LLM has indeed learned to reason. We propose tests to measure the reasoning ability of a black-box LLM. By these measures, experiments demonstrate strong reasoning capabilities for assertions with short proofs, which decline with proof complexity. Notably, Template Transformation improves accuracy even for smaller models, suggesting its effectiveness across model scales.
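
The abstract does not say which proof system or proof format the paper uses, so the following is only a minimal sketch of what "an automated proof checker" could look like. It assumes a hypothetical Hilbert-style format in which each proof line is either an assumption or follows from two earlier lines by modus ponens; the names `check_proof`, `"assume"`, and `"mp"` are illustrative and not taken from the paper.

```python
from typing import List, Tuple, Union

# A formula is an atom (a string) or an implication ("->", lhs, rhs).
Formula = Union[str, Tuple[str, "Formula", "Formula"]]
# A proof step pairs a formula with its justification:
# "assume", or ("mp", i, j) citing two earlier line indices.
Step = Tuple[Formula, Union[str, Tuple[str, int, int]]]


def check_proof(assumptions: List[Formula], goal: Formula, proof: List[Step]) -> bool:
    """Return True only if every step is justified and the last line is the goal."""
    derived: List[Formula] = []
    for formula, why in proof:
        if why == "assume":
            # An assumed line must literally be one of the given assumptions.
            if formula not in assumptions:
                return False
        elif isinstance(why, tuple) and len(why) == 3 and why[0] == "mp":
            _, i, j = why
            # Both cited premises must be earlier lines of the proof.
            if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                return False
            # Modus ponens: line j must be (line i -> formula).
            if derived[j] != ("->", derived[i], formula):
                return False
        else:
            # Unknown justification: reject the proof.
            return False
        derived.append(formula)
    return bool(derived) and derived[-1] == goal


assumptions = ["p", ("->", "p", "q"), ("->", "q", "r")]
good_proof = [
    ("p", "assume"),
    (("->", "p", "q"), "assume"),
    ("q", ("mp", 0, 1)),
    (("->", "q", "r"), "assume"),
    ("r", ("mp", 2, 3)),
]
bad_proof = [("p", "assume"), ("r", ("mp", 0, 0))]
```

Under these assumptions, `check_proof(assumptions, "r", good_proof)` accepts the derivation, while `bad_proof` is rejected because its modus ponens step cites a line that is not an implication. A checker of this kind treats the model as a black box: it inspects only the produced proof, never how the proof was generated.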

ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning

Mar 13, 2025

Pengfei Luo, Jingbo Zhou, Tong Xu, Yuan Xia, Linli Xu, Enhong Chen

Abstract: With the proliferation of images in online content, language-guided image retrieval (LGIR) has emerged as a research hotspot over the past decade, encompassing a variety of subtasks with diverse input forms. While the development of large multimodal models (LMMs) has significantly facilitated these tasks, existing approaches often address them in isolation, requiring the construction of separate systems for each task. This not only increases system complexity and maintenance costs, but also exacerbates challenges stemming from language ambiguity and complex image content, making it difficult for retrieval systems to provide accurate and reliable results. To this end, we propose ImageScope, a training-free, three-stage framework that leverages collective reasoning to unify LGIR tasks. The key insight behind the unification lies in the compositional nature of language, which transforms diverse LGIR tasks into a generalized text-to-image retrieval process, along with the reasoning of LMMs serving as a universal verification to refine the results. To be specific, in the first stage, we improve the robustness of the framework by synthesizing search intents across varying levels of semantic granularity using chain-of-thought (CoT) reasoning. In the second and third stages, we then reflect on retrieval results by verifying predicate propositions locally, and performing pairwise evaluations globally. Experiments conducted on six LGIR datasets demonstrate that ImageScope outperforms competitive baselines. Comprehensive evaluations and ablation studies further confirm the effectiveness of our design.

* WWW 2025

Improving Retrieval Augmented Language Model with Self-Reasoning

Jul 29, 2024

Yuan Xia, Jingbo Zhou, Zhenhui Shi, Jun Chen, Haifeng Huang

Abstract: The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly concerning their reliability and traceability. Specifically, retrieving irrelevant documents may lead to unhelpful responses or even degrade the performance of LLMs, while the lack of proper citations in generated outputs complicates efforts to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework constructs self-reasoning trajectories with three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. We evaluated our framework on four public datasets (two short-form QA datasets, one long-form QA dataset, and one fact verification dataset) to demonstrate the superiority of our method, which outperforms existing state-of-the-art models and achieves performance comparable to GPT-4 while using only 2,000 training samples.

LSMVOS: Long-Short-Term Similarity Matching for Video Object

Sep 02, 2020

Zhang Xuerui, Yuan Xia

Abstract: Objective: Semi-supervised video object segmentation refers to segmenting the object in subsequent frames given the object label in the first frame. Existing algorithms are mostly based on matching and propagation strategies, which often make use of the previous frame through its mask or optical flow. This paper explores a new propagation method that uses a short-term matching module to extract information from the previous frame and apply it in propagation, and proposes a Long-Short-Term similarity matching network for video object segmentation (LSMVOS). Method: Pixel-level matching and correlation are performed in the long-term matching module against the first frame and in the short-term matching module against the previous frame, yielding a global similarity map and a local similarity map, along with the feature map of the current frame and the mask of the previous frame. After two refinement networks, the final result is obtained through a segmentation network. Results: In experiments on the DAVIS 2016 and DAVIS 2017 datasets, the method achieves a favorable average of region similarity and contour accuracy without online fine-tuning, reaching 86.5% for single objects and 77.4% for multiple objects. In addition, it segments 21 frames per second. Conclusion: The short-term matching module proposed in this paper is more conducive to extracting information from the previous frame than using the mask alone. By combining the long-term matching module with the short-term matching module, the whole network achieves efficient video object segmentation without online fine-tuning.
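
The abstract does not specify how the similarity maps are computed, so the sketch below is only one plausible reading of "pixel-level matching": each pixel of the current frame is compared with reference pixels by cosine similarity, and the best score is kept. Comparing against every reference pixel gives a global map (as one might do with the first frame), while restricting the comparison to a small window gives a local map (as one might do with the previous frame). The function names, the cosine measure, the max aggregation, and the `window` parameter are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def similarity_map(current: FeatureMap, reference: FeatureMap,
                   window: Optional[int] = None) -> List[List[float]]:
    """Best cosine match of each current pixel among reference pixels.

    window=None compares against all reference pixels (global matching);
    otherwise only reference pixels within `window` rows/columns of the
    current position are considered (local matching).
    """
    out: List[List[float]] = []
    for y, cur_row in enumerate(current):
        row: List[float] = []
        for x, feat in enumerate(cur_row):
            best = -1.0  # lowest possible cosine similarity
            for ry, ref_row in enumerate(reference):
                for rx, ref_feat in enumerate(ref_row):
                    if window is not None and max(abs(ry - y), abs(rx - x)) > window:
                        continue  # outside the local search window
                    best = max(best, cosine(feat, ref_feat))
            row.append(best)
        out.append(row)
    return out


# Toy 1 x 2 frames with 2-dimensional features (made-up values).
first_frame = [[(1.0, 0.0), (0.0, 1.0)]]
current_frame = [[(1.0, 0.0), (1.0, 0.0)]]
global_map = similarity_map(current_frame, first_frame)           # all pixels
local_map = similarity_map(current_frame, first_frame, window=0)  # same position only
```

In this toy case the global map scores both pixels as perfect matches, whereas the local map with `window=0` only matches the first pixel. The nested loops are quadratic in the number of pixels and are meant to show the idea, not to be efficient.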
