Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yijia Chen

Hong Kong University of Science and Technology

On Path to Multimodal Historical Reasoning: HistBench and HistAgent

May 26, 2025

Jiahao Qiu, Fulian Xiao, Yimin Wang, Yuchen Mao, Yijia Chen, Xinzhe Juan, Siran Wang, Xuan Qi, Tongcheng Zhang, Zixin Yao(+88 more)

Abstract:Recent advances in large language models (LLMs) have led to remarkable progress across domains, yet their capabilities in the humanities, particularly history, remain underexplored. Historical reasoning poses unique challenges for AI, involving multimodal source interpretation, temporal inference, and cross-linguistic analysis. While general-purpose agents perform well on many existing benchmarks, they lack the domain-specific expertise required to engage with historical materials and questions. To address this gap, we introduce HistBench, a new benchmark of 414 high-quality questions designed to evaluate AI's capacity for historical reasoning and authored by more than 40 expert contributors. The tasks span a wide range of historical problems-from factual retrieval based on primary sources to interpretive analysis of manuscripts and images, to interdisciplinary challenges involving archaeology, linguistics, or cultural history. Furthermore, the benchmark dataset spans 29 ancient and modern languages and covers a wide range of historical periods and world regions. Finding the poor performance of LLMs and other agents on HistBench, we further present HistAgent, a history-specific agent equipped with carefully designed tools for OCR, translation, archival search, and image understanding in History. On HistBench, HistAgent based on GPT-4o achieves an accuracy of 27.54% pass@1 and 36.47% pass@2, significantly outperforming LLMs with online search and generalist agents, including GPT-4o (18.60%), DeepSeek-R1(14.49%) and Open Deep Research-smolagents(20.29% pass@1 and 25.12% pass@2). These results highlight the limitations of existing LLMs and generalist agents and demonstrate the advantages of HistAgent for historical reasoning.

* 17 pages, 7 figures

Via

Access Paper or Ask Questions

Monitoring of Hermit Crabs Using drone-captured imagery and Deep Learning based Super-Resolution Reconstruction and Improved YOLOv8

Aug 07, 2024

Fan Zhao, Yijia Chen, Dianhan Xi, Yongying Liu, Jiaqi Wang, Shigeru Tabeta, Katsunori Mizuno

Figure 1 for Monitoring of Hermit Crabs Using drone-captured imagery and Deep Learning based Super-Resolution Reconstruction and Improved YOLOv8

Figure 2 for Monitoring of Hermit Crabs Using drone-captured imagery and Deep Learning based Super-Resolution Reconstruction and Improved YOLOv8

Figure 3 for Monitoring of Hermit Crabs Using drone-captured imagery and Deep Learning based Super-Resolution Reconstruction and Improved YOLOv8

Figure 4 for Monitoring of Hermit Crabs Using drone-captured imagery and Deep Learning based Super-Resolution Reconstruction and Improved YOLOv8

Abstract:Hermit crabs play a crucial role in coastal ecosystems by dispersing seeds, cleaning up debris, and disturbing soil. They serve as vital indicators of marine environmental health, responding to climate change and pollution. Traditional survey methods, like quadrat sampling, are labor-intensive, time-consuming, and environmentally dependent. This study presents an innovative approach combining UAV-based remote sensing with Super-Resolution Reconstruction (SRR) and the CRAB-YOLO detection network, a modification of YOLOv8s, to monitor hermit crabs. SRR enhances image quality by addressing issues such as motion blur and insufficient resolution, significantly improving detection accuracy over conventional low-resolution fuzzy images. The CRAB-YOLO network integrates three improvements for detection accuracy, hermit crab characteristics, and computational efficiency, achieving state-of-the-art (SOTA) performance compared to other mainstream detection models. The RDN networks demonstrated the best image reconstruction performance, and CRAB-YOLO achieved a mean average precision (mAP) of 69.5% on the SRR test set, a 40% improvement over the conventional Bicubic method with a magnification factor of 4. These results indicate that the proposed method is effective in detecting hermit crabs, offering a cost-effective and automated solution for extensive hermit crab monitoring, thereby aiding coastal benthos conservation.

* The earlier version of this conference paper was presented at OCEANS 2024-Singapore and was selected for inclusion in the Student Poster Competition (SPC) Program

Via

Access Paper or Ask Questions

Underwater litter monitoring using consumer-grade aerial-aquatic speedy scanner (AASS) and deep learning based super-resolution reconstruction and detection network

Aug 07, 2024

Fan Zhao, Yongying Liu, Jiaqi Wang, Yijia Chen, Dianhan Xi, Xinlei Shao, Shigeru Tabeta, Katsunori Mizuno

Abstract:Underwater litter is widely spread across aquatic environments such as lakes, rivers, and oceans, significantly impacting natural ecosystems. Current monitoring technologies for detecting underwater litter face limitations in survey efficiency, cost, and environmental conditions, highlighting the need for efficient, consumer-grade technologies for automatic detection. This research introduces the Aerial-Aquatic Speedy Scanner (AASS) combined with Super-Resolution Reconstruction (SRR) and an improved YOLOv8 detection network. AASS enhances data acquisition efficiency over traditional methods, capturing high-quality images that accurately identify underwater waste. SRR improves image-resolution by mitigating motion blur and insufficient resolution, thereby enhancing detection tasks. Specifically, the RCAN model achieved the highest mean average precision (mAP) of 78.6% for detection accuracy on reconstructed images among the tested SRR models. With a magnification factor of 4, the SRR test set shows an improved mAP compared to the conventional bicubic set. These results demonstrate the effectiveness of the proposed method in detecting underwater litter.

* The earlier version of this conference paper was accepted at OCEANS 2024-Halifax, Canada and was selected for inclusion in the Student Poster Competition (SPC) Program

Via

Access Paper or Ask Questions

Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model

Apr 10, 2024

Yijia Chen, Pinghua Chen, Xiangxin Zhou, Yingtie Lei, Ziyang Zhou, Mingxian Li

Abstract:In the field of computer vision, visible light images often exhibit low contrast in low-light conditions, presenting a significant challenge. While infrared imagery provides a potential solution, its utilization entails high costs and practical limitations. Recent advancements in deep learning, particularly the deployment of Generative Adversarial Networks (GANs), have facilitated the transformation of visible light images to infrared images. However, these methods often experience unstable training phases and may produce suboptimal outputs. To address these issues, we propose a novel end-to-end Transformer-based model that efficiently converts visible light images into high-fidelity infrared images. Initially, the Texture Mapping Module and Color Perception Adapter collaborate to extract texture and color features from the visible light image. The Dynamic Fusion Aggregation Module subsequently integrates these features. Finally, the transformation into an infrared image is refined through the synergistic action of the Color Perception Adapter and the Enhanced Perception Attention mechanism. Comprehensive benchmarking experiments confirm that our model outperforms existing methods, producing infrared images of markedly superior quality, both qualitatively and quantitatively. Furthermore, the proposed model enables more effective downstream applications for infrared images than other methods.

* Accepted by IJCNN 2024

Via

Access Paper or Ask Questions

Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility

Feb 05, 2022

Tianqu Kang, Anh-Dung Dinh, Binghong Wang, Tianyuan Du, Yijia Chen, Kevin Chau

Figure 1 for Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility

Figure 2 for Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility

Figure 3 for Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility

Figure 4 for Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility

Abstract:The optimization of a wavelet-based algorithm to improve speech intelligibility is reported. The discrete-time speech signal is split into frequency sub-bands via a multi-level discrete wavelet transform. Various gains are applied to the sub-band signals before they are recombined to form a modified version of the speech. The sub-band gains are adjusted while keeping the overall signal energy unchanged, and the speech intelligibility under various background interference and simulated hearing loss conditions is enhanced and evaluated objectively and quantitatively using Google Speech-to-Text transcription. For English and Chinese noise-free speech, overall intelligibility is improved, and the transcription accuracy can be increased by as much as 80 percentage points by reallocating the spectral energy toward the mid-frequency sub-bands, effectively increasing the consonant-vowel intensity ratio. This is reasonable since the consonants are relatively weak and of short duration, which are therefore the most likely to become indistinguishable in the presence of background noise or high-frequency hearing impairment. For speech already corrupted by noise, improving intelligibility is challenging but still realizable. The proposed algorithm is implementable for real-time signal processing and comparatively simpler than previous algorithms. Potential applications include speech enhancement, hearing aids, machine listening, and a better understanding of speech intelligibility.

* 12 pages, 7 figures, 4 tables

Via

Access Paper or Ask Questions

The Complexity of Limited Belief Reasoning -- The Quantifier-Free Case

May 08, 2018

Yijia Chen, Abdallah Saffidine, Christoph Schwering

Figure 1 for The Complexity of Limited Belief Reasoning -- The Quantifier-Free Case

Figure 2 for The Complexity of Limited Belief Reasoning -- The Quantifier-Free Case

Abstract:The classical view of epistemic logic is that an agent knows all the logical consequences of their knowledge base. This assumption of logical omniscience is often unrealistic and makes reasoning computationally intractable. One approach to avoid logical omniscience is to limit reasoning to a certain belief level, which intuitively measures the reasoning "depth." This paper investigates the computational complexity of reasoning with belief levels. First we show that while reasoning remains tractable if the level is constant, the complexity jumps to PSPACE-complete -- that is, beyond classical reasoning -- when the belief level is part of the input. Then we further refine the picture using parameterized complexity theory to investigate how the belief level and the number of non-logical symbols affect the complexity.

* 15 pages, 1 figure, 1 table, Twenty-seventh International Joint Conference on Artificial Intelligence (IJCAI-18)

Via

Access Paper or Ask Questions