Hong Kong University of Science and Technology
Abstract:Underwater litter is widely spread across aquatic environments such as lakes, rivers, and oceans, significantly impacting natural ecosystems. Current monitoring technologies for detecting underwater litter face limitations in survey efficiency, cost, and environmental conditions, highlighting the need for efficient, consumer-grade technologies for automatic detection. This research introduces the Aerial-Aquatic Speedy Scanner (AASS) combined with Super-Resolution Reconstruction (SRR) and an improved YOLOv8 detection network. AASS enhances data acquisition efficiency over traditional methods, capturing high-quality images that accurately identify underwater waste. SRR improves image-resolution by mitigating motion blur and insufficient resolution, thereby enhancing detection tasks. Specifically, the RCAN model achieved the highest mean average precision (mAP) of 78.6% for detection accuracy on reconstructed images among the tested SRR models. With a magnification factor of 4, the SRR test set shows an improved mAP compared to the conventional bicubic set. These results demonstrate the effectiveness of the proposed method in detecting underwater litter.
Abstract:Hermit crabs play a crucial role in coastal ecosystems by dispersing seeds, cleaning up debris, and disturbing soil. They serve as vital indicators of marine environmental health, responding to climate change and pollution. Traditional survey methods, like quadrat sampling, are labor-intensive, time-consuming, and environmentally dependent. This study presents an innovative approach combining UAV-based remote sensing with Super-Resolution Reconstruction (SRR) and the CRAB-YOLO detection network, a modification of YOLOv8s, to monitor hermit crabs. SRR enhances image quality by addressing issues such as motion blur and insufficient resolution, significantly improving detection accuracy over conventional low-resolution fuzzy images. The CRAB-YOLO network integrates three improvements for detection accuracy, hermit crab characteristics, and computational efficiency, achieving state-of-the-art (SOTA) performance compared to other mainstream detection models. The RDN networks demonstrated the best image reconstruction performance, and CRAB-YOLO achieved a mean average precision (mAP) of 69.5% on the SRR test set, a 40% improvement over the conventional Bicubic method with a magnification factor of 4. These results indicate that the proposed method is effective in detecting hermit crabs, offering a cost-effective and automated solution for extensive hermit crab monitoring, thereby aiding coastal benthos conservation.
Abstract:In the field of computer vision, visible light images often exhibit low contrast in low-light conditions, presenting a significant challenge. While infrared imagery provides a potential solution, its utilization entails high costs and practical limitations. Recent advancements in deep learning, particularly the deployment of Generative Adversarial Networks (GANs), have facilitated the transformation of visible light images to infrared images. However, these methods often experience unstable training phases and may produce suboptimal outputs. To address these issues, we propose a novel end-to-end Transformer-based model that efficiently converts visible light images into high-fidelity infrared images. Initially, the Texture Mapping Module and Color Perception Adapter collaborate to extract texture and color features from the visible light image. The Dynamic Fusion Aggregation Module subsequently integrates these features. Finally, the transformation into an infrared image is refined through the synergistic action of the Color Perception Adapter and the Enhanced Perception Attention mechanism. Comprehensive benchmarking experiments confirm that our model outperforms existing methods, producing infrared images of markedly superior quality, both qualitatively and quantitatively. Furthermore, the proposed model enables more effective downstream applications for infrared images than other methods.
Abstract:The optimization of a wavelet-based algorithm to improve speech intelligibility is reported. The discrete-time speech signal is split into frequency sub-bands via a multi-level discrete wavelet transform. Various gains are applied to the sub-band signals before they are recombined to form a modified version of the speech. The sub-band gains are adjusted while keeping the overall signal energy unchanged, and the speech intelligibility under various background interference and simulated hearing loss conditions is enhanced and evaluated objectively and quantitatively using Google Speech-to-Text transcription. For English and Chinese noise-free speech, overall intelligibility is improved, and the transcription accuracy can be increased by as much as 80 percentage points by reallocating the spectral energy toward the mid-frequency sub-bands, effectively increasing the consonant-vowel intensity ratio. This is reasonable since the consonants are relatively weak and of short duration, which are therefore the most likely to become indistinguishable in the presence of background noise or high-frequency hearing impairment. For speech already corrupted by noise, improving intelligibility is challenging but still realizable. The proposed algorithm is implementable for real-time signal processing and comparatively simpler than previous algorithms. Potential applications include speech enhancement, hearing aids, machine listening, and a better understanding of speech intelligibility.
Abstract:The classical view of epistemic logic is that an agent knows all the logical consequences of their knowledge base. This assumption of logical omniscience is often unrealistic and makes reasoning computationally intractable. One approach to avoid logical omniscience is to limit reasoning to a certain belief level, which intuitively measures the reasoning "depth." This paper investigates the computational complexity of reasoning with belief levels. First we show that while reasoning remains tractable if the level is constant, the complexity jumps to PSPACE-complete -- that is, beyond classical reasoning -- when the belief level is part of the input. Then we further refine the picture using parameterized complexity theory to investigate how the belief level and the number of non-logical symbols affect the complexity.