Abstract:Recent advances in generative image compression (GIC) have delivered remarkable improvements in perceptual quality. However, many GICs rely on large-scale and rigid models, which severely constrain their utility for flexible transmission and practical deployment in low-bitrate scenarios. To address these issues, we propose Progressive Generative Image Compression (ProGIC), a compact codec built on residual vector quantization (RVQ). In RVQ, a sequence of vector quantizers encodes the residuals stage by stage, each with its own codebook. The resulting codewords sum to a coarse-to-fine reconstruction and a progressive bitstream, enabling previews from partial data. We pair this with a lightweight backbone based on depthwise-separable convolutions and small attention blocks, enabling practical deployment on both GPUs and CPU-only devices. Experimental results show that ProGIC attains comparable compression performance compared with previous methods. It achieves bitrate savings of up to 57.57% on DISTS and 58.83% on LPIPS compared to MS-ILLM on the Kodak dataset. Beyond perceptual quality, ProGIC enables progressive transmission for flexibility, and also delivers over 10 times faster encoding and decoding compared with MS-ILLM on GPUs for efficiency.
Abstract:Accurately identifying lychee-picking points in unstructured orchard environments and obtaining their coordinate locations is critical to the success of lychee-picking robots. However, traditional two-dimensional (2D) image-based object detection methods often struggle due to the complex geometric structures of branches, leaves and fruits, leading to incorrect determination of lychee picking points. In this study, we propose a Fcaf3d-lychee network model specifically designed for the accurate localisation of lychee picking points. Point cloud data of lychee picking points in natural environments are acquired using Microsoft's Azure Kinect DK time-of-flight (TOF) camera through multi-view stitching. We augment the Fully Convolutional Anchor-Free 3D Object Detection (Fcaf3d) model with a squeeze-and-excitation(SE) module, which exploits human visual attention mechanisms for improved feature extraction of lychee picking points. The trained network model is evaluated on a test set of lychee-picking locations and achieves an impressive F1 score of 88.57%, significantly outperforming existing models. Subsequent three-dimensional (3D) position detection of picking points in real lychee orchard environments yields high accuracy, even under varying degrees of occlusion. Localisation errors of lychee picking points are within 1.5 cm in all directions, demonstrating the robustness and generality of the model.



Abstract:In this work, we propose a novel framework for achieving robotic autonomy in orchards. It consists of two key steps: perception and semantic mapping. In the perception step, we introduce a 3D detection method that accurately identifies objects directly on point cloud maps. In the semantic mapping step, we develop a mapping module that constructs a visibility graph map by incorporating object-level information and terrain analysis. By combining these two steps, our framework improves the autonomy of agricultural robots in orchard environments. The accurate detection of objects and the construction of a semantic map enable the robot to navigate autonomously, perform tasks such as fruit harvesting, and acquire actionable information for efficient agricultural production.




Abstract:Terahertz (THz) communication and the application of massive multiple-input multiple-output (MIMO) technology have been proved significant for the sixth generation (6G) communication systems, and have gained global interests. In this paper, we employ the shooting and bouncing ray (SBR) method integrated with acceleration technology to model THz and massive MIMO channel. The results of ray tracing (RT) simulation in this paper, i.e., angle of departure (AoD), angle of arrival (AoA), and power delay profile (PDP) under the frequency band supported by the commercial RT software Wireless Insite (WI) are in agreement with those produced by WI. Based on the Kirchhoff scattering effect on material surfaces and atmospheric absorption loss showing at THz frequency band, the modified propagation models of Fresnel reflection coefficients and free-space attenuation are consistent with the measured results. For massive MIMO, the channel capacity and the stochastic power distribution are analyzed. The results indicate the applicability of SBR method for building deterministic models of THz and massive MIMO channels with extensive functions and acceptable accuracy.