Abstract:Feature upsampling is an essential operation in constructing deep convolutional neural networks. However, existing upsamplers either lack specific feature guidance or require high-resolution feature maps, resulting in a loss of performance and flexibility. In this paper, we find that local self-attention naturally has the feature guidance capability, and its computational paradigm aligns closely with the essence of feature upsampling (i.e., feature reassembly of neighboring points). Therefore, we introduce local self-attention into the upsampling task and demonstrate that the majority of existing upsamplers can be regarded as special cases of upsamplers based on local self-attention. Considering the potential semantic gap between upsampled points and their neighboring points, we further introduce the deformation mechanism into the upsampler based on local self-attention, thereby proposing LDA-AQU. As a novel dynamic kernel-based upsampler, LDA-AQU utilizes the feature of queries to guide the model in adaptively adjusting the position and aggregation weight of neighboring points, thereby meeting the upsampling requirements across various complex scenarios. In addition, LDA-AQU is lightweight and can be easily integrated into various model architectures. We evaluate the effectiveness of LDA-AQU across four dense prediction tasks: object detection, instance segmentation, panoptic segmentation, and semantic segmentation. LDA-AQU consistently outperforms previous state-of-the-art upsamplers, achieving performance enhancements of 1.7 AP, 1.5 AP, 2.0 PQ, and 2.5 mIoU compared to the baseline models in the aforementioned four tasks, respectively. Code is available at https://github.com/duzw9311/LDA-AQU.
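The idea of upsampling as attention-weighted reassembly of neighboring points can be illustrated with a small one-dimensional sketch. This is not the LDA-AQU implementation: the function name `local_attention_upsample_1d`, the choice of the nearest low-resolution feature as the query, and the absence of learned projections and deformation offsets are all simplifying assumptions made for illustration.

```python
import math


def local_attention_upsample_1d(feats, scale=2, radius=1):
    """Upsample a list of feature vectors by attending over local neighbors.

    Assumptions (not from the paper): the query for each output position is
    the nearest low-resolution feature, keys and values are the raw features,
    and neighbor indices are clamped at the borders.
    """
    n, dim = len(feats), len(feats[0])
    out = []
    for j in range(n * scale):
        i = j // scale  # nearest low-resolution source index
        query = feats[i]
        # Indices of the neighbors inside the local window, clamped to [0, n - 1]
        idx = [min(max(i + d, 0), n - 1) for d in range(-radius, radius + 1)]
        # Scaled dot-product similarity between the query and each neighbor
        scores = [
            sum(q * k for q, k in zip(query, feats[t])) / math.sqrt(dim)
            for t in idx
        ]
        # Numerically stable softmax over the window
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output feature = attention-weighted sum of neighbor features
        out.append(
            [sum(w * feats[t][c] for w, t in zip(weights, idx)) for c in range(dim)]
        )
    return out


# Hypothetical two-channel features at three low-resolution positions
low_res = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
high_res = local_attention_upsample_1d(low_res, scale=2, radius=1)
print(len(high_res))  # six output positions for scale=2
```

With `radius=0` the window contains only the nearest point, so the sketch reduces to nearest-neighbor upsampling. That is one simple way to picture how a fixed upsampler can arise as a special case of attention-based reassembly.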
Abstract:Object detection in aerial images has always been a challenging task due to the generally small size of the objects. Most current detectors prioritize novel detection frameworks, often overlooking research on fundamental components such as feature pyramid networks. In this paper, we introduce the Cross-Layer Feature Pyramid Transformer (CFPT), a novel upsampler-free feature pyramid network designed specifically for small object detection in aerial images. CFPT incorporates two meticulously designed attention blocks with linear computational complexity: the Cross-Layer Channel-Wise Attention (CCA) and the Cross-Layer Spatial-Wise Attention (CSA). CCA achieves cross-layer interaction by dividing channel-wise token groups to perceive cross-layer global information along the spatial dimension, while CSA completes cross-layer interaction by dividing spatial-wise token groups to perceive cross-layer global information along the channel dimension. By integrating these modules, CFPT enables cross-layer interaction in one step, thereby avoiding the semantic gap and information loss associated with element-wise summation and layer-by-layer transmission. Furthermore, CFPT incorporates global contextual information, which enhances detection performance for small objects. To further enhance location awareness during cross-layer interaction, we propose the Cross-Layer Consistent Relative Positional Encoding (CCPE) based on inter-layer mutual receptive fields. We evaluate the effectiveness of CFPT on two challenging object detection datasets in aerial images, namely VisDrone2019-DET and TinyPerson. Extensive experiments demonstrate the effectiveness of CFPT, which outperforms state-of-the-art feature pyramid networks while incurring lower computational costs. The code will be released at https://github.com/duzw9311/CFPT.
Abstract:In this paper, we introduce a new outlier removal method that fully leverages geometric and semantic information to achieve robust registration. Current semantic-based registration methods use semantics only for point-to-point or instance semantic correspondence generation, which causes two problems. First, these methods depend heavily on the correctness of the semantics and perform poorly when the semantics are incorrect or sparse. Second, semantics are used only for correspondence generation, which leads to poor performance in scenes with weak geometry. To address the first problem, we propose secondary ground segmentation and loose semantic consistency based on regional voting, which improve robustness to semantic errors by reducing the dependence on single-point semantics. To address the second, we propose semantic-geometric consistency for outlier removal, which makes full use of semantic information and significantly improves the quality of correspondences. In addition, we propose a two-stage hypothesis verification that solves the problem of incorrect transformation selection in scenes with weak geometry. On the outdoor dataset, our method demonstrates superior performance, improving registration recall by 22.5 percentage points and achieving better robustness under various conditions. Our code is available.
Abstract:The substantial gap between point cloud data collected from diverse sensors makes robust cross-source point cloud registration a formidable task. In response, we present a novel and broadly applicable framework for point cloud registration, suitable for both homologous and cross-source registration scenarios. To tackle the issues arising from different densities and distributions in cross-source point cloud data, we introduce a feature representation based on spherical voxels. Furthermore, to address the numerous outliers and mismatches in cross-source registration, we propose a hierarchical correspondence filtering approach. This method progressively filters out mismatches, yielding a set of high-quality correspondences. Our method excels in both traditional homologous registration and challenging cross-source registration. Specifically, in homologous registration on the 3DMatch dataset, we achieve the highest registration recall (RR) of 95.1% and an inlier ratio of 87.8%. In cross-source point cloud registration, our method attains the best RR on the 3DCSR dataset, a 9.3 percentage-point improvement. The code is available at https://github.com/GuiyuZhao/VRHCF.
Abstract:Detecting small, undetermined moving objects, or objects in an occluded environment with a cluttered background, is a major problem in computer vision. These conditions greatly affect the detection accuracy of deep learning models. To overcome these problems, we focus on deep learning models for real-time detection of cars and tanks in an occluded environment with a cluttered background, employing the SSD and YOLO algorithms to improve detection precision and reduce the problems these models face. The developed method builds a custom dataset and applies a preprocessing technique to clean the noisy data. To train the developed model, we apply data augmentation to balance and diversify the data. We fine-tuned, trained, and evaluated these models on the established dataset with these techniques applied, and the results were more accurate than those obtained without them. The accuracy and frames per second (FPS) of the SSD-Mobilenet v2 model are higher than those of YOLO V3 and YOLO V4. Furthermore, by employing various techniques such as data enhancement, noise reduction, parameter optimization, and model fusion, we improve the effectiveness of detection and recognition. We further added a counting algorithm and an experimental comparison of target attributes, and built a graphical user interface for the developed model that reports object counts, alerts, status, resolution, and FPS. Finally, to justify the importance of the developed method, an analysis of YOLO V3, YOLO V4, and SSD was incorporated, which completed the proposed method.
Abstract:Point cloud registration estimates a transformation that aligns point clouds collected from different perspectives. In learning-based point cloud registration, a robust descriptor is vital for high-accuracy registration. However, most methods are susceptible to noise and generalize poorly to unseen datasets. Motivated by this, we introduce SphereNet to learn a descriptor for point cloud registration that is robust to noise and generalizes to unseen data. In our method, the spheroid generator first builds a geometric domain based on spherical voxelization to encode initial features. Then, spherical interpolation is introduced to achieve robustness against noise. Finally, a new spherical convolutional neural network with spherical integrity padding completes the extraction of descriptors, which reduces the loss of features and fully captures the geometric features. To evaluate our method, we introduce a new benchmark, 3DMatch-noise, with strong noise. Extensive experiments are carried out on both indoor and outdoor datasets. Under high-intensity noise, SphereNet increases the feature matching recall by more than 25 percentage points on 3DMatch-noise. In addition, it sets a new state of the art on the 3DMatch and 3DLoMatch benchmarks with 93.5% and 75.6% registration recall, respectively, and has the best generalization ability on unseen datasets.
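Spherical voxelization, as mentioned in the abstract above, can be pictured as binning each neighboring point by its radial distance, azimuth, and polar angle around a keypoint. The sketch below is only a generic illustration under that assumption. It is not SphereNet's spheroid generator, and the function name `spherical_voxel_index`, the uniform bin layout, and the bin counts are hypothetical.

```python
import math


def spherical_voxel_index(point, center, radius, n_r, n_az, n_el):
    """Map a 3D point to (radial, azimuth, polar) bin indices around `center`.

    Returns None when the point lies outside the support sphere.
    Assumption: bins are uniform in radius, azimuth, and polar angle.
    """
    dx, dy, dz = (p - c for p, c in zip(point, center))
    rho = math.sqrt(dx * dx + dy * dy + dz * dz)
    if rho > radius:
        return None
    # Azimuth in [0, 2*pi), measured in the x-y plane
    azimuth = math.atan2(dy, dx) % (2.0 * math.pi)
    # Polar angle in [0, pi], measured from the +z axis (0 at the center itself)
    polar = math.acos(max(-1.0, min(1.0, dz / rho))) if rho > 0.0 else 0.0
    # Convert each coordinate to a bin index, clamping the upper boundary
    i_r = min(int(rho / radius * n_r), n_r - 1)
    i_az = min(int(azimuth / (2.0 * math.pi) * n_az), n_az - 1)
    i_el = min(int(polar / math.pi * n_el), n_el - 1)
    return (i_r, i_az, i_el)


# Hypothetical keypoint at the origin with a support radius of 2.0
print(spherical_voxel_index((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), 2.0, 4, 6, 3))
```

Counting how many neighbors fall into each bin would give a simple occupancy feature per voxel. The descriptor in the paper is learned by a spherical CNN on top of such a domain, which this sketch does not attempt to reproduce.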
Abstract:LoRa lacks the ability to sense channel status. The received signal strength indicator (RSSI) decreases due to collisions, interference, and the near-far effect, while for the signal-to-noise ratio (SNR), packets are rejected when the transmission power (TP) is decreased at a higher spreading factor (SF). To overcome these challenges for the electric shelf label (ESL) and minimize the dependency on retransmission and acknowledgment, the end devices (EDs) are allocated around gateways (GWs) based on machine clustering, with a dynamic SF for SNR and a dynamic TP for RSSI. The experimental results show that the RSSI approach is more dominant than the SNR approach because it determines the exact locality of each ED, which diminishes the capture effect. An arithmetic distribution of EDs across the GWs in different clusters helps to reduce the near-far effect. For most of the connected EDs, the resultant received power (RP) at each cluster is higher than the threshold RP.
Abstract:In agriculture, the pesticide spraying flow must be controlled precisely to reduce costs, protect the environment, and increase yield. Although several variable control methods for spraying flow exist, their indirect flow control techniques and slow response can cause inaccuracy and mismanagement, and these systems also suffer from complicated design and debugging. In this paper, an embedded design of the fuzzy Proportional Integral Derivative (PID) variable spraying control method is adopted. The experimental results show that the overshoot of PID control is 10.76% and the overshoot of fuzzy PID control is 7.17%, which can meet the requirements of an advanced spray flow control system. For further investigation, a novel spray flow control method based on Programmable Logic Control (PLC) is proposed in this paper.
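For context, overshoot percentages such as those quoted above are conventionally computed as the peak excursion above the steady-state value, relative to that value. The sketch below assumes a positive step response whose last sample approximates the steady state. The function name `percent_overshoot` and the sample readings are hypothetical and are not taken from the paper.

```python
def percent_overshoot(response, setpoint=None):
    """Return the percent overshoot of a sampled, positive step response.

    Assumption: when no setpoint is given, the last sample is treated as the
    steady-state value.
    """
    if not response:
        raise ValueError("response must contain at least one sample")
    steady = response[-1] if setpoint is None else setpoint
    if steady <= 0:
        raise ValueError("steady-state value must be positive")
    peak = max(response)
    # Overshoot is zero when the response never exceeds the steady state
    return max(0.0, (peak - steady) / steady * 100.0)


# Hypothetical flow readings (arbitrary units) settling at 1.0
samples = [0.0, 0.6, 1.05, 1.1, 1.02, 0.99, 1.0]
print(round(percent_overshoot(samples), 2))
```

A lower value means the controlled flow stays closer to the target while settling, which is the sense in which the 7.17% figure for fuzzy PID compares favorably with 10.76% for plain PID.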
Abstract:To address the problem of imperfect confrontation strategies caused by the lack of information about the game environment when simulating dynamic countermeasure models with incomplete information for intelligent games, a hierarchical game analysis strategy for a confrontation model based on OODA loop (Observation, Orientation, Decision, Action) theory is proposed. At the same time, taking into account the trend toward unmanned warfare in the future, a NetLogo simulation is used to construct a dynamic derivation of the confrontation between two tanks. In the validation process, OODA loop theory is used to describe the operation of the complex system formed by the red and blue sides. The four-step cycle of observation, judgment, decision, and execution is carried out according to the number of armored units on each side, and the OODA loop system then adjusts the judgment and decision time coefficients for the next confrontation cycle according to the results of the first cycle. Compared with traditional simulation methods that consider objective factors such as loss rate and support rate, the OODA-loop-based hierarchical game analysis can analyze the confrontation situation more comprehensively.
Abstract:Since the outbreak of COVID-19 in December 2019, medical protective equipment such as disposable medical masks and KN95 masks has become an essential resource for the public. Enterprises in all sectors of society have also switched to producing medical masks. After the outbreak, choosing the right time to produce medical protective masks and the right type of mask to produce plays a positive role in preventing and controlling the epidemic in a short time. In this regard, an evolutionary game competition analysis is conducted using relevant data on disposable medical masks and KN95 masks to determine the appropriate time nodes for producing each mask type. The research and analysis of the production strategy for mask types provides positive guidance for the resumption of work and production.