Abstract:The Three-River-Source region is a highly significant natural reserve in China that harbors a plethora of untamed botanical resources. To meet the practical requirements of botanical research and intelligent plant management, we construct a large-scale dataset for Plant detection in the Three-River-Source region (PTRS). This dataset comprises 6965 high-resolution images of 2160*3840 pixels, captured by diverse sensors and platforms, and featuring objects of varying shapes and sizes. Subsequently, a team of botanical image interpretation experts annotated these images with 21 commonly occurring object categories. The fully annotated PTRS images contain 122, 300 instances of plant leaves, each labeled by a horizontal rectangle. The PTRS presents us with challenges such as dense occlusion, varying leaf resolutions, and high feature similarity among plants, prompting us to develop a novel object detection network named PlantDet. This network employs a window-based efficient self-attention module (ST block) to generate robust feature representation at multiple scales, improving the detection efficiency for small and densely-occluded objects. Our experimental results validate the efficacy of our proposed plant detection benchmark, with a precision of 88.1%, a mean average precision (mAP) of 77.6%, and a higher recall compared to the baseline. Additionally, our method effectively overcomes the issue of missing small objects. We intend to share our data and code with interested parties to advance further research in this field.