Abstract:Monocular absolute 3D fish pose estimation allows for efficient fish length measurement in the longline fisheries, where fishes are under severe deformation during the catching process. This task is challenging since it requires locating absolute 3D fish keypoints based on a short monocular video clip. Unlike related works, which either require expensive 3D ground-truth data and/or multiple-view images to provide depth information, or are limited to rigid objects, we propose a novel frame-based method to estimate the absolute 3D fish pose and fish length from a single-view 2D segmentation mask. We first introduce a relative 3D fish template. By minimizing an objective function, our method systematically estimates the relative 3D pose of the target fish and fish 2D keypoints in the image. Finally, with a closed-form solution, the relative 3D fish pose can help locate absolute 3D keypoints, resulting in the frame-based absolute fish length measurement, which is further refined based on the statistical temporal inference for the optimal fish length measurement from the video clip. Our experiments show that this method can accurately estimate the absolute 3D fish pose and further measure the absolute length, even outperforming the state-of-the-art multi-view method.
Abstract:The goal of electronic monitoring (EM) of longline fishing is to monitor the fish catching activities on fishing vessels, either for the regulatory compliance or catch counting. Hierarchical classification based on videos allows for inexpensive and efficient fish species identification of catches from longline fishing, where fishes are under severe deformation and self-occlusion during the catching process. More importantly, the flexibility of hierarchical classification mitigates the laborious efforts of human reviews by providing confidence scores in different hierarchical levels. Some related works either use cascaded models for hierarchical classification or make predictions per image or predict one overlapping hierarchical data structure of the dataset in advance. However, with a known non-overlapping hierarchical data structure provided by fisheries scientists, our method enforces the hierarchical data structure and introduces an efficient training and inference strategy for video-based fisheries data. Our experiments show that the proposed method outperforms the classic flat classification system significantly and our ablation study justifies our contributions in CNN model design, training strategy, and the video-based inference schemes for the hierarchical fish species classification task.