Abstract:Long-term Action Quality Assessment (AQA) evaluates the execution of activities in videos. However, the length presents challenges in fine-grained interpretability, with current AQA methods typically producing a single score by averaging clip features, lacking detailed semantic meanings of individual clips. Long-term videos pose additional difficulty due to the complexity and diversity of actions, exacerbating interpretability challenges. While query-based transformer networks offer promising long-term modeling capabilities, their interpretability in AQA remains unsatisfactory due to a phenomenon we term Temporal Skipping, where the model skips self-attention layers to prevent output degradation. To address this, we propose an attention loss function and a query initialization method to enhance performance and interpretability. Additionally, we introduce a weight-score regression module designed to approximate the scoring patterns observed in human judgments and replace conventional single-score regression, improving the rationality of interpretability. Our approach achieves state-of-the-art results on three real-world, long-term AQA benchmarks. Our code is available at: https://github.com/dx199771/Interpretability-AQA
Abstract:Radars and cameras are mature, cost-effective, and robust sensors and have been widely used in the perception stack of mass-produced autonomous driving systems. Due to their complementary properties, outputs from radar detection (radar pins) and camera perception (2D bounding boxes) are usually fused to generate the best perception results. The key to successful radar-camera fusion is accurate data association. The challenges in radar-camera association can be attributed to the complexity of driving scenes, the noisy and sparse nature of radar measurements, and the depth ambiguity from 2D bounding boxes. Traditional rule-based association methods are susceptible to performance degradation in challenging scenarios and failure in corner cases. In this study, we propose to address rad-cam association via deep representation learning, to explore feature-level interaction and global reasoning. Concretely, we design a loss sampling mechanism and an innovative ordinal loss to overcome the difficulty of imperfect labeling and to enforce critical human reasoning. Despite being trained with noisy labels generated by a rule-based algorithm, our proposed method achieves a performance of 92.2% F1 score, which is 11.6% higher than the rule-based teacher. Moreover, this data-driven method also lends itself to continuous improvement via corner case mining.
Abstract:Autonomous radar has been an integral part of advanced driver assistance systems due to its robustness to adverse weather and various lighting conditions. Conventional automotive radars use digital signal processing (DSP) algorithms to process raw data into sparse radar pins that do not provide information regarding the size and orientation of the objects. In this paper, we propose a deep-learning based algorithm for radar object detection. The algorithm takes in radar data in its raw tensor representation and places probabilistic oriented bounding boxes around the detected objects in bird's-eye-view space. We created a new multimodal dataset with 102544 frames of raw radar and synchronized LiDAR data. To reduce human annotation effort we developed a scalable pipeline to automatically annotate ground truth using LiDAR as reference. Based on this dataset we developed a vehicle detection pipeline using raw radar data as the only input. Our best performing radar detection model achieves 77.28\% AP under oriented IoU of 0.3. To the best of our knowledge, this is the first attempt to investigate object detection with raw radar data for conventional corner automotive radars.
Abstract:In sparse-view Computed Tomography (CT), only a small number of projection images are taken around the object, and sinogram interpolation method has a significant impact on final image quality. When the amount of sparsity (the amount of missing views in sinogram data) is not high, conventional interpolation methods have yielded good results. When the amount of sparsity is high, more advanced sinogram interpolation methods are needed. Recently, several deep learning (DL) based sinogram interpolation methods have been proposed. However, those DL-based methods have mostly tested so far on computer simulated sinogram data rather experimentally acquired sinogram data. In this study, we developed a sinogram interpolation method for sparse-view micro-CT based on the combination of U-Net and residual learning. We applied the method to sinogram data obtained from sparse-view micro-CT experiments, where the sparsity reached 90%. The interpolated sinogram by the DL neural network was fed to FBP algorithm for reconstruction. The result shows that both RMSE and SSIM of CT image are greatly improved. The experimental results demonstrate that this sinogram interpolation method produce significantly better results over standard linear interpolation methods when the sinogram data are extremely sparse.
Abstract:Melanoma is a type of skin cancer with the most rapidly increasing incidence. Early detection of melanoma using dermoscopy images significantly increases patients' survival rate. However, accurately classifying skin lesions, especially in the early stage, is extremely challenging via dermatologists' observation. Hence, the discovery of reliable biomarkers for melanoma diagnosis will be meaningful. Recent years, deep learning empowered computer-assisted diagnosis has been shown its value in medical imaging-based decision making. However, lots of research focus on improving disease detection accuracy but not exploring the evidence of pathology. In this paper, we propose a method to interpret the deep learning classification findings. Firstly, we propose an accurate neural network architecture to classify skin lesion. Secondly, we utilize a prediction difference analysis method that examining each patch on the image through patch wised corrupting for detecting the biomarkers. Lastly, we validate that our biomarker findings are corresponding to the patterns in the literature. The findings might be significant to guide clinical diagnosis.