Abstract: Data augmentation is a widely used technique for expanding the training data of tasks that require labeled data, such as semantic segmentation. It is particularly valuable for pixel-wise annotation tasks, where labeling is slow and labor-intensive. Traditional data augmentation methods apply simple transformations, such as rotations and flips, to create new images from existing ones. However, these new images may lack diversity along the main semantic axes of the data and do not change high-level semantic properties. To address this issue, generative models have emerged as an effective way to augment data by generating synthetic images. Controllable generative models can augment data for semantic segmentation tasks using a prompt and a visual reference from the original image. However, using these models directly presents challenges, such as creating a prompt and visual reference that yield a synthetic image accurately reflecting the content and structure of the original. In this work, we introduce an effective data augmentation method for semantic segmentation using a controllable diffusion model. Our proposed method includes efficient prompt generation using Class-Prompt Appending and Visual Prior Combination to enhance attention to labeled classes in real images. These techniques allow us to generate images that accurately depict the segmented classes in the real image. In addition, we employ a class balancing algorithm to ensure efficiency when merging the synthetic and original images into a balanced training dataset. We evaluated our method on the PASCAL VOC datasets and found it highly effective for synthesizing training images for semantic segmentation.
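The abstract does not specify how Class-Prompt Appending formats its prompts. As a rough illustration only, the sketch below assumes the simplest reading of the name: the class names labeled in the real image are appended to a base caption before the prompt is passed to the generative model. The function name `append_class_prompt`, the comma-separated format, and the example caption are all hypothetical and are not taken from the paper.

```python
def append_class_prompt(caption: str, class_names: list[str]) -> str:
    """Append labeled class names to a base caption.

    Hypothetical format (an assumption, not the authors' specification):
    "<caption>, <class 1>, <class 2>, ...".
    """
    # Remove duplicate class names while preserving their original order.
    unique_names = []
    for name in class_names:
        if name not in unique_names:
            unique_names.append(name)

    # With no labeled classes, the caption is returned unchanged.
    if not unique_names:
        return caption

    # Strip trailing periods/spaces so the appended list reads cleanly.
    base = caption.rstrip(". ")
    return f"{base}, {', '.join(unique_names)}"


prompt = append_class_prompt(
    "a man riding a horse on a beach.",  # hypothetical generated caption
    ["person", "horse", "person"],       # classes present in the mask
)
print(prompt)
```

Under this assumption, the appended class names act as an explicit reminder of every labeled class, including ones a free-form caption might omit.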
Abstract: Building robust person re-identification (Re-ID) systems that accurately identify subjects across diverse scenarios remains a formidable challenge in surveillance and security applications. This study presents a novel methodology that significantly enhances Re-ID by integrating Uncertainty Feature Fusion (UFFM) with Wise Distance Aggregation (WDA). Tested on the benchmark datasets Market-1501, DukeMTMC-ReID, and MSMT17, our approach demonstrates substantial improvements in Rank-1 accuracy and mean Average Precision (mAP). Specifically, UFFM synthesizes features from multiple images to overcome the limitations imposed by the variability of subject appearances across different views. WDA further refines the process by aggregating similarity metrics, thereby enhancing the system's ability to discern subtle but critical differences between subjects. The empirical results show that our method outperforms existing approaches, setting new performance benchmarks across all evaluated datasets. Code is available on GitHub.
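For readers unfamiliar with the two reported metrics, the following minimal sketch computes Rank-1 accuracy and mAP from a query-to-gallery distance matrix. It is a simplified illustration, not the evaluation code of the paper: the function name `rank1_and_map` is hypothetical, and the same-camera filtering that standard Re-ID protocols apply is omitted.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (Rank-1 accuracy, mAP) for a distance matrix.

    dist[q][g] is the distance between query q and gallery item g;
    smaller means more similar. Queries with no matching identity in
    the gallery are skipped. Simplification: no same-camera filtering.
    """
    rank1_hits = 0
    ap_sum = 0.0
    valid_queries = 0

    for q, row in enumerate(dist):
        # Rank gallery items from closest to farthest.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue
        valid_queries += 1

        # Rank-1: the closest gallery item has the query's identity.
        rank1_hits += int(matches[0])

        # Average precision: mean of precision at each correct match.
        hits = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        ap_sum += sum(precisions) / len(precisions)

    if valid_queries == 0:
        raise ValueError("no query has a matching identity in the gallery")
    return rank1_hits / valid_queries, ap_sum / valid_queries


# Toy example with two queries and three gallery items (made-up values).
toy_dist = [[0.1, 0.5, 0.9],
            [0.8, 0.2, 0.3]]
rank1, mean_ap = rank1_and_map(toy_dist, [1, 2], [1, 2, 1])
print(rank1, mean_ap)
```

Rank-1 only checks the top-ranked result, whereas mAP rewards ranking every correct match early, which is why the two are usually reported together.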
Abstract: Semantic segmentation is crucial for autonomous driving, particularly for Drivable Area and Lane Segmentation, which support safety and navigation. To address the high computational cost of current state-of-the-art (SOTA) models, this paper introduces TwinLiteNetPlus (TwinLiteNet$^+$), a model that balances efficiency and accuracy. TwinLiteNet$^+$ incorporates standard and depth-wise separable dilated convolutions, reducing complexity while maintaining high accuracy. It is available in four configurations, from the robust 1.94 million-parameter TwinLiteNet$^+_{\text{Large}}$ to the ultra-compact 34K-parameter TwinLiteNet$^+_{\text{Nano}}$. Notably, TwinLiteNet$^+_{\text{Large}}$ attains 92.9\% mIoU for Drivable Area Segmentation and 34.2\% IoU for Lane Segmentation. These results outperform those of current SOTA models at a computational cost approximately 11 times lower, measured in floating-point operations (FLOPs), than that of the existing SOTA model. Extensively tested on various embedded devices, TwinLiteNet$^+$ demonstrates promising latency and power efficiency, underscoring its suitability for real-world autonomous vehicle applications.
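To illustrate why depth-wise separable convolutions reduce complexity, the sketch below compares the parameter count of a standard convolution with that of its depth-wise separable counterpart (a per-channel $k \times k$ depth-wise convolution followed by a $1 \times 1$ point-wise convolution). The channel sizes are illustrative assumptions and are not taken from the TwinLiteNet$^+$ architecture. Dilation enlarges the receptive field without adding parameters, so it does not appear in the counts.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Parameters of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in: int, c_out: int, k: int,
                               bias: bool = False) -> int:
    """Parameters of a depth-wise k x k conv plus a 1 x 1 point-wise conv."""
    # Depth-wise: one k x k filter per input channel.
    depthwise = c_in * k * k + (c_in if bias else 0)
    # Point-wise: a 1 x 1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not from the paper).
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(standard, separable, round(standard / separable, 2))
```

For this assumed layer, the separable form needs 8,768 parameters instead of 73,728, roughly 8.4 times fewer; the saving approaches a factor of $k^2$ as the number of output channels grows.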