Arnold Wiliem

3rd Workshop on Maritime Computer Vision (MaCVi) 2025: Challenge Results

Jan 17, 2025

Benjamin Kiefer, Lojze Žust, Jon Muhovič, Matej Kristan, Janez Perš, Matija Teršek, Uma Mudenagudi, Chaitra Desai, Arnold Wiliem, Marten Kreis, Nikhil Akalwadi(+36 more)

Abstract: The 3rd Workshop on Maritime Computer Vision (MaCVi) 2025 addresses maritime computer vision for Unmanned Surface Vehicles (USV) and underwater scenarios. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 700 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi25.

* Part of the MaCVi 2025 workshop

Zoom-shot: Fast and Efficient Unsupervised Zero-Shot Transfer of CLIP to Vision Encoders with Multimodal Loss

Jan 22, 2024

Jordan Shipard, Arnold Wiliem, Kien Nguyen Thanh, Wei Xiang, Clinton Fookes

Abstract: The fusion of vision and language has brought about a transformative shift in computer vision through the emergence of Vision-Language Models (VLMs). However, the resource-intensive nature of existing VLMs poses a significant challenge. We need an accessible method for developing the next generation of VLMs. To address this issue, we propose Zoom-shot, a novel method for transferring the zero-shot capabilities of CLIP to any pre-trained vision encoder. We do this by exploiting the multimodal information (i.e., text and image) present in the CLIP latent space through the use of specifically designed multimodal loss functions. These loss functions are (1) cycle-consistency loss and (2) our novel prompt-guided knowledge distillation loss (PG-KD). PG-KD combines the concept of knowledge distillation with CLIP's zero-shot classification to capture the interactions between text and image features. With our multimodal losses, we train a linear mapping between the CLIP latent space and the latent space of a pre-trained vision encoder, for only a single epoch. Furthermore, Zoom-shot is entirely unsupervised and is trained using unpaired data. We test the zero-shot capabilities of a range of vision encoders augmented as new VLMs, on coarse and fine-grained classification datasets, outperforming the previous state-of-the-art in this problem domain. In our ablations, we find Zoom-shot allows for a trade-off between data and compute during training; our state-of-the-art results can still be obtained when the training data is reduced from 20% to 1% of the ImageNet training set, using 20 epochs. All code and models are available on GitHub.

* 15 pages
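
As a rough illustration of the idea in the Zoom-shot abstract, the sketch below maps a feature from a vision encoder into a shared space with a linear map and then classifies it by cosine similarity against text embeddings. The matrix `W`, the direction of the mapping (encoder to shared space), the toy vectors, and the function names are all assumptions made for illustration. This is not the authors' implementation, and it omits the cycle-consistency and PG-KD losses used to train the mapping.

```python
import math

def matvec(W, x):
    """Apply a linear map W (a list of rows) to the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(feature, W, text_embeddings):
    """Map an encoder feature into the shared space and pick the closest class prompt."""
    mapped = matvec(W, feature)
    scores = {label: cosine(mapped, emb) for label, emb in text_embeddings.items()}
    return max(scores, key=scores.get), scores

# Toy values (hypothetical): a 3-d encoder feature mapped into a 2-d shared space.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
text_embeddings = {"boat": [1.0, 0.0], "buoy": [0.0, 1.0]}
label, scores = zero_shot_classify([0.2, 0.5, 0.4], W, text_embeddings)
```

In a real setting `W` would be learned and the text embeddings would come from a text encoder; here they are fixed by hand so the example runs on its own.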

SafeSea: Synthetic Data Generation for Adverse & Low Probability Maritime Conditions

Nov 24, 2023

Martin Tran, Jordan Shipard, Hermawan Mulyono, Arnold Wiliem, Clinton Fookes

Abstract: High-quality training data is essential for enhancing the robustness of object detection models. Within the maritime domain, obtaining a diverse real image dataset is particularly challenging due to the difficulty of capturing sea images with the presence of maritime objects, especially in stormy conditions. These challenges arise due to resource limitations, in addition to the unpredictable appearance of maritime objects. Nevertheless, acquiring data from stormy conditions is essential for training effective maritime detection models, particularly for search and rescue, where real-world conditions can be unpredictable. In this work, we introduce SafeSea, which is a stepping stone towards transforming actual sea images with various Sea State backgrounds while retaining maritime objects. Compared to existing generative methods such as Stable Diffusion Inpainting, this approach reduces the time and effort required to create synthetic datasets for training maritime object detection models. The proposed method uses two automated filters to pass only generated images that meet the criteria. In particular, these filters first classify the sea condition according to its Sea State level and then check whether the objects from the input image are still preserved. This method enabled the creation of the SafeSea dataset, offering diverse weather condition backgrounds to supplement the training of maritime models. Lastly, we observed that a maritime object detection model faced challenges in detecting objects in stormy sea backgrounds, emphasizing the impact of weather conditions on detection accuracy. The code and dataset are available at https://github.com/martin-3240/SafeSea.

* Accepted to WACV 2024 workshop on Maritime Computer Vision
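
The two-filter acceptance step described in the SafeSea abstract could be sketched roughly as follows. The classifier and detector are passed in as plain callables, and both the "must match a requested Sea State level" rule and the box-overlap test with an IoU threshold of 0.5 are assumptions for illustration. The abstract does not state how either check is implemented, so the linked repository remains the reference.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def accept(generated, target_sea_state, source_boxes,
           classify_sea_state, detect, iou_min=0.5):
    """Return True only if a generated image passes both filters."""
    # Filter 1: the generated background must match the requested Sea State level.
    if classify_sea_state(generated) != target_sea_state:
        return False
    # Filter 2: every object box from the input image must still be detected.
    found = detect(generated)
    return all(any(iou(src, det) >= iou_min for det in found) for src in source_boxes)

# Stub models (hypothetical) standing in for a real classifier and detector.
boxes = [(0, 0, 10, 10)]
ok = accept("img", 4, boxes, lambda im: 4, lambda im: [(1, 1, 10, 10)])
wrong_state = accept("img", 4, boxes, lambda im: 2, lambda im: [(1, 1, 10, 10)])
lost_object = accept("img", 4, boxes, lambda im: 4, lambda im: [])
```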

The 2nd Workshop on Maritime Computer Vision 2024

Nov 23, 2023

Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang(+39 more)

Abstract: The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.

* Part of 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 IEEE Xplore submission as part of WACV 2024

Boosting Zero-shot Classification with Synthetic Data Diversity via Stable Diffusion

Feb 08, 2023

Jordan Shipard, Arnold Wiliem, Kien Nguyen Thanh, Wei Xiang, Clinton Fookes

Abstract: Recent research has shown it is possible to perform zero-shot classification tasks by training a classifier with synthetic data generated by a diffusion model. However, the performance of this approach is still inferior to that of recent vision-language models. It has been suggested that the reason for this is a domain gap between the synthetic and real data. In our work, we show that this domain gap is not the main issue, and that diversity in the synthetic dataset is more important. We propose a bag of tricks to improve diversity and are able to achieve performance on par with one of the vision-language models, CLIP. More importantly, this insight allows us to endow any classification model with zero-shot classification capabilities.

* (7 pages, 3 figures, 2 tables, preprint)

Does Interference Exist When Training a Once-For-All Network?

Apr 20, 2022

Jordan Shipard, Arnold Wiliem, Clinton Fookes

Abstract: The Once-For-All (OFA) method offers an excellent pathway to deploy a trained neural network model into multiple target platforms by utilising the supernet-subnet architecture. Once trained, a subnet can be derived from the supernet (both architecture and trained weights) and deployed directly to the target platform with little to no retraining or fine-tuning. To train the subnet population, OFA uses a novel training method called Progressive Shrinking (PS), which is designed to limit the negative impact of interference during training. It is believed that higher interference during training results in lower subnet population accuracies. In this work we take a second look at this interference effect. Surprisingly, we find that interference mitigation strategies do not have a large impact on the overall subnet population performance. Instead, we find the subnet architecture selection bias during training to be a more important aspect. To show this, we propose a simple-yet-effective method called Random Subnet Sampling (RSS), which does not mitigate the interference effect. Despite no mitigation, RSS is able to produce a better performing subnet population than PS on four small-to-medium-sized datasets, suggesting that the interference effect does not play a pivotal role in these datasets. Due to its simplicity, RSS provides a $1.9\times$ reduction in training time compared to PS. A $6.1\times$ reduction can also be achieved with a reasonable drop in performance when the number of RSS training epochs is reduced. Code available at https://github.com/Jordan-HS/RSS-Interference-CVPRW2022.

* Accepted to CVPR Embedded Vision Workshop 2022
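
A minimal sketch of what drawing one random subnet per training step might look like is given below. The search space (depth, width multiplier, kernel size), its values, and the choice of uniform sampling are hypothetical, and the actual training step is left as a comment. The linked repository is the reference for the method itself.

```python
import random

# Hypothetical elastic dimensions of a supernet; the values are illustrative only.
SEARCH_SPACE = {
    "depth": [2, 3, 4],
    "width_mult": [0.5, 0.75, 1.0],
    "kernel_size": [3, 5, 7],
}

def sample_subnet(rng):
    """Draw one subnet configuration uniformly at random from the search space."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

def training_schedule(num_steps, seed=0):
    """Yield one randomly sampled subnet configuration per training step."""
    rng = random.Random(seed)
    for _ in range(num_steps):
        # A real loop would activate this subnet inside the supernet and run
        # one forward/backward pass on it here.
        yield sample_subnet(rng)

configs = list(training_schedule(5))
```

Seeding the generator keeps the sampled sequence reproducible, which helps when comparing runs.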

Unsupervised Domain Adaptive Object Detection using Forward-Backward Cyclic Adaptation

Feb 03, 2020

Siqi Yang, Lin Wu, Arnold Wiliem, Brian C. Lovell

Abstract: We present a novel approach to perform unsupervised domain adaptation for object detection through forward-backward cyclic (FBC) training. Recent adversarial-training-based domain adaptation methods have shown their effectiveness in minimizing domain discrepancy via marginal feature distribution alignment. However, aligning the marginal feature distributions does not guarantee the alignment of class conditional distributions. This limitation is more evident when adapting object detectors, as the domain discrepancy is larger compared to the image classification task, e.g., a varying number of objects exist in one image and the majority of content in an image is the background. This motivates us to learn domain invariance for category-level semantics via gradient alignment. Intuitively, if the gradients of two domains point in similar directions, then the learning of one domain can improve that of another domain. To achieve gradient alignment, we propose Forward-Backward Cyclic Adaptation, which iteratively computes adaptation from source to target via backward hopping and from target to source via forward passing. In addition, we align low-level features for adapting holistic color/texture via adversarial training. However, a detector that performs well on both domains is not ideal for the target domain. As such, in each cycle, domain diversity is enforced by maximum entropy regularization on the source domain to penalize confident source-specific learning and minimum entropy regularization on the target domain to encourage target-specific learning. Theoretical analysis of the training process is provided, and extensive experiments on challenging cross-domain object detection datasets have shown the superiority of our approach over the state-of-the-art.
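
The entropy terms mentioned in the abstract can be illustrated with a small sketch: entropy is rewarded on source predictions and penalised on target predictions. The weighting `lam` and the way the term is combined into a single value are assumptions, since the abstract does not give the exact objective.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_regulariser(source_probs, target_probs, lam=0.1):
    """Term to add to a loss that is being minimised.

    Source entropy enters with a minus sign (so it is maximised),
    target entropy with a plus sign (so it is minimised).
    """
    return lam * (entropy(target_probs) - entropy(source_probs))

confident = [0.98, 0.01, 0.01]
uniform = [1 / 3, 1 / 3, 1 / 3]
# The regulariser is lower when source predictions are uncertain
# and target predictions are confident.
good = entropy_regulariser(source_probs=uniform, target_probs=confident)
bad = entropy_regulariser(source_probs=confident, target_probs=uniform)
```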

To What Extent Does Downsampling, Compression, and Data Scarcity Impact Renal Image Analysis?

Sep 22, 2019

Can Peng, Kun Zhao, Arnold Wiliem, Teng Zhang, Peter Hobson, Anthony Jennings, Brian C. Lovell

Abstract: The condition of the Glomeruli, or filter sacks, in renal Direct Immunofluorescence (DIF) specimens is a critical indicator for diagnosing kidney diseases. A digital pathology system which digitizes a glass histology slide into a Whole Slide Image (WSI) and then automatically detects and zooms in on the glomeruli with a higher magnification objective will be extremely helpful for pathologists. In this paper, using glomerulus detection as the study case, we provide analysis and observations on several important issues to help with the development of Computer Aided Diagnostic (CAD) systems to process WSIs. Large image resolution, large file size, and data scarcity are always challenging to deal with. To this end, we first examine image downsampling rates in terms of their effect on detection accuracy. Second, we examine the impact of image compression. Third, we examine the relationship between the size of the training set and detection accuracy. To understand the above issues, experiments are performed on the state-of-the-art detectors: Faster R-CNN, R-FCN, Mask R-CNN and SSD. Critical findings are observed: (1) the best balance between detection accuracy, detection speed and file size is achieved at 8 times downsampling captured with a $40\times$ objective; (2) compression, which reduces the file size dramatically, does not necessarily have an adverse effect on overall accuracy; (3) reducing the amount of training data to some extent causes a drop in precision but has a negligible impact on the recall; (4) in most cases, Faster R-CNN achieves the best accuracy in the glomerulus detection task. We show that the image file size of $40\times$ WSI images can be reduced by a factor of over 6000 with negligible loss of glomerulus detection accuracy.

Deep inspection: an electrical distribution pole parts study via deep neural networks

Jul 16, 2019

Liangchen Liu, Teng Zhang, Kun Zhao, Arnold Wiliem, Kieren Astin-Walmsley, Brian Lovell

Abstract: Electrical distribution poles are important assets in electricity supply. These poles need to be maintained in good condition to ensure they protect community safety, maintain reliability of supply, and meet legislative obligations. However, maintaining such a large volume of assets is an expensive and challenging task. To address this, recent approaches utilise imagery data captured from helicopter and/or drone inspections. Whilst this reduces the cost of manual inspection, manual analysis of each image is still required. As such, several image-based automated inspection systems have been proposed. In this paper, we target two major challenges: tiny object detection and extremely imbalanced datasets, which currently hinder the wide deployment of automatic inspection. We propose a novel two-stage zoom-in detection method to gradually focus on the object of interest. To address the imbalanced dataset problem, we propose resampling and reweighting schemes to iteratively adapt the model to the large intra-class variation of the major class and balance the contributions to the loss from each class. Finally, we integrate these components together and devise a novel automatic inspection framework. Extensive experiments demonstrate that our proposed approaches are effective and can boost the performance compared to the baseline methods.

* electrical distribution pole inspection, integrated inspection system, object detection, imbalanced data classification. To appear in Proceedings of ICIP 2019
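
The general idea of balancing each class's contribution to the loss can be illustrated with inverse-frequency class weights. This is a common generic scheme swapped in for illustration and is not necessarily the reweighting used in the paper; the class names and counts are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count), so rare classes contribute more to the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def weighted_mean_loss(losses, labels, weights):
    """Average per-sample losses after scaling each one by its class weight."""
    return sum(weights[y] * l for l, y in zip(losses, labels)) / len(losses)

# Toy, hypothetical label set: nine 'intact' parts for every 'damaged' one.
labels = ["intact"] * 9 + ["damaged"]
weights = inverse_frequency_weights(labels)
balanced = weighted_mean_loss([1.0] * len(labels), labels, weights)
```

With these weights the single rare sample carries as much total weight as the nine common ones, so neither class dominates the averaged loss.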

Deep Instance-Level Hard Negative Mining Model for Histopathology Images

Jun 27, 2019

Meng Li, Lin Wu, Arnold Wiliem, Kun Zhao, Teng Zhang, Brian C. Lovell

Abstract: Histopathology image analysis can be considered as a multiple instance learning (MIL) problem, where the whole slide histopathology image (WSI) is regarded as a bag of instances (i.e., patches) and the task is to predict a single class label for the WSI. However, in many real-life applications such as computational pathology, discovering the key instances that trigger the bag label is of great interest because it provides reasons for the decision made by the system. In this paper, we propose a deep convolutional neural network (CNN) model that addresses the primary task of bag classification on a WSI and also learns to identify the response of each instance to provide interpretable results for the final prediction. We incorporate the attention mechanism into the proposed model to transform the instances and learn attention weights that allow us to find key patches. To perform balanced training, we introduce adaptive weighting in each training bag to explicitly adjust the weight distribution in order to concentrate more on the contribution of hard samples. Based on the learned attention weights, we further develop a solution to boost the classification performance by generating bags with hard negative instances. We conduct extensive experiments on colon and breast cancer histopathology data and show that our framework achieves state-of-the-art performance.

* Accepted by MICCAI 2019
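
Attention-based pooling over a bag of patches, as referred to in the abstract, can be sketched generically as below. The attention scores are given by hand here, whereas a real model would learn them; the feature values and function names are hypothetical and this is not the authors' architecture.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(instances, scores):
    """Combine patch features into one bag feature using attention weights."""
    weights = softmax(scores)
    dim = len(instances[0])
    bag = [sum(w * inst[d] for w, inst in zip(weights, instances)) for d in range(dim)]
    return bag, weights

# Three toy patch features with hypothetical attention scores.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
bag, weights = attention_pool(patches, scores=[0.1, 3.0, 0.2])
# The patch with the largest attention weight is the candidate key instance.
key_patch = max(range(len(weights)), key=weights.__getitem__)
```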
