Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anthony Miyaguchi

Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification

Jul 08, 2025

Murilo Gustineli, Anthony Miyaguchi, Adrian Cheung, Divyansh Khattak

Abstract:We describe DS@GT's second-place solution to the PlantCLEF 2025 challenge on multi-species plant identification in vegetation quadrat images. Our pipeline combines (i) a fine-tuned Vision Transformer ViTD2PC24All for patch-level inference, (ii) a 4x4 tiling strategy that aligns patch size with the network's 518x518 receptive field, and (iii) domain-prior adaptation through PaCMAP + K-Means visual clustering and geolocation filtering. Tile predictions are aggregated by majority vote and re-weighted with cluster-specific Bayesian priors, yielding a macro-averaged F1 of 0.348 (private leaderboard) while requiring no additional training. All code, configuration files, and reproducibility scripts are publicly available at https://github.com/dsgt-arc/plantclef-2025.

Via

Access Paper or Ask Questions

Annotation Techniques for Judo Combat Phase Classification from Tournament Footage

Dec 10, 2024

Anthony Miyaguchi, Jed Moutahir, Tanmay Sutar

Abstract:This paper presents a semi-supervised approach to extracting and analyzing combat phases in judo tournaments using live-streamed footage. The objective is to automate the annotation and summarization of live streamed judo matches. We train models that extract relevant entities and classify combat phases from fixed-perspective judo recordings. We employ semi-supervised methods to address limited labeled data in the domain. We build a model of combat phases via transfer learning from a fine-tuned object detector to classify the presence, activity, and standing state of the match. We evaluate our approach on a dataset of 19 thirty-second judo clips, achieving an F1 score on a $20\%$ test hold-out of 0.66, 0.78, and 0.87 for the three classes, respectively. Our results show initial promise for automating more complex information retrieval tasks using rigorous methods with limited labeled data.

Via

Access Paper or Ask Questions

DS@GT eRisk 2024: Sentence Transformers for Social Media Risk Assessment

Jul 10, 2024

David Guecha, Aaryan Potdar, Anthony Miyaguchi

Abstract:We present working notes for DS@GT team in the eRisk 2024 for Tasks 1 and 3. We propose a ranking system for Task 1 that predicts symptoms of depression based on the Beck Depression Inventory (BDI-II) questionnaire using binary classifiers trained on question relevancy as a proxy for ranking. We find that binary classifiers are not well calibrated for ranking, and perform poorly during evaluation. For Task 3, we use embeddings from BERT to predict the severity of eating disorder symptoms based on user post history. We find that classical machine learning models perform well on the task, and end up competitive with the baseline models. Representation of text data is crucial in both tasks, and we find that sentence transformers are a powerful tool for downstream modeling. Source code and models are available at \url{https://github.com/dsgt-kaggle-clef/erisk-2024}.

* Paper Submitted to CLEF 2024 CEUR-WS

Via

Access Paper or Ask Questions

Fine-Grained Classification for Poisonous Fungi Identification with Transfer Learning

Jul 10, 2024

Christopher Chiu, Maximilian Heil, Teresa Kim, Anthony Miyaguchi

Figure 1 for Fine-Grained Classification for Poisonous Fungi Identification with Transfer Learning

Figure 2 for Fine-Grained Classification for Poisonous Fungi Identification with Transfer Learning

Figure 3 for Fine-Grained Classification for Poisonous Fungi Identification with Transfer Learning

Figure 4 for Fine-Grained Classification for Poisonous Fungi Identification with Transfer Learning

Abstract:FungiCLEF 2024 addresses the fine-grained visual categorization (FGVC) of fungi species, with a focus on identifying poisonous species. This task is challenging due to the size and class imbalance of the dataset, subtle inter-class variations, and significant intra-class variability amongst samples. In this paper, we document our approach in tackling this challenge through the use of ensemble classifier heads on pre-computed image embeddings. Our team (DS@GT) demonstrate that state-of-the-art self-supervised vision models can be utilized as robust feature extractors for downstream application of computer vision tasks without the need for task-specific fine-tuning on the vision backbone. Our approach achieved the best Track 3 score (0.345), accuracy (78.4%) and macro-F1 (0.577) on the private test set in post competition evaluation. Our code is available at https://github.com/dsgt-kaggle-clef/fungiclef-2024.

* Submitted and accepted into CLEF 2024 CEUR-WS proceedings

Via

Access Paper or Ask Questions

Multi-Label Plant Species Classification with Self-Supervised Vision Transformers

Jul 08, 2024

Murilo Gustineli, Anthony Miyaguchi, Ian Stalter

Abstract:We present a transfer learning approach using a self-supervised Vision Transformer (DINOv2) for the PlantCLEF 2024 competition, focusing on the multi-label plant species classification. Our method leverages both base and fine-tuned DINOv2 models to extract generalized feature embeddings. We train classifiers to predict multiple plant species within a single image using these rich embeddings. To address the computational challenges of the large-scale dataset, we employ Spark for distributed data processing, ensuring efficient memory management and processing across a cluster of workers. Our data processing pipeline transforms images into grids of tiles, classifying each tile, and aggregating these predictions into a consolidated set of probabilities. Our results demonstrate the efficacy of combining transfer learning with advanced data processing techniques for multi-label image classification tasks. Our code is available at https://github.com/dsgt-kaggle-clef/plantclef-2024.

* Paper submitted to CLEF 2024 CEUR-WS

Via

Access Paper or Ask Questions

Tile Compression and Embeddings for Multi-Label Classification in GeoLifeCLEF 2024

Jul 08, 2024

Anthony Miyaguchi, Patcharapong Aphiwetsa, Mark McDuffie

Abstract:We explore methods to solve the multi-label classification task posed by the GeoLifeCLEF 2024 competition with the DS@GT team, which aims to predict the presence and absence of plant species at specific locations using spatial and temporal remote sensing data. Our approach uses frequency-domain coefficients via the Discrete Cosine Transform (DCT) to compress and pre-compute the raw input data for convolutional neural networks. We also investigate nearest neighborhood models via locality-sensitive hashing (LSH) for prediction and to aid in the self-supervised contrastive learning of embeddings through tile2vec. Our best competition model utilized geolocation features with a leaderboard score of 0.152 and a best post-competition score of 0.161. Source code and models are available at https://github.com/dsgt-kaggle-clef/geolifeclef-2024.

* Submitted to CLEF 2024 for publication in CEUR-WS

Via

Access Paper or Ask Questions

Transfer Learning with Self-Supervised Vision Transformers for Snake Identification

Jul 08, 2024

Anthony Miyaguchi, Murilo Gustineli, Austin Fischer, Ryan Lundqvist

Abstract:We present our approach for the SnakeCLEF 2024 competition to predict snake species from images. We explore and use Meta's DINOv2 vision transformer model for feature extraction to tackle species' high variability and visual similarity in a dataset of 182,261 images. We perform exploratory analysis on embeddings to understand their structure, and train a linear classifier on the embeddings to predict species. Despite achieving a score of 39.69, our results show promise for DINOv2 embeddings in snake identification. All code for this project is available at https://github.com/dsgt-kaggle-clef/snakeclef-2024.

* Paper submitted to CLEF 2024 CEUR-WS

Via

Access Paper or Ask Questions

Transfer Learning with Pseudo Multi-Label Birdcall Classification for DS@GT BirdCLEF 2024

Jul 08, 2024

Anthony Miyaguchi, Adrian Cheung, Murilo Gustineli, Ashley Kim

Abstract:We present working notes for the DS@GT team on transfer learning with pseudo multi-label birdcall classification for the BirdCLEF 2024 competition, focused on identifying Indian bird species in recorded soundscapes. Our approach utilizes production-grade models such as the Google Bird Vocalization Classifier, BirdNET, and EnCodec to address representation and labeling challenges in the competition. We explore the distributional shift between this year's edition of unlabeled soundscapes representative of the hidden test set and propose a pseudo multi-label classification strategy to leverage the unlabeled data. Our highest post-competition public leaderboard score is 0.63 using BirdNET embeddings with Bird Vocalization pseudo-labels. Our code is available at https://github.com/dsgt-kaggle-clef/birdclef-2024

* Submitted and accepted into CLEF 2024 CEUR-WS proceedings

Via

Access Paper or Ask Questions

Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification

Jun 29, 2023

Anthony Miyaguchi, Nathan Zhong, Murilo Gustineli, Chris Hayduk

Figure 1 for Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification

Figure 2 for Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification

Figure 3 for Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification

Figure 4 for Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification

Abstract:We present working notes on transfer learning with semi-supervised dataset annotation for the BirdCLEF 2023 competition, focused on identifying African bird species in recorded soundscapes. Our approach utilizes existing off-the-shelf models, BirdNET and MixIT, to address representation and labeling challenges in the competition. We explore the embedding space learned by BirdNET and propose a process to derive an annotated dataset for supervised learning. Our experiments involve various models and feature engineering approaches to maximize performance on the competition leaderboard. The results demonstrate the effectiveness of our approach in classifying bird species and highlight the potential of transfer learning and semi-supervised dataset annotation in similar tasks.

* BirdCLEF working note submission to Multimedia Retrieval in Nature (LifeCLEF) for CLEF 2023

Via

Access Paper or Ask Questions

Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022

Jun 08, 2022

Anthony Miyaguchi, Jiangyue Yu, Bryan Cheungvivatpant, Dakota Dudley, Aniketh Swain

Figure 1 for Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022

Figure 2 for Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022

Figure 3 for Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022

Figure 4 for Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022

Abstract:We build a classification model for the BirdCLEF 2022 challenge using unsupervised methods. We implement an unsupervised representation of the training dataset using a triplet loss on spectrogram representation of audio motifs. Our best model performs with a score of 0.48 on the public leaderboard.

* Submitted to CEUR-WS under LifeCLEF for the BirdCLEF 2022 challenge as a working note

Via

Access Paper or Ask Questions