Abstract:Data augmentation has long been a cornerstone for reducing overfitting in vision models, with methods like AutoAugment automating the design of task-specific augmentations. Recent advances in generative models, such as conditional diffusion and few-shot NeRFs, offer a new paradigm for data augmentation by synthesizing data with significantly greater diversity and realism. However, unlike traditional augmentations like cropping or rotation, these methods introduce substantial changes that enhance robustness but also risk degrading performance if the augmentations are poorly matched to the task. In this work, we present EvoAug, an automated augmentation learning pipeline, which leverages these generative models alongside an efficient evolutionary algorithm to learn optimal task-specific augmentations. Our pipeline introduces a novel approach to image augmentation that learns stochastic augmentation trees that hierarchically compose augmentations, enabling more structured and adaptive transformations. We demonstrate strong performance across fine-grained classification and few-shot learning tasks. Notably, our pipeline discovers augmentations that align with domain knowledge, even in low-data settings. These results highlight the potential of learned generative augmentations, unlocking new possibilities for robust model training.




Abstract:Preventing Veteran suicide is a national priority. The US Department of Veterans Affairs (VA) collects, analyzes, and publishes data to inform suicide prevention strategies. Current approaches for detecting suicidal ideation mostly rely on patient self report which are inadequate and time consuming. In this research study, our goal was to automate suicidal ideation detection from acoustic and linguistic features of an individual's speech using machine learning (ML) algorithms. Using voice data collected from Veterans enrolled in a large interventional study on Gulf War Illness at the Washington DC VA Medical Center, we conducted an evaluation of the performance of different ML approaches in achieving our objective. By fitting both classical ML and deep learning models to the dataset, we identified the algorithms that were most effective for each feature set. Among classical machine learning algorithms, the Support Vector Machine (SVM) trained on acoustic features performed best in classifying suicidal Veterans. Among deep learning methods, the Convolutional Neural Network (CNN) trained on the linguistic features performed best. Our study shows that speech analysis in a machine learning pipeline is a promising approach for detecting suicidality among Veterans.