Robert Wu

SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

Jun 06, 2024

Rishit Dagli, Shivesh Prakash, Robert Wu, Houman Khosravani

Abstract: Generating combined visual and auditory sensory experiences is critical for the consumption of immersive content. Recent advances in neural generative models have enabled the creation of high-resolution content across multiple modalities such as images, text, speech, and videos. Despite these successes, there remains a significant gap in the generation of high-quality spatial audio that complements generated visual content. Furthermore, current audio generation models excel at generating natural audio, speech, or music but fall short in integrating the spatial audio cues necessary for immersive experiences. In this work, we introduce SEE-2-SOUND, a zero-shot approach that decomposes the task into (1) identifying visual regions of interest; (2) locating these elements in 3D space; (3) generating mono-audio for each; and (4) integrating them into spatial audio. Using our framework, we demonstrate compelling results for generating spatial audio for high-quality videos, images, and dynamic images from the internet, as well as media generated by learned approaches.

* Project Page: https://see2sound.github.io/
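
Step (4) of the abstract above, integrating per-source mono audio into spatial audio, can be illustrated with a deliberately simple stand-in. The sketch below is not the SEE-2-SOUND method: it assumes a listener at the origin, hypothetical source positions `(x, y, z)` with `x` pointing right and `z` pointing forward, and swaps in constant-power stereo panning with inverse-distance attenuation for whatever spatialization the paper actually uses. The names `pan_gains` and `mix_to_stereo` are made up for this example.

```python
import math


def pan_gains(azimuth_rad):
    """Constant-power (left, right) gains for an azimuth in [-pi/2, pi/2].

    -pi/2 is hard left, 0 is centre, +pi/2 is hard right.
    """
    # Map the azimuth onto a pan angle in [0, pi/2].
    theta = (azimuth_rad + math.pi / 2) / 2
    return math.cos(theta), math.sin(theta)


def mix_to_stereo(sources):
    """Mix mono sources into a stereo pair.

    sources: list of (samples, (x, y, z)) tuples, listener at the origin.
    This is an illustrative assumption, not the paper's renderer.
    """
    n = max((len(samples) for samples, _ in sources), default=0)
    left = [0.0] * n
    right = [0.0] * n
    for samples, (x, y, z) in sources:
        distance = math.sqrt(x * x + y * y + z * z)
        # Inverse-distance attenuation, clamped so nearby sources are not boosted.
        attenuation = 1.0 / max(distance, 1.0)
        # Azimuth is positive to the right; sources behind are clamped to the sides.
        azimuth = max(-math.pi / 2, min(math.pi / 2, math.atan2(x, z)))
        gain_left, gain_right = pan_gains(azimuth)
        for i, value in enumerate(samples):
            left[i] += attenuation * gain_left * value
            right[i] += attenuation * gain_right * value
    return left, right
```

A source placed directly to the right ends up almost entirely in the right channel, while a source straight ahead is split equally between both channels. Stereo panning discards elevation, so a real spatial-audio renderer would need a richer output format than this sketch provides.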

Linguistic Collapse: Neural Collapse in (Large) Language Models

May 28, 2024

Robert Wu, Vardan Papyan

Abstract: Neural collapse ($\mathcal{NC}$) is a phenomenon observed in classification tasks where top-layer representations collapse into their class means, which become equinorm, equiangular and aligned with the classifiers. These behaviors -- associated with generalization and robustness -- would manifest under specific conditions: models are trained towards zero loss, with noise-free labels belonging to balanced classes, which do not outnumber the model's hidden dimension. Recent studies have explored $\mathcal{NC}$ in the absence of one or more of these conditions to extend and capitalize on the associated benefits of ideal geometries. Language modeling presents a curious frontier, as training by token prediction constitutes a classification task where none of the conditions exist: the vocabulary is imbalanced and exceeds the embedding dimension; different tokens might correspond to similar contextual embeddings; and large language models (LLMs) in particular are typically only trained for a few epochs. This paper empirically investigates the impact of scaling the architectures and training of causal language models (CLMs) on their progression towards $\mathcal{NC}$. We find that $\mathcal{NC}$ properties that develop with scaling are linked to generalization. Moreover, there is evidence of some relationship between $\mathcal{NC}$ and generalization independent of scale. Our work therefore underscores the generality of $\mathcal{NC}$ as it extends to the novel and more challenging setting of language modeling. Downstream, we seek to inspire further research on the phenomenon to deepen our understanding of LLMs -- and neural networks at large -- and improve existing architectures based on $\mathcal{NC}$-related properties.

* 29 pages, 27 figures
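
The geometric properties named in the abstract above (collapse to class means, equinorm means, equiangular means) can be checked numerically on a small set of labelled feature vectors. The following is a minimal sketch under stated assumptions: the function name `nc_summary` and the three summary statistics are illustrative choices, not the measures used in the paper, and alignment with the classifier weights is not covered. For reference, when K class means form a simplex equiangular tight frame, every pairwise cosine between the centred means equals -1/(K-1).

```python
import math
from collections import defaultdict


def nc_summary(features, labels):
    """Return simple neural-collapse diagnostics for labelled feature vectors."""
    groups = defaultdict(list)
    for vec, label in zip(features, labels):
        groups[label].append(vec)

    # Class means and the global mean of all feature vectors.
    means = {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in groups.items()}
    global_mean = [sum(col) / len(features) for col in zip(*features)]

    # Within-class variability: average squared distance to the own class mean.
    within = sum(
        sum((a - b) ** 2 for a, b in zip(vec, means[label]))
        for vec, label in zip(features, labels)
    ) / len(features)

    # Centre the class means, then measure their norms and pairwise cosines.
    centred = {c: [m - g for m, g in zip(mu, global_mean)] for c, mu in means.items()}
    norms = {c: math.sqrt(sum(v * v for v in vec)) for c, vec in centred.items()}
    mean_norm = sum(norms.values()) / len(norms)
    # Coefficient of variation of the norms: 0 means perfectly equinorm.
    norm_cv = math.sqrt(
        sum((n - mean_norm) ** 2 for n in norms.values()) / len(norms)
    ) / mean_norm

    classes = list(centred)
    cosines = [
        sum(a * b for a, b in zip(centred[p], centred[q])) / (norms[p] * norms[q])
        for i, p in enumerate(classes)
        for q in classes[i + 1:]
    ]
    return {"within_class_var": within, "norm_cv": norm_cv, "pairwise_cosines": cosines}
```

Feeding in three classes whose features sit exactly on the vertices of an equilateral triangle gives zero within-class variability, zero norm variation, and pairwise cosines of -1/2, the fully collapsed configuration for K = 3.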

Towards One Shot Search Space Poisoning in Neural Architecture Search

Nov 13, 2021

Nayan Saxena, Robert Wu, Rohan Jain

Abstract: We evaluate the robustness of a Neural Architecture Search (NAS) algorithm known as Efficient NAS (ENAS) against data-agnostic poisoning attacks on the original search space with carefully designed ineffective operations. We empirically demonstrate how our one shot search space poisoning approach exploits design flaws in the ENAS controller to degrade predictive performance on classification tasks. With just two poisoning operations injected into the search space, we inflate prediction error rates for child networks up to 90% on the CIFAR-10 dataset.

* (Student Abstract) In Proceedings of the 36th AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2022. arXiv admin note: substantial text overlap with arXiv:2106.14406
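
To give a feel for why injecting only two ineffective operations can matter, the sketch below works through a simplified calculation. It is an illustration under loud assumptions, not the attack from the paper: the operation names are hypothetical placeholders, and the learned ENAS controller is replaced by a uniform random sampler that picks one operation per layer independently. Under that assumption, the chance that a sampled child network contains at least one poisoned operation is 1 - (c / (c + p))^L for c clean operations, p poisoned operations, and L layers.

```python
# Hypothetical operation names, used only for illustration.
CLEAN_OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "sep_conv5x5", "max_pool", "avg_pool"]
POISON_OPS = ["ineffective_op_a", "ineffective_op_b"]


def poison_search_space(clean_ops, poison_ops):
    """Return the search space with the ineffective operations appended."""
    return list(clean_ops) + list(poison_ops)


def prob_any_poisoned(n_clean, n_poison, n_layers):
    """Probability that a uniformly sampled child network has a poisoned layer.

    Assumes each layer's operation is drawn independently and uniformly,
    which a trained controller does not do.
    """
    p_clean_layer = n_clean / (n_clean + n_poison)
    return 1.0 - p_clean_layer ** n_layers


space = poison_search_space(CLEAN_OPS, POISON_OPS)
risk = prob_any_poisoned(len(CLEAN_OPS), len(POISON_OPS), n_layers=12)
```

With six clean operations, two poisoned ones, and twelve layers, the large majority of uniformly sampled child networks would contain at least one poisoned layer. How a learned controller actually responds to such operations is the empirical question the paper addresses.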

NeuralArTS: Structuring Neural Architecture Search with Type Theory

Nov 05, 2021

Robert Wu, Nayan Saxena, Rohan Jain

Abstract: Neural Architecture Search (NAS) algorithms automate the task of finding optimal deep learning architectures given an initial search space of possible operations. Developing these search spaces is usually a manual process, and searching a pre-optimized search space is more efficient than searching from scratch. In this paper, we present a new framework called Neural Architecture Type System (NeuralArTS) that categorizes the infinite set of network operations in a structured type system. We further demonstrate how NeuralArTS can be applied to convolutional layers and propose several future directions.

* (Student Abstract) In Proceedings of the 36th AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2022
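
The abstract above does not define what a "type" is, so the sketch below takes one hypothetical reading: two convolutional operations share a type if they transform the spatial size of their input in the same way. The class `ConvSpec` and the helper `same_shape_type` are invented for this example and are not the NeuralArTS formalism. The output-size formula itself is the standard one for a convolution along one spatial dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvSpec:
    """Hyperparameters of a convolution along one spatial dimension."""
    kernel: int
    stride: int = 1
    padding: int = 0
    dilation: int = 1

    def out_size(self, n):
        """Standard convolution output size for an input of size n."""
        effective_kernel = self.dilation * (self.kernel - 1) + 1
        return (n + 2 * self.padding - effective_kernel) // self.stride + 1


def same_shape_type(a, b, sizes):
    """Hypothetical type check: do a and b agree on every input size given?"""
    return all(a.out_size(n) == b.out_size(n) for n in sizes)


conv3_same = ConvSpec(kernel=3, padding=1)
conv5_same = ConvSpec(kernel=5, padding=2)
conv3_down = ConvSpec(kernel=3, stride=2, padding=1)
```

Under this reading, `conv3_same` and `conv5_same` fall into the same class because both preserve the input size, while `conv3_down` halves it and therefore belongs to a different class. Grouping operations this way is one means of knowing which ones can be swapped in a search space without breaking downstream shapes.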

Poisoning the Search Space in Neural Architecture Search

Jun 28, 2021

Robert Wu, Nayan Saxena, Rohan Jain

Abstract: Deep learning has proven to be a highly effective problem-solving tool for object detection and image segmentation across various domains such as healthcare and autonomous driving. At the heart of this performance lies neural architecture design, which relies heavily on domain knowledge and prior experience on the researchers' part. More recently, this process of finding optimal architectures, given an initial search space of possible operations, was automated by Neural Architecture Search (NAS). In this paper, we evaluate the robustness of one such algorithm known as Efficient NAS (ENAS) against data-agnostic poisoning attacks on the original search space with carefully designed ineffective operations. By evaluating algorithm performance on the CIFAR-10 dataset, we empirically demonstrate how our novel search space poisoning (SSP) approach and multiple-instance poisoning attacks exploit design flaws in the ENAS controller to result in inflated prediction error rates for child networks. Our results provide insights into the challenges to surmount in using NAS for more adversarially robust architecture search.

* All authors contributed equally. Appears in AdvML Workshop @ ICML 2021: A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning
