Garrett Bingham

HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning

Jul 22, 2024

Zhecan Wang, Garrett Bingham, Adams Yu, Quoc Le, Thang Luong, Golnaz Ghiasi

Abstract: Hallucination has been a major problem for large language models and remains a critical challenge when it comes to multimodality, in which vision-language models (VLMs) have to deal with not just textual but also visual inputs. Despite rapid progress in VLMs, resources for evaluating and addressing multimodal hallucination are limited and mostly focused on evaluation. This work introduces HaloQuest, a novel visual question answering dataset that captures various aspects of multimodal hallucination such as false premises, insufficient contexts, and visual challenges. A novel idea from HaloQuest is to leverage synthetic images, apart from real ones, to enable dataset creation at scale. With over 7.7K examples spanning a wide variety of categories, HaloQuest was designed to be both a challenging benchmark for VLMs and a fine-tuning dataset for advancing multimodal reasoning. Our experiments reveal that current models struggle with HaloQuest, with all open-source VLMs achieving below 36% accuracy. On the other hand, fine-tuning on HaloQuest significantly reduces hallucination rates while preserving performance on standard reasoning tasks. Our results show that benchmarking with generated images is highly correlated (r=0.97) with benchmarking on real images. Last but not least, we propose a novel Auto-Eval mechanism that is highly correlated with human raters (r=0.99) for evaluating VLMs. In sum, this work makes concrete strides towards understanding, evaluating, and mitigating hallucination in VLMs, serving as an important step towards more reliable multimodal AI systems in the future.

* Accepted as a main conference paper at ECCV 2024 (https://github.com/google/haloquest)

Optimizing Neural Networks through Activation Function Discovery and Automatic Weight Initialization

Apr 06, 2023

Garrett Bingham

Abstract: Automated machine learning (AutoML) methods improve upon existing models by optimizing various aspects of their design. While present methods focus on hyperparameters and neural network topologies, other aspects of neural network design can be optimized as well. To further the state of the art in AutoML, this dissertation introduces techniques for discovering more powerful activation functions and establishing more robust weight initialization for neural networks. These contributions improve performance, but also provide new perspectives on neural network optimization. First, the dissertation demonstrates that discovering solutions specialized to specific architectures and tasks gives better performance than reusing general approaches. Second, it shows that jointly optimizing different components of neural networks is synergistic, and results in better performance than optimizing individual components alone. Third, it demonstrates that learned representations are easier to optimize than hard-coded ones, creating further opportunities for AutoML. The dissertation thus makes concrete progress towards fully automatic machine learning in the future.

* PhD Dissertation

Efficient Activation Function Optimization through Surrogate Modeling

Jan 13, 2023

Garrett Bingham, Risto Miikkulainen

Abstract: Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions in CIFAR-100 and ImageNet tasks. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization. Code is available at https://github.com/cognizant-ai-labs/aquasurf, and the benchmark datasets are at https://github.com/cognizant-ai-labs/act-bench.

* 18 pages, 10 figures, 3 tables

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks

Sep 18, 2021

Garrett Bingham, Risto Miikkulainen

Abstract: Neural networks require careful weight initialization to prevent signals from exploding or vanishing. Existing initialization schemes solve this problem in specific cases by assuming that the network has a certain activation function or topology. It is difficult to derive such weight initialization strategies, and modern architectures therefore often use these same initialization schemes even though their assumptions do not hold. This paper introduces AutoInit, a weight initialization algorithm that automatically adapts to different neural network architectures. By analytically tracking the mean and variance of signals as they propagate through the network, AutoInit is able to appropriately scale the weights at each layer to avoid exploding or vanishing signals. Experiments demonstrate that AutoInit improves performance of various convolutional and residual networks across a range of activation function, dropout, weight decay, learning rate, and normalizer settings. Further, in neural architecture search and activation function meta-learning, AutoInit automatically calculates specialized weight initialization strategies for thousands of unique architectures and hundreds of unique activation functions, and improves performance in vision, language, tabular, multi-task, and transfer learning scenarios. AutoInit thus serves as an automatic configuration tool that makes design of new neural network architectures more robust. The AutoInit package provides a wrapper around existing TensorFlow models and is available at https://github.com/cognizant-ai-labs/autoinit.

* 15 pages, 9 figures, 1 table
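
The idea of tracking signal mean and variance layer by layer can be illustrated with a minimal sketch. It is not the AutoInit package's API: the function names below are hypothetical, and the sketch assumes a plain stack of dense layers with zero-mean, independent weights, each followed by ReLU, with Gaussian pre-activations.

```python
import math

def dense_weight_std(fan_in, mean_in, var_in):
    """Std of zero-mean weights that gives a dense layer unit output variance."""
    # For y = sum_i w_i * x_i with independent zero-mean weights of variance s^2:
    # Var[y] = fan_in * s^2 * E[x^2] = fan_in * s^2 * (var_in + mean_in^2)
    return 1.0 / math.sqrt(fan_in * (var_in + mean_in ** 2))

def relu_moments(var_in):
    """Mean and variance of ReLU(z) for z ~ N(0, var_in)."""
    mean_out = math.sqrt(var_in / (2.0 * math.pi))
    var_out = var_in * (0.5 - 1.0 / (2.0 * math.pi))
    return mean_out, var_out

def plan_initialization(fan_ins, mean_in=0.0, var_in=1.0):
    """Per-layer weight stds for a stack of dense + ReLU layers (assumed topology)."""
    stds = []
    mean, var = mean_in, var_in
    for fan_in in fan_ins:
        stds.append(dense_weight_std(fan_in, mean, var))
        # The scaled dense layer outputs mean 0 and variance 1; ReLU then shifts both.
        mean, var = relu_moments(1.0)
    return stds

stds = plan_initialization([256, 256, 256])
```

Under these assumptions, every layer after the first receives inputs with E[x^2] = 1/2, so the computed standard deviation is sqrt(2 / fan_in), which coincides with the familiar He initialization for ReLU networks.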

Discovering Parametric Activation Functions

Jun 05, 2020

Garrett Bingham, Risto Miikkulainen

Abstract: Recent studies have shown that the choice of activation function can significantly affect the performance of deep learning networks. However, the benefits of novel activation functions have been inconsistent and task-dependent, and therefore the rectified linear unit (ReLU) is still the most commonly used. This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance. Evolutionary search is used to discover the general form of the function, and gradient descent to optimize its parameters for different parts of the network and over the learning process. Experiments with three different neural network architectures on the CIFAR-100 image classification dataset show that this approach is effective. It discovers different activation functions for different architectures, and consistently improves accuracy over ReLU and other recently proposed activation functions by significant margins. The approach can therefore be used as an automated optimization step in applying deep learning to new tasks.

* 11 pages, 6 figures/tables, under review
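
The gradient-descent half of this approach can be sketched on a toy problem. Everything here is an assumption made for illustration: the functional form alpha * x * sigmoid(beta * x) is a hypothetical stand-in for a form that evolutionary search might produce, and the parameters are fitted to a fixed target curve rather than trained jointly with network weights as the abstract describes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def activation(x, alpha, beta):
    """Hypothetical parametric form: alpha * x * sigmoid(beta * x)."""
    return alpha * x * sigmoid(beta * x)

def loss_and_grads(xs, ys, alpha, beta):
    """Mean squared error and its gradients with respect to alpha and beta."""
    n = len(xs)
    loss = g_alpha = g_beta = 0.0
    for x, y in zip(xs, ys):
        s = sigmoid(beta * x)
        err = alpha * x * s - y
        loss += err * err / n
        g_alpha += 2.0 * err * x * s / n                        # d(err^2)/d(alpha)
        g_beta += 2.0 * err * alpha * x * x * s * (1.0 - s) / n  # d(err^2)/d(beta)
    return loss, g_alpha, g_beta

def fit(xs, ys, alpha=1.0, beta=1.0, lr=0.1, steps=200):
    """Plain gradient descent on the two activation parameters."""
    for _ in range(steps):
        _, g_alpha, g_beta = loss_and_grads(xs, ys, alpha, beta)
        alpha -= lr * g_alpha
        beta -= lr * g_beta
    return alpha, beta

# Toy target generated with alpha=1, beta=2 (illustrative values only).
xs = [-3.0 + 0.25 * i for i in range(25)]
ys = [activation(x, 1.0, 2.0) for x in xs]
initial_loss, _, _ = loss_and_grads(xs, ys, 1.0, 1.0)
alpha, beta = fit(xs, ys)
final_loss, _, _ = loss_and_grads(xs, ys, alpha, beta)
```

In a real network the same gradients would come from backpropagation, and each layer could hold its own copy of the parameters, which is one way to read "different parts of the network" in the abstract.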

Evolutionary Optimization of Deep Learning Activation Functions

Feb 17, 2020

Garrett Bingham, William Macke, Risto Miikkulainen

Abstract: The choice of activation function can have a large effect on the performance of a neural network. While there have been some attempts to hand-engineer novel activation functions, the Rectified Linear Unit (ReLU) remains the most commonly used in practice. This paper shows that evolutionary algorithms can discover novel activation functions that outperform ReLU. A tree-based search space of candidate activation functions is defined and explored with mutation, crossover, and exhaustive search. Experiments on training wide residual networks on the CIFAR-10 and CIFAR-100 image datasets show that this approach is effective. Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy. Optimal performance is achieved when evolution is allowed to customize activation functions to a particular task; however, these novel activation functions are shown to generalize, achieving high performance across tasks. Evolutionary optimization of activation functions is therefore a promising new dimension of metalearning in neural networks.

* 8 pages; 9 figures/tables; submitted to GECCO 2020
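
A tree-based search space with mutation and crossover might look roughly like the sketch below. The operator sets, tree encoding, and probabilities are assumptions chosen for brevity, not the paper's actual search space, and the fitness step (training a network with each candidate) is omitted.

```python
import math
import random

# Hypothetical operator sets; a tree is either the leaf "x" or a tuple (op, child, ...).
UNARY = {"relu": lambda v: max(v, 0.0), "tanh": math.tanh,
         "neg": lambda v: -v, "abs": abs}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b, "max": max}

def random_tree(depth, rng):
    """Sample a random expression tree of the given depth."""
    if depth == 0:
        return "x"
    if rng.random() < 0.5:
        return (rng.choice(sorted(UNARY)), random_tree(depth - 1, rng))
    return (rng.choice(sorted(BINARY)),
            random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    """Compute the activation value that the tree defines at input x."""
    if tree == "x":
        return x
    op, *children = tree
    args = [evaluate(child, x) for child in children]
    return UNARY[op](*args) if len(args) == 1 else BINARY[op](*args)

def mutate(tree, rng):
    """Swap one operator for another of the same arity (may be a no-op at a leaf)."""
    if tree == "x":
        return tree
    op, *children = tree
    if rng.random() < 0.3:
        pool = UNARY if len(children) == 1 else BINARY
        return (rng.choice(sorted(pool)), *children)
    i = rng.randrange(len(children))
    children[i] = mutate(children[i], rng)
    return (op, *children)

def subtrees(tree):
    """Yield the tree and every subtree below it."""
    yield tree
    if tree != "x":
        for child in tree[1:]:
            yield from subtrees(child)

def crossover(a, b, rng):
    """Replace a randomly chosen subtree of a with a random subtree of b."""
    if a == "x" or rng.random() < 0.3:
        return rng.choice(list(subtrees(b)))
    op, *children = a
    i = rng.randrange(len(children))
    children[i] = crossover(children[i], b, rng)
    return (op, *children)

rng = random.Random(0)
parent_a, parent_b = random_tree(3, rng), random_tree(3, rng)
child = mutate(crossover(parent_a, parent_b, rng), rng)
```

In this encoding ReLU itself is the tree `("relu", "x")`, so the search starts from a space that contains the baseline it is trying to beat.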

Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations

Jun 08, 2019

Rui Zhang, Caitlin Westerfield, Sungrok Shim, Garrett Bingham, Alexander Fabbri, Neha Verma, William Hu, Dragomir Radev

Abstract: In this paper, we propose to boost low-resource cross-lingual document retrieval performance with deep bilingual query-document representations. We match queries and documents in both source and target languages with four components, each of which is implemented as a term interaction-based deep neural network with cross-lingual word embeddings as input. By including query likelihood scores as extra features, our model effectively learns to rerank the retrieved documents by using a small number of relevance labels for low-resource language pairs. Due to the shared cross-lingual word embedding space, the model can also be directly applied to another language pair without any training label. Experimental results on the MATERIAL dataset show that our model outperforms the competitive translation-based baselines on English-Swahili, English-Tagalog, and English-Somali cross-lingual information retrieval tasks.

* ACL 2019, short paper

Preliminary Studies on a Large Face Database

Nov 15, 2018

Benjamin Yip, Garrett Bingham, Katherine Kempfert, Jonathan Fabish, Troy Kling, Cuixian Chen, Yishi Wang

Abstract: We perform preliminary studies on a large longitudinal face database, MORPH-II, which is a benchmark dataset in the field of computer vision and pattern recognition. First, we summarize the inconsistencies in the dataset and introduce the steps and strategy taken for cleaning. The potential implications of these inconsistencies on prior research are introduced. Next, we propose a new automatic subsetting scheme for the evaluation protocol. It is intended to overcome the unbalanced racial and gender distributions of MORPH-II, while ensuring independence between training and testing sets. Finally, we contribute a novel global framework for age estimation that utilizes posterior probabilities from the race classification step to compute a race-composite age estimate. Preliminary experimental results on MORPH-II are presented.

* Accepted at the 5th National Symposium for NSF REU Research in Data Science, Systems, and Security. G. Bingham and K. Kempfert contributed equally

Random Subspace Two-dimensional LDA for Face Recognition

Nov 02, 2017

Garrett Bingham

Abstract: In this paper, a novel technique named random subspace two-dimensional LDA (RS-2DLDA) is developed for face recognition. This approach offers a number of improvements over the random subspace two-dimensional PCA (RS-2DPCA) framework introduced by Nguyen et al. [5]. Firstly, the eigenvectors from 2DLDA have more discriminative power than those from 2DPCA, resulting in higher accuracy for the RS-2DLDA method over RS-2DPCA. Various distance metrics are evaluated, and a weighting scheme is developed to further boost accuracy. A series of experiments on the MORPH-II and ORL datasets are conducted to demonstrate the effectiveness of this approach.
