Jack Culpepper

Training and challenging models for text-guided fashion image retrieval

Apr 23, 2022

Eric Dodds, Jack Culpepper, Gaurav Srivastava

Abstract: Retrieving relevant images from a catalog based on a query image together with a modifying caption is a challenging multimodal task that can particularly benefit domains like apparel shopping, where fine details and subtle variations may be best expressed through natural language. We introduce a new evaluation dataset, Challenging Fashion Queries (CFQ), as well as a modeling approach that achieves state-of-the-art performance on the existing Fashion IQ (FIQ) dataset. CFQ complements existing benchmarks by including relative captions with positive and negative labels of caption accuracy and conditional image similarity, where others provided only positive labels with a combined meaning. We demonstrate the importance of multimodal pretraining for the task and show that domain-specific weak supervision based on attribute labels can augment generic large-scale pretraining. While previous modality fusion mechanisms lose the benefits of multimodal pretraining, we introduce a residual attention fusion mechanism that improves performance. We release CFQ and our code to the research community.

Modality-Agnostic Attention Fusion for visual search with text feedback

Jun 30, 2020

Eric Dodds, Jack Culpepper, Simao Herdade, Yang Zhang, Kofi Boakye

Abstract: Image retrieval with natural language feedback offers the promise of catalog search based on fine-grained visual features that go beyond objects and binary attributes, facilitating real-world applications such as e-commerce. Our Modality-Agnostic Attention Fusion (MAAF) model combines image and text features and outperforms existing approaches on two datasets for visual search with modifying phrases, Fashion IQ and CSS, and performs competitively on Fashion200k, a dataset with only single-word modifications. We also introduce two new challenging benchmarks adapted from Birds-to-Words and Spot-the-Diff, which provide new settings with rich language inputs, and we show that our approach without modification outperforms strong baselines. To better understand our model, we conduct detailed ablations on Fashion IQ and provide visualizations of the surprising phenomenon of words avoiding "attending" to the image region they refer to.

* 14 pages, 8 figures

Learning Embeddings for Product Visual Search with Triplet Loss and Online Sampling

Oct 10, 2018

Eric Dodds, Huy Nguyen, Simao Herdade, Jack Culpepper, Andrew Kae, Pierre Garrigues

Abstract: In this paper, we propose learning an embedding function for content-based image retrieval within the e-commerce domain using the triplet loss and an online sampling method that constructs triplets from within a minibatch. We compare our method to several strong baselines as well as recent works on the DeepFashion and Stanford Online Products datasets. Our approach significantly outperforms the state of the art on the DeepFashion dataset. With a modification to favor sampling minibatches from a single product category, the same approach demonstrates competitive results when compared to the state of the art for the Stanford Online Products dataset.
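
To make the idea of building triplets inside a minibatch concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the sampling rule, so this example assumes a common "batch-hard" variant (farthest positive, closest negative per anchor). The function names, the margin value, and the toy data are all hypothetical.

```python
import math


def pairwise_distances(embeddings):
    """Euclidean distance between every pair of embedding vectors."""
    return [[math.dist(a, b) for b in embeddings] for a in embeddings]


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean triplet loss with triplets mined inside one minibatch.

    Assumed mining rule (batch-hard, not taken from the paper):
    for each anchor, use the farthest same-label item as the positive
    and the closest different-label item as the negative.
    Anchors that lack a positive or a negative are skipped.
    """
    dist = pairwise_distances(embeddings)
    losses = []
    for i, label in enumerate(labels):
        pos = [dist[i][j] for j, other in enumerate(labels)
               if other == label and j != i]
        neg = [dist[i][j] for j, other in enumerate(labels)
               if other != label]
        if not pos or not neg:
            continue
        # Hinge: the positive should be closer than the negative by `margin`.
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy minibatch: two products ("a" and "b"), two images each.
embeddings = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = ["a", "a", "b", "b"]

loss_small_margin = batch_hard_triplet_loss(embeddings, labels, margin=0.2)
loss_large_margin = batch_hard_triplet_loss(embeddings, labels, margin=2.5)
```

In this toy batch the two products are already well separated, so the loss is zero at a small margin and becomes positive only when the margin exceeds the gap between the hardest positive and hardest negative distances. Mining within the batch avoids a separate pass over the dataset to find informative triplets.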

Deep Architectures for Face Attributes

Sep 28, 2016

Tobi Baumgartner, Jack Culpepper

Abstract: We train a deep convolutional neural network to perform identity classification using a new dataset of public figures annotated with age, gender, ethnicity and emotion labels, and then fine-tune it for attribute classification. An optimal sharing pattern of computational resources within this network is determined by experiment, requiring only 1 GFLOPs to produce all predictions. Rather than fine-tune by relearning weights in one additional layer after the penultimate layer of the identity network, we try several different depths for each attribute. We find that prediction of age and emotion is improved by fine-tuning from earlier layers onward, presumably because deeper layers are progressively invariant to non-identity-related changes in the input.

* 11 pages, 2 figures, accepted in "Workshop on Facial Informatics in conjunction with ACCV '16"
