Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains, and institutions, making the dataset a useful resource for a broad range of researchers. Additionally, the dataset is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. In our experiments, we train 6 state-of-the-art imitation learning and offline reinforcement learning methods on our dataset, and find that they succeed on a suite of tasks requiring varying amounts of generalization. We also demonstrate that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization. By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods. Project page at https://rail-berkeley.github.io/bridgedata
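To make the goal- and language-conditioned training setup concrete, the sketch below shows a minimal behavior-cloning loss over a batch of trajectory data. The field names (observations, goal_image, actions), the feature dimensions, and the dummy policy are hypothetical placeholders chosen for illustration; they are not the official BridgeData V2 data format or training code.

```python
# Illustrative sketch only: field names, shapes, and the loader are
# hypothetical, not the official BridgeData V2 API.
import numpy as np

def behavior_cloning_loss(policy, batch):
    """Mean-squared-error imitation loss for a goal-conditioned policy."""
    pred_actions = policy(batch["observations"], batch["goal_image"])
    return np.mean((pred_actions - batch["actions"]) ** 2)

def dummy_policy(obs, goal):
    # Placeholder policy: predicts a zero 7-DoF action for each example.
    return np.zeros((obs.shape[0], 7))

# Dummy batch showing the assumed structure of one training step.
batch = {
    "observations": np.random.rand(32, 128),  # encoded image/state features
    "goal_image": np.random.rand(32, 128),    # encoded goal image
    "actions": np.random.rand(32, 7),         # end-effector deltas + gripper
}
print(behavior_cloning_loss(dummy_policy, batch))
```

In practice a learned policy network would replace dummy_policy, and the goal-image conditioning can be swapped for an encoded natural language instruction.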
Abstract: Modeling media memorability has been a consistent challenge in the field of machine learning. The Predicting Media Memorability task in MediaEval 2020 is the latest benchmark among similar challenges addressing this topic. Building upon techniques developed in previous iterations of the challenge, we developed ensemble methods using extracted video, image, text, and audio features. Critically, in this work we introduce and demonstrate the efficacy and high generalizability of extracted audio embeddings as a feature for predicting media memorability.
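As a rough illustration of how per-modality features can be combined in an ensemble, the sketch below fits one regressor per modality and averages their predictions. The synthetic feature arrays, the Ridge regressors, and the feature dimensions are assumptions made for illustration only; they do not reproduce the features or models used in the submission.

```python
# Minimal sketch of a feature-level ensemble for memorability regression.
# All arrays below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_videos = 200
features = {
    "video": rng.normal(size=(n_videos, 64)),
    "text": rng.normal(size=(n_videos, 32)),
    "audio": rng.normal(size=(n_videos, 128)),  # e.g., pretrained audio embeddings
}
memorability = rng.uniform(0.4, 1.0, size=n_videos)  # target scores in [0, 1]

# Fit one regressor per modality, then average their predictions.
models = {m: Ridge(alpha=1.0).fit(X, memorability) for m, X in features.items()}
ensemble_pred = np.mean([models[m].predict(features[m]) for m in features], axis=0)
print(ensemble_pred[:5])
```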
Abstract: Cross-modal retrieval relies on accurate models to retrieve relevant results for queries across modalities such as image, text, and video. In this paper, we build upon previous work by tackling the difficulty of quickly evaluating models both quantitatively and qualitatively. We present DIME (Dataset, Index, Model, Embedding), a modality-agnostic tool that handles multimodal datasets, trained models, and data preprocessors to support straightforward model comparison through a web-browser graphical user interface. DIME inherently supports building modality-agnostic queryable indexes and extracting relevant feature embeddings, and thus effectively doubles as an efficient cross-modal tool to explore and search through datasets.
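The idea of a modality-agnostic queryable index can be illustrated with a toy nearest-neighbor search over a shared embedding space: any item, regardless of its original modality, becomes searchable once embedded. The EmbeddingIndex class, cosine-similarity scoring, and random embeddings below are illustrative assumptions, not DIME's actual implementation.

```python
# Toy modality-agnostic index: items embedded into a shared vector space are
# searchable with any query that can be embedded into the same space.
import numpy as np

class EmbeddingIndex:
    def __init__(self, embeddings):
        # L2-normalize so the dot product equals cosine similarity.
        self.embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    def query(self, query_embedding, k=5):
        q = query_embedding / np.linalg.norm(query_embedding)
        scores = self.embeddings @ q
        top = np.argsort(-scores)[:k]
        return list(zip(top.tolist(), scores[top].tolist()))

# Index 1,000 items (e.g., image embeddings) and query with an embedding
# from another modality projected into the same space.
items = np.random.rand(1000, 256)
index = EmbeddingIndex(items)
print(index.query(np.random.rand(256), k=3))
```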