Zixian Ma

TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

Dec 10, 2024

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

Dec 09, 2024

NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples

Oct 18, 2024

Task Me Anything

Jun 17, 2024

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

Mar 21, 2024

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

Jun 26, 2023

Model Sketching: Centering Concepts in Early-Stage Machine Learning Model Design

Mar 06, 2023

CREPE: Can Vision-Language Foundation Models Reason Compositionally?

Dec 13, 2022

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

Oct 09, 2022

MobilePhys: Personalized Mobile Camera-Based Contactless Physiological Sensing

Jan 11, 2022