Picture for Haoxuan You

Haoxuan You

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models

Add code
Nov 22, 2024
Viaarxiv icon

MM-Ego: Towards Building Egocentric Multimodal LLMs

Add code
Oct 09, 2024
Figure 1 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Figure 2 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Figure 3 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Figure 4 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Viaarxiv icon

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Add code
Sep 30, 2024
Viaarxiv icon

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

Add code
May 23, 2024
Viaarxiv icon

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Add code
Apr 11, 2024
Figure 1 for Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Figure 2 for Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Figure 3 for Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Figure 4 for Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Viaarxiv icon

LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices

Add code
Mar 16, 2024
Figure 1 for LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices
Figure 2 for LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices
Figure 3 for LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices
Figure 4 for LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices
Viaarxiv icon

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

Add code
Oct 31, 2023
Viaarxiv icon

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Add code
Oct 11, 2023
Viaarxiv icon

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding

Add code
Jul 03, 2023
Viaarxiv icon

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

Add code
May 24, 2023
Viaarxiv icon