Hong-You Chen

Contrastive Localized Language-Image Pre-Training

Oct 03, 2024

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Oct 03, 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Sep 30, 2024

Fine-Tuning is Fine, if Calibrated

Sep 24, 2024

Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition

Sep 24, 2024

FedNE: Surrogate-Assisted Federated Neighbor Embedding for Dimensionality Reduction

Sep 17, 2024

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Jul 22, 2024

Jigsaw Game: Federated Clustering

Jul 17, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Apr 11, 2024

Bringing Back the Context: Camera Trap Species Identification as Link Prediction on Multimodal Knowledge Graphs

Jan 08, 2024