Fine Grained Visual Categorization


Cross-Hierarchical Bidirectional Consistency Learning for Fine-Grained Visual Classification

Apr 18, 2025
Via arXiv

Car-1000: A New Large Scale Fine-Grained Visual Categorization Dataset

Mar 16, 2025
Via arXiv

Augmenting Image Annotation: A Human-LMM Collaborative Framework for Efficient Object Selection and Label Generation

Mar 14, 2025
Via arXiv

A Comprehensive Survey on Generative AI for Video-to-Music Generation

Feb 18, 2025
Via arXiv

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs

Feb 06, 2025
Via arXiv

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Jan 08, 2025
Via arXiv

SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes

Dec 29, 2024
Via arXiv

L-WISE: Boosting Human Image Category Learning Through Model-Based Image Selection And Enhancement

Dec 12, 2024
Via arXiv

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Nov 28, 2024
Via arXiv

To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models

Oct 09, 2024
Via arXiv