Haozhe Zhao

Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance

Nov 21, 2024
Via arXiv

Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement

Oct 21, 2024
Via arXiv

A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation

Oct 02, 2024
Via arXiv

Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints

Sep 22, 2024
Via arXiv

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

Jul 07, 2024
Via arXiv

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

Jun 29, 2024
Via arXiv

Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation

Apr 12, 2024
Via arXiv

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Mar 11, 2024
Via arXiv

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

Feb 21, 2024
Via arXiv

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks

Nov 16, 2023
Via arXiv