Picture for Hangyu Guo

Hangyu Guo

WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis

Add code
Dec 04, 2024
Figure 1 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Figure 2 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Figure 3 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Figure 4 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Viaarxiv icon

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos

Add code
Dec 02, 2024
Viaarxiv icon

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Add code
Nov 13, 2024
Figure 1 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 2 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 3 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 4 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Viaarxiv icon

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Add code
Oct 25, 2024
Figure 1 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Figure 2 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Figure 3 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Figure 4 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Viaarxiv icon

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Add code
Oct 15, 2024
Figure 1 for MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Figure 2 for MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Figure 3 for MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Figure 4 for MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Viaarxiv icon

ING-VP: MLLMs cannot Play Easy Vision-based Games Yet

Add code
Oct 09, 2024
Figure 1 for ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Figure 2 for ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Figure 3 for ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Figure 4 for ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Viaarxiv icon

GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

Add code
Jun 20, 2024
Viaarxiv icon

GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

Add code
Jun 17, 2024
Figure 1 for GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Figure 2 for GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Figure 3 for GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Figure 4 for GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Viaarxiv icon

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

Add code
Mar 14, 2024
Viaarxiv icon

What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

Add code
Nov 02, 2023
Viaarxiv icon