Songyang Zhang

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Oct 21, 2024

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

Oct 16, 2024

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Sep 24, 2024

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Aug 30, 2024

LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models

Jul 22, 2024

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Jul 16, 2024

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

Jul 15, 2024

GTA: A Benchmark for General Tool Agents

Jul 11, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Jul 03, 2024

InternLM-Law: An Open Source Chinese Legal Large Language Model

Jun 21, 2024