
Fangyu Lei

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Nov 12, 2024

DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models

Oct 09, 2024

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Jul 15, 2024

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Apr 11, 2024

Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent

Mar 01, 2024

MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models

Feb 20, 2024

Competition-Level Problems are Effective LLM Evaluators

Dec 05, 2023

Assessing Knowledge Editing in Language Models via Relation Perspective

Nov 15, 2023

TableQAKit: A Comprehensive and Practical Toolkit for Table-based Question Answering

Oct 23, 2023

S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models

Oct 23, 2023