Title: LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking

Aug 09, 2023

Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali (+3 more)

Abstract: The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, customizing them for specific tasks and datasets is often complex for users. In this study, we introduce the LLMeBench framework. Initially developed to evaluate Arabic NLP tasks using OpenAI's GPT and BLOOM models, it can be seamlessly customized for any NLP task and model, regardless of language. The framework also features zero- and few-shot learning settings. A new custom dataset can be added in less than 10 minutes, and users can use their own model API keys to evaluate the task at hand. The framework has already been tested on 31 unique NLP tasks using 53 publicly available datasets within 90 experimental setups, involving approximately 296K data points. We plan to open-source the framework for the community (https://github.com/qcri/LLMeBench/). A video demonstrating the framework is available online (https://youtu.be/FkQn4UjYA0s).

* Foundation Models, Large Language Models, NLP, ChatGPT Evaluation, LLMs Benchmark
