Paul Lintilhac

TEL'M: Test and Evaluation of Language Models

Apr 16, 2024

George Cybenko, Joshua Ackerman, Paul Lintilhac

Abstract: Language Models have demonstrated remarkable capabilities on some tasks while failing dramatically on others. This situation has generated considerable interest in understanding and comparing the capabilities of various Language Models (LMs), but those efforts have been largely ad hoc, with results that are often little more than anecdotal. This is in stark contrast with the testing and evaluation processes used in healthcare, radar signal processing, and other defense areas. In this paper, we describe Test and Evaluation of Language Models (TEL'M) as a principled approach for assessing the value of current and future LMs, focused on high-value commercial, government, and national security applications. We believe that this methodology could be applied to other Artificial Intelligence (AI) technologies as part of the larger goal of "industrializing" AI.
