Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Aman Malhotra

Evaluating Large Language Models with fmeval

Jul 15, 2024

Pola Schwöbel, Luca Franceschi, Muhammad Bilal Zafar, Keerthan Vasist, Aman Malhotra, Tomer Shenhar, Pinal Tailor, Pinar Yilmaz, Michael Diamond, Michele Donini

Figure 1 for Evaluating Large Language Models with fmeval

Figure 2 for Evaluating Large Language Models with fmeval

Figure 3 for Evaluating Large Language Models with fmeval

Figure 4 for Evaluating Large Language Models with fmeval

Abstract:fmeval is an open source library to evaluate large language models (LLMs) in a range of tasks. It helps practitioners evaluate their model for task performance and along multiple responsible AI dimensions. This paper presents the library and exposes its underlying design principles: simplicity, coverage, extensibility and performance. We then present how these were implemented in the scientific and engineering choices taken when developing fmeval. A case study demonstrates a typical use case for the library: picking a suitable model for a question answering task. We close by discussing limitations and further work in the development of the library. fmeval can be found at https://github.com/aws/fmeval.

Via

Access Paper or Ask Questions

Summary Paper: Use Case on Building Collaborative Safe Autonomous Systems-A Robotdog for Guiding Visually Impaired People

Mar 02, 2024

Aman Malhotra, Selma Saidi

Figure 1 for Summary Paper: Use Case on Building Collaborative Safe Autonomous Systems-A Robotdog for Guiding Visually Impaired People

Abstract:This is a summary paper of a use case of a Robotdog dedicated to guide visually impaired people in complex environment like a smart intersection. In such scenarios, the Robotdog has to autonomously decide whether it is safe to cross the intersection or not in order to further guide the human. We leverage data sharing and collaboration between the Robotdog and other autonomous systems operating in the same environment. We propose a system architecture for autonomous systems through a separation of a collaborative decision layer, to enable collective decision making processes, where data about the environment, relevant to the Robotdog decision, together with evidences for trustworthiness about other systems and the environment are shared.

Via

Access Paper or Ask Questions