Published Aug 21, 2023

Figuring out which LLM is best is hard, and it can be a very important decision when you are choosing the right one for your workload. Lucky for us (and you), there are tools to help us better evaluate the plethora of models out there. Today on Build On Generative AI, Darko is joined by David, who goes over L-Eval and how this tool is used for evaluating models.

A few things of note from today. There are two different types of tasks a model may perform:

  • Close-ended tasks: tasks that should produce a fact, something that is already known and can be tested against. Here we use something called rigid evaluation.
  • Open-ended tasks: tasks that are more creative in nature, for example text summarization. These require the use of n-gram evaluation.
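
To make the difference between the two styles a bit more concrete, here is a minimal Python sketch. It is not L-Eval's actual code: the function names `exact_match` and `ngram_f1` are hypothetical, and the lowercasing and whitespace tokenization are simplifying assumptions. The first function stands in for a rigid check against a known answer; the second scores n-gram overlap between a model's output and a reference text, in the spirit of ROUGE-style metrics.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> bool:
    """Rigid check: the answer counts only if it equals the known fact.

    Assumption: case and surrounding whitespace are ignored.
    """
    return prediction.strip().lower() == reference.strip().lower()


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(prediction: str, reference: str, n: int = 1) -> float:
    """Overlap score: F1 over the n-grams shared by prediction and reference.

    Assumption: text is lowercased and split on whitespace.
    """
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Close-ended: there is one right answer to compare against.
    print(exact_match(" Paris", "paris"))
    # Open-ended: partial credit for wording shared with a reference summary.
    print(ngram_f1("the cat sat on the mat", "the cat lay on the mat", n=1))
```

A rigid check is all-or-nothing, while the overlap score gives partial credit. That is why the overlap approach suits summaries, where many different wordings can be acceptable.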

