Defining best for long context Language Models | S02 E02 | Build On Generative AI

Do you know which long-context Large Language Model is the best? 🤔 Well, it's time to evaluate them. David joins us today to talk about L-Eval.

AWS Admin
Amazon Employee
Published Aug 21, 2023
Last Modified Jun 25, 2024
Screenshot of David and Darko
Two bald men talk about Large Language Models
Figuring out which LLM is best is hard, and it can be a very important decision when you are choosing the right one for your workload. Lucky for us (and you), there are tools to help us better evaluate the plethora of models out there. Today on Build On Generative AI, Darko is joined by David, who goes over L-Eval and how this tool is used to evaluate long-context models.
A few things of note from today: there are two different types of tasks a model may be asked to do:
  • Closed-ended tasks: tasks that should produce a fact, something already known that the output can be tested against. Here we use what's called rigid evaluation, for example checking whether the answer exactly matches the expected one.
  • Open-ended tasks: tasks that are more creative in nature, such as text summarization. There is no single correct answer, so these rely on n-gram evaluation, which scores how many word sequences (n-grams) the model's output shares with a reference text.
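To make the two evaluation styles more concrete, here is a minimal sketch in Python. It assumes simple lowercase, punctuation-stripping normalization and whitespace tokenization. The function names (`exact_match`, `ngram_f1`) are hypothetical illustrations, not L-Eval's own scoring code, and L-Eval's actual metrics and normalization may differ. `exact_match` is a rigid check for closed-ended answers, while `ngram_f1` is a ROUGE-N-style overlap score for open-ended outputs such as summaries.

```python
from collections import Counter
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace (assumed normalization)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Rigid evaluation for closed-ended tasks: 1.0 if normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(prediction: str, reference: str, n: int = 1) -> float:
    """ROUGE-N-style F1 for open-ended tasks: harmonic mean of n-gram precision and recall."""
    pred = ngrams(normalize(prediction).split(), n)
    ref = ngrams(normalize(reference).split(), n)
    # Counter & Counter keeps the minimum count of each shared n-gram (clipped overlap).
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Closed-ended: the answer either matches or it doesn't.
    print(exact_match("Paris.", "paris"))  # 1.0
    # Open-ended: partial credit for shared words (n=1) or word pairs (n=2).
    print(round(ngram_f1("the cat sat on the mat", "the cat is on the mat"), 3))  # 0.833
    print(round(ngram_f1("the cat sat on the mat", "the cat is on the mat", n=2), 3))  # 0.6
```

The contrast is the point: exact match gives no credit to an almost-right answer, which suits facts with one correct form, while n-gram overlap rewards partial similarity but cannot tell whether a summary is actually faithful or well written.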
Check out the recording here:

Links from today's episode

Reach out to the hosts and guests:

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.