AWS | Community | Defining best for long context Language Models | S02 E02


Two bald men talk about Large Language Models

Figuring out which LLM is best is hard, and it can be a very important decision when you are choosing the right one for your workload. Lucky for us (and you), there are tools to help us better evaluate the plethora of models that are out there. Today on Build On Generative AI, Darko is joined by David as he goes over L-Eval and how this tool is used for evaluating models.

A few things of note from today. There are two different types of tasks a model may do:

Close-ended tasks: tasks that should produce a fact, something that is already known and can be tested against. Here we use something called rigid evaluation.
Open-ended tasks: tasks that are more creative in nature, for example text summarization. These require the use of n-gram evaluation.
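
To make the difference between the two evaluation styles concrete, here is a minimal Python sketch. It is not L-Eval's own code: the function names (`exact_match`, `ngrams`, `ngram_f1`), the lowercase and whitespace normalization, and the choice of F1 as the overlap score are all assumptions made for illustration.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> bool:
    """Rigid evaluation (assumed form): the normalized answer must equal the reference."""
    return prediction.strip().lower() == reference.strip().lower()


def ngrams(text: str, n: int) -> Counter:
    """Count the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(prediction: str, reference: str, n: int = 1) -> float:
    """N-gram evaluation (assumed form): F1 over n-grams shared with the reference."""
    pred, ref = ngrams(prediction, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Close-ended: the answer is either right or wrong.
    print(exact_match(" Paris ", "paris"))
    # Open-ended: a summary gets partial credit for overlapping wording.
    print(ngram_f1("the cat sat on the mat", "the cat lay on the mat"))
```

A close-ended answer gets a yes or no, while an open-ended answer such as a summary gets a score between 0 and 1 that rewards wording shared with a reference text.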

Check out the recording here:


Links from today's episode

L-Eval paper

Reach out to the hosts and guests:

David: https://www.linkedin.com/in/davidbbounds/
Darko: https://twitter.com/darkosubotica

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.

