Defining best for long context Language Models | S02 E02 | Build On Generative AI
Do you know which is the best long context Large Language Model? 🤔 Well, it's time to evaluate them. David joins us today to talk about L-Eval.
Figuring out which LLM is best is hard, and it can be a very important decision when you are choosing the right one for your workload. Lucky for us (and you), there are tools to help us better evaluate the plethora of models that are out there. Today on Build On Generative AI, Darko is joined by David as he goes over L-Eval and how this tool is used for evaluating models.
A few things of note from today: there are two different types of tasks a model may do:
- Closed-ended tasks, tasks that should produce a fact, something that is already known and can be tested against. Here we use something called rigid evaluation.
- Open-ended tasks, tasks that are more creative in nature, for example text summarization. These require the use of n-gram evaluation.
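To make the n-gram idea concrete, here is a minimal sketch of an n-gram overlap score (a ROUGE-N-style recall) for open-ended outputs. This is an illustrative toy, not the exact metric L-Eval implements, and the function names are our own:

```python
# Toy n-gram overlap score for open-ended tasks like summarization.
# NOTE: illustrative sketch only; L-Eval's actual metrics may differ.

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_overlap(candidate, reference, n=2):
    """Fraction of reference n-grams that also appear in the candidate
    (a ROUGE-N-style recall)."""
    cand = set(ngrams(candidate.lower().split(), n))
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    return sum(1 for g in ref if g in cand) / len(ref)

reference = "the model summarizes the long document accurately"
candidate = "the model summarizes the document"
print(ngram_overlap(candidate, reference, n=2))  # → 0.5
```

A higher score means the model's output shares more word sequences with the reference answer, which is why this style of metric suits creative tasks where no single exact answer exists.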
Check out the recording here:
Reach out to the hosts and guests: