Defining "best" for long-context Language Models | S02 E02 | Build On Generative AI
Do you know what the best long-context Large Language Model is? 🤔 Well, it's time to evaluate them. David joins us today to talk about L-Eval, which splits evaluation into two kinds of tasks:
- Closed-ended tasks: tasks that should produce a fact, something that is already known and can be tested against. Here we use something called rigid evaluation, where a prediction scores only if it matches the known answer (see the first sketch after this list).
- Open-ended tasks: tasks that are more creative in nature, for example text summarization. These call for n-gram evaluation, such as ROUGE-style overlap scoring (see the second sketch after this list).
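To make the closed-ended case concrete, here is a minimal sketch of exact-match ("rigid") scoring. The function names and the normalization steps (lowercasing, stripping punctuation and articles) are illustrative assumptions, not L-Eval's actual implementation.

```python
# Minimal sketch of exact-match scoring for closed-ended tasks.
# Normalization choices here are illustrative, not L-Eval's exact code.
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> bool:
    """Score 1 only if the prediction matches the known answer exactly."""
    return normalize(prediction) == normalize(reference)

print(exact_match("The answer is 42.", "the answer is 42"))  # True
```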
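And for the open-ended case, a minimal sketch of n-gram overlap, the idea behind ROUGE-N recall. Again, this is a simplified illustration assuming whitespace tokenization, not the exact metric code used by L-Eval.

```python
# Minimal sketch of n-gram overlap (ROUGE-N-style recall) for open-ended
# tasks like summarization. Assumes simple whitespace tokenization.
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(prediction: str, reference: str, n: int = 2) -> float:
    """Fraction of the reference's n-grams that also appear in the prediction."""
    pred_counts = ngrams(prediction.lower().split(), n)
    ref_counts = ngrams(reference.lower().split(), n)
    overlap = sum(min(count, pred_counts[gram]) for gram, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

print(rouge_n_recall("the cat sat on the mat", "the cat lay on the mat"))  # 0.6
```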
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.