[Part 10] - Strawberry (AI) jam?
Stop counting r's. A sustainability initiative.
Shreyas Subramanian
Amazon Employee
Published Sep 16, 2024
In the last week alone, the sheer number of people trying out "How many r's are in the word strawberry" has skyrocketed by over 90000% (needs to be fact-checked). The number of articles trying to explain this has skyrocketed proportionately, so I'll keep this short.
- It's a surprise it even works sometimes
- You probably already know, text > tokens in > LLM > tokens out, in a recursive way, with each new token fed back in as input. Okay, "conditional probability of the next token given previous tokens blah blah...". You get it.
- The LLM weights are static. Without access to external tools, and with all hyperparameters and the seed held constant, the LLM will output the same next token every time; with greedy decoding, that is the token with the highest probability.
- For whatever reason (tokenizer, decoding strategy, seed, weights) this comes out wrong for even SOTA models.
- ... This is also the reason why an LLM can't multiply beyond three-digit numbers. Why should it? https://github.com/mrconter1/The-Long-Multiplication-Benchmark
- ... or why LLMs can't reason. https://arxiv.org/abs/2308.03762 "occasional flashes of analytical brilliance" is correct.
- ... or why LLMs can't be factual (they hallucinate). https://arxiv.org/pdf/2409.05746 . Everything or nothing is hallucination.
- ... and a host of other things
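If the bullets above feel abstract, here is a minimal Python sketch of the same pipeline. Everything in it is made up for illustration: the three-piece vocabulary, the way "strawberry" gets split, and the probabilities returned by the stand-in "model" are assumptions, not what any real tokenizer or LLM does. The point is only that the model sees token IDs rather than letters, and that frozen weights plus greedy decoding give the same answer every time, right or wrong.

```python
# Toy illustration only. TOY_VOCAB, the split of "strawberry", and the
# probabilities below are hypothetical, not taken from any real model.
TOY_VOCAB = {"str": 0, "aw": 1, "berry": 2}


def toy_tokenize(text: str) -> list[int]:
    """Split text into toy token IDs using greedy longest-match."""
    ids = []
    i = 0
    pieces = sorted(TOY_VOCAB, key=len, reverse=True)  # try longest pieces first
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                ids.append(TOY_VOCAB[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"no toy token matches at position {i}")
    return ids


def toy_next_token_probs(context: tuple[int, ...]) -> dict[str, float]:
    """Stand-in for a frozen LLM: the same context always maps to the
    same distribution, because the 'weights' never change."""
    return {"2": 0.55, "3": 0.45}  # made-up numbers


def greedy_decode(context: list[int]) -> str:
    """Pick the candidate with the highest probability (greedy decoding)."""
    probs = toy_next_token_probs(tuple(context))
    return max(probs, key=probs.get)


ids = toy_tokenize("strawberry")
print(ids)  # the model receives IDs, not the letters s-t-r-a-w-...

# Ask five times: static weights + greedy decoding = one repeated answer.
answers = {greedy_decode(ids) for _ in range(5)}
print(answers)
```

Asking again does not help here: nothing in the loop changes between calls, so the toy model repeats whatever its fixed distribution favors.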
But one of the things it CAN do well is write and iterate on functional code. I am not a front-end designer, but this React app that Claude on Bedrock helped me write looks great!
Try it out here:
https://922m7w.csb.app/
Also, if you haven't already, take a look at this article on tokenizers - https://community.aws/content/2ee0thtnVxZmFvpDUZFSck2ixOM/genai-under-the-hood-part-1---tokenizers-and-why-you-should-care
So, what I'm trying to tell you trolls is ... there are some things that LLMs are good at (mainly due to high-quality training data in that domain), e.g. code generation, structured output, and function calling. Try the following:
- Create or select a tool (like a calculator) if one exists for your task, and use an LLM with the right tool for the right task.
- If a tool does not exist, benchmark your LLM at this specific task. Act surprised when it fails.
- Generate and curate high-quality data, and fine-tune an adapter or two. Or as many as you like. Celebrate your win and eat some st'r'awbe'r'r'y sho'r'tcake a'r'r'r' ya matey!
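The first two steps above can be sketched in a few lines of Python. This is a rough sketch under stated assumptions: `count_letter` is a hypothetical "tool" for this one task, `benchmark` is a hypothetical harness, and `fake_llm` is a stand-in that always answers 2 so the example runs without any model access. In practice you would swap `fake_llm` for a function that calls your actual model and parses its reply into an integer.

```python
from typing import Callable


def count_letter(word: str, letter: str) -> int:
    """Deterministic tool: count a single letter in a word, case-insensitive."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.lower().count(letter.lower())


def benchmark(
    ask_model: Callable[[str, str], int],
    cases: list[tuple[str, str]],
) -> float:
    """Return the fraction of cases where the model matches the tool's answer."""
    correct = sum(ask_model(word, letter) == count_letter(word, letter)
                  for word, letter in cases)
    return correct / len(cases)


def fake_llm(word: str, letter: str) -> int:
    """Hypothetical stand-in for an LLM call: always answers 2."""
    return 2


# Small hand-written test set; the tool supplies the ground truth.
CASES = [("strawberry", "r"), ("shortcake", "r"), ("carrot", "r"), ("matey", "r")]

print(count_letter("strawberry", "r"))  # 3
print(benchmark(fake_llm, CASES))       # 0.25 (only "carrot" has exactly two r's)
```

If the score stays low after a fair benchmark, the failing cases double as seed material for the curated fine-tuning data in the third step.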
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.