
First Look at Amazon Nova

I've adapted my Japanese Language Learning Assistant to use Amazon Nova Pro.

Published Dec 13, 2024

日本語を勉強します (I Am Studying Japanese)

I have Japanese Language lessons every week with my Japanese Teacher from Japan.
I wanted an interactive way to test my skills between classes, so I've been building a series of small learning apps to act as a teaching assistant.
To be honest I would be much farther along in my language learning if I focused more time on studying than building apps. 🙃

Benchmark App

One of these apps is the Japanese Sentence Constructor. I give it an English sentence and it guides me through the translation with clues until I arrive at the correct answer.
This specific app is what I use to evaluate new LLMs, because I've invested significant time iterating on the prompt document and observing its output against the behaviour I expect.
I have variants of the prompt document for:
  • Cohere Command R Plus
  • Anthropic Claude 3.5
  • OpenAI GPT 4
  • Google Gemma 2
I had previous attempts with Amazon Titan's models, but at the time those models did not perform well enough for productive Japanese language study.
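To make the idea of per-model prompt variants concrete, here's a hypothetical sketch of how an app might select the right prompt document at runtime. The file names and model keys are illustrative, not from my actual repo.

```python
from pathlib import Path

# Illustrative mapping of model -> tuned prompt document (paths are made up).
PROMPT_VARIANTS = {
    "anthropic.claude-3-5": "prompts/claude-3-5.md",
    "cohere.command-r-plus": "prompts/command-r-plus.md",
    "openai.gpt-4": "prompts/gpt-4.md",
    "google.gemma-2": "prompts/gemma-2.md",
}

def load_prompt(model_key, fallback="prompts/default.md"):
    """Return the prompt document tuned for a model, falling back to a
    shared baseline for models without a variant (e.g. Amazon Nova, for now)."""
    path = Path(PROMPT_VARIANTS.get(model_key, fallback))
    return path.read_text(encoding="utf-8")
```

Keeping one document per model makes it easy to iterate on a single model's quirks without regressing the others.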

First Look at Amazon Bedrock Nova

Japanese Sentence Constructor is currently in a private repo, but I decided to create a simplified variant of that codebase so we could evaluate Amazon Bedrock. You can find the repo here.
I have yet to write a custom prompt document for Amazon Nova models, but I wanted to see how Amazon Nova Pro and Amazon Nova Lite performed.
I've also recorded a 45-minute video showing how the app works, where you can see me evaluating Amazon Nova Pro and Amazon Nova Lite.

Let's jump into some observations about the Amazon Nova models.

Code Generation

Since these models are newly released, I was hoping they would have more up-to-date knowledge of AWS APIs out of the box.
So I asked Amazon Nova Pro to generate a simple GenAI chat app using Streamlit with Amazon Bedrock's Converse API. Instead I got back InvokeModel. 😑
The Converse API has been out for at least six months, so there is certainly content kicking around on how to use it. The Converse API is the preferred way to use Amazon Bedrock for multi-turn conversations.
Models are very expensive and time-consuming to train, so it makes me think that Nova's dataset is dated, or that training happened much earlier in the year.
I have not observed any other LLM generate the expected Amazon Bedrock Converse API usage from its base knowledge, so at least there is no competing model that I know of outperforming Nova on AWS tasks.
Amazon Q Developer will arrive at the correct implementation, but I suspect it's augmenting its knowledge before returning the result. Amazon Q Developer will also correctly use Amazon Nova Pro (when prompted to do so).
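For reference, here's a minimal sketch of the kind of app I was asking for: a Streamlit chat front end calling the Converse API via boto3. This is my own illustration, not Nova's output, and the model ID, region, and helper names are assumptions.

```python
def to_converse_message(role, text):
    """Shape one chat turn the way the Converse API expects."""
    return {"role": role, "content": [{"text": text}]}

def chat_turn(client, history, user_text, model_id="amazon.nova-pro-v1:0"):
    """Append the user's turn, call Converse with the full history,
    store the assistant's reply, and return its text."""
    history.append(to_converse_message("user", user_text))
    response = client.converse(modelId=model_id, messages=history)
    reply = response["output"]["message"]
    history.append(reply)
    return reply["content"][0]["text"]

if __name__ == "__main__":
    import boto3
    import streamlit as st

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    if "history" not in st.session_state:
        st.session_state.history = []
    if prompt := st.chat_input("Say something"):
        chat_turn(client, st.session_state.history, prompt)
    for msg in st.session_state.history:
        with st.chat_message(msg["role"]):
            st.write(msg["content"][0]["text"])
```

Unlike InvokeModel, which takes a model-specific request body, Converse accepts the same role/content message list across models, which is exactly why it's the better answer for a multi-turn chat app.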

Model Performance: Nova Pro vs Nova Lite

AWS should no longer feel embarrassed because these models perform very well.
How well? Performance- and price-wise, I'd put Nova Pro eerily close to Anthropic Claude 3.5 Haiku.
I would imagine that when Nova Premier releases, it could be comparable to Anthropic Claude 3.5 Sonnet.
I'm not sure where to place Nova Lite: it performed better than Cohere's Command R model, but it did not follow my prompt document, and its outputs had glaring, repeated mistakes.
At this time I have not invested in creating an evaluation dataset because it's very time-consuming, so this is just limited human feedback, focused only on the text modality.

Why Should You Use Amazon Nova Models?

The Amazon Nova models appear to be reaching parity with popular closed third-party models, albeit lacking any exciting API features. You could even say Amazon Nova is quite boring.
However, I think customers would ditch the bells and whistles if the trade-off is a reliable serverless API.
OpenAI's and Anthropic's API uptime has been turbulent recently, and I would rather have excellent customer support and guaranteed uptime than an API endpoint for remote computer control via an LLM.
Uptime is pretty cool.
