Amazon Nova AI Models And New ‘Killer Feature’ In Bedrock Are Huge AWS Partner Opportunities
Amazon’s new Nova foundational models and Amazon Bedrock latency-optimized inference feature represent massive opportunities for channel partners. Here’s why.
Amazon’s new Nova family of foundational models and new latency-optimized inference on Amazon Bedrock are home runs for Amazon Web Services partners.
“We’re really excited about the Nova models because there’s a massive opportunity there,” said Randall Hunt, chief technology officer at AWS Premier Tier Services Partner Caylent. “Think about the applications this can be used for in security cameras, for retail, for sports, for just any sort of generative search that you want to perform over video footage or if you want to generate product videos.”
“It was possible before, but it was not enabled seamlessly through an API and now it’s one API call,” said Hunt, whose company’s AWS business soared a whopping 80 percent in 2024. “So I’m not having to go and spin up a bunch of servers and deal with dynamic frame sampling and all these other complexities.”
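To make the "one API call" point concrete, here is a minimal sketch, using boto3's Bedrock Converse API, of asking a Nova model to analyze video stored in S3. The region, bucket URI, and prompt are illustrative placeholders, and the exact content-block schema should be confirmed against AWS's Nova documentation.

```python
import boto3

# Bedrock Runtime client; region is a placeholder.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# A single Converse call handles frame sampling and video understanding
# server-side -- no servers to spin up. The bucket URI is hypothetical.
response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # Nova Lite accepts video input
    messages=[{
        "role": "user",
        "content": [
            {"video": {
                "format": "mp4",
                "source": {"s3Location": {"uri": "s3://example-bucket/store-footage.mp4"}},
            }},
            {"text": "Describe each customer interaction in this clip with timestamps."},
        ],
    }],
)

print(response["output"]["message"]["content"][0]["text"])
```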
[Related: AWS CEO re:Invent Keynote: 7 Bold Google, Windows, VMware And ‘Game-Changing’ Innovation Remarks]
Launched at AWS re:Invent 2024 in Las Vegas this week, Amazon Nova foundational models (FMs) are a new generation of AI models aimed at delivering intelligence and industry-leading price performance on Amazon Bedrock.
Nova’s mission is to lower cost and latency for any generative AI task. Customers can build on Nova FMs to analyze documents and videos, understand charts and diagrams, generate video content, and build sophisticated AI agents.
New ‘Killer Feature’: Latency-Optimized Inference For Bedrock
Caylent was one of the AWS launch partners for Nova, getting early access to the foundational models while collaborating closely with AWS on development and price performance. “I’m very impressed with where they arrived at,” said Hunt.
One new feature that will help drive AI and GenAI adoption for Amazon Bedrock and the new Nova models is latency-optimized inference for FMs in Amazon Bedrock. Unveiled at re:Invent this week, the capability aims to deliver faster response times and improved responsiveness for AI applications.
Currently, Bedrock’s latency-optimized inference supports Anthropic’s Claude 3.5 Haiku model and Meta’s Llama 3.1 405B and 70B models, providing reduced latency compared to standard deployments without compromising accuracy.
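Opting into the new tier is a single request parameter. Below is a minimal sketch, assuming boto3's Converse API and the performanceConfig option AWS described at launch; the model ID and region are illustrative and should be checked against Bedrock's current availability.

```python
import boto3

# Region is illustrative; availability of the optimized tier varies by region.
client = boto3.client("bedrock-runtime", region_name="us-east-2")

response = client.converse(
    # Cross-region inference profile ID for Claude 3.5 Haiku; verify in the Bedrock console.
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize this support ticket in one sentence: ..."}],
    }],
    # "optimized" requests the latency-optimized tier; omitting
    # performanceConfig (or passing "standard") uses the default tier.
    performanceConfig={"latency": "optimized"},
)

print(response["output"]["message"]["content"][0]["text"])
```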
“The number of people who can take advantage of it in a shared-tenancy way, which is what the new Bedrock latency-optimized inference allows—that is just going to be a killer feature,” said Hunt. “The price point, it comes at a 25 percent price premium to the other Bedrock options. So you get Anthropic 3.5 Haiku and Llama 3.1 405B at a 25 percent price premium, but hundreds of tokens per second faster.”
Irvine, Calif.-based Caylent is one of AWS’ most innovative partners in the world. This year, the company won the AWS Migration Partner of the Year award, AWS’ GenAI Industry Solution Partner of the Year award, and the Industry Partner of the Year award for Financial Services in North America.
This week, Caylent launched its own Applied Intelligence delivery model dedicated to AI-first cloud services. It is designed to lower the barriers to cloud migration and modernization through the strategic, intentional application of AI at every stage of a customer’s cloud evolution.
Caylent’s CTO said that, at the end of the day, “everyone is just trying to get the highest quality tokens, lowest possible cost” as quickly as possible. “Amazon is enabling them with this new investment in Trainium and Bedrock latency-optimized inference,” Hunt said. “We’re super excited about those.”
AWS said both Anthropic’s Claude 3.5 Haiku and Meta’s Llama models run faster on AWS than on any other major cloud provider.
Amazon’s Three Nova Models: Micro, Lite And Pro
The Amazon Nova lineup includes three understanding models designed to meet customers’ different needs: Micro, Lite and Pro.
Amazon Nova Micro is a text-only model that delivers the lowest-latency responses in the Amazon Nova family. With a context length of 128K tokens, and optimized for speed and cost, Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat and brainstorming, and simple mathematical reasoning and coding.
Next up is Amazon Nova Lite, a very low-cost multimodal model that processes image, video, and text inputs to generate text output. Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy. The model accepts inputs up to 300,000 tokens in length and can analyze multiple images or up to 30 minutes of video in a single request.
Lastly, Amazon Nova Pro is the multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks in the Nova family. Nova Pro can process up to 300,000 input tokens and work across code bases with more than 15,000 lines of code. AWS said Pro sets new standards in multimodal intelligence and agentic workflows that require calling APIs and tools to complete complex tasks.
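For partners sizing up which tier fits a workload, the sketch below shows how the three understanding models might map to use cases in code, assuming boto3's Converse API; the model IDs follow AWS's launch naming but should be verified against the Bedrock model catalog.

```python
import boto3

# Model IDs per AWS's launch naming; confirm in the Bedrock model catalog.
NOVA = {
    "micro": "amazon.nova-micro-v1:0",  # text-only, 128K-token context, lowest latency
    "lite":  "amazon.nova-lite-v1:0",   # multimodal, up to 300K tokens, lowest cost
    "pro":   "amazon.nova-pro-v1:0",    # multimodal, up to 300K tokens, highest accuracy
}

client = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is illustrative

# A speed- and cost-sensitive task like translation fits Nova Micro.
response = client.converse(
    modelId=NOVA["micro"],
    messages=[{
        "role": "user",
        "content": [{"text": "Translate to French: 'The shipment arrives Tuesday.'"}],
    }],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```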
Deloitte Consulting’s leading AWS executive, JB McGinnis, said Amazon Nova highlights AWS’ commitment to delivering flexibility and choice in AI.
“By offering a diverse suite of foundation models, AWS empowers organizations to select the tools best suited for their specific workloads and industries,” said McGinnis. “This approach recognizes that no two businesses are the same, and ensures that companies can tailor AI solutions to their unique needs.”
He said Nova reflects the shift toward adaptable AI systems that give enterprises the ability to innovate with precision and confidence while maintaining control over costs and scalability. “It’s a critical step in enabling broader adoption across industries,” McGinnis said.