Amazon SageMaker

Amazon SageMaker and Pinecone can be used together to build high-performance, scalable, and reliable Retrieval Augmented Generation (RAG) applications. The integration uses SageMaker for compute and hosting of Large Language Models (LLMs), with Pinecone serving as the knowledge base that keeps those LLMs up to date with the latest information and reduces the likelihood of hallucinations.
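The RAG flow described above can be sketched in a few lines. In this minimal, self-contained example, `embed()` and the in-memory `knowledge_base` are stand-ins for a SageMaker-hosted embedding model and a Pinecone index, and helper names like `build_prompt` are illustrative rather than part of either SDK:

```python
# Minimal RAG flow sketch. embed() and knowledge_base stand in for a
# SageMaker embedding endpoint and a Pinecone index, respectively.

def embed(text: str) -> list[float]:
    # Stand-in embedding: a trivial bag-of-characters vector,
    # for demonstration only.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Stand-in for a Pinecone index: (vector, source text) pairs.
knowledge_base = [
    (embed(t), t)
    for t in [
        "Pinecone is a managed vector database.",
        "SageMaker JumpStart offers one-click model deployment.",
    ]
]

def retrieve(query: str, top_k: int = 1) -> list[str]:
    # In production this would be a Pinecone index query; here we
    # rank the local records by cosine similarity.
    qv = embed(query)
    ranked = sorted(knowledge_base, key=lambda kv: cosine(qv, kv[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

def build_prompt(query: str) -> str:
    # Augment the user question with the retrieved context before
    # sending it to the LLM.
    context = "\n".join(retrieve(query))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is Pinecone?"))
```

In a real deployment, the embedding call would invoke a SageMaker endpoint and `retrieve` would query a Pinecone index, but the overall shape (embed, retrieve, augment, generate) stays the same.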

In the first example, we'll see how to use SageMaker JumpStart to deploy high-performance Llama 2 LLMs with RAG in a conversational AI setting.
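For the conversational setting, retrieved context has to be folded into the chat template that Llama 2 chat models were trained on. The sketch below shows one way to do that; `build_llama2_prompt` is an illustrative helper, not part of the SageMaker or Pinecone SDKs:

```python
# Sketch: wrapping a system message, retrieved context, and a user turn
# in the [INST] / <<SYS>> template used by Llama 2 chat models.

def build_llama2_prompt(system: str, user: str, context: str) -> str:
    # Place the retrieved context inside the system block so the model
    # treats it as grounding material rather than user input.
    system_block = f"{system}\n\nContext:\n{context}"
    return f"[INST] <<SYS>>\n{system_block}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_prompt(
    system="Answer using only the provided context.",
    user="What is Pinecone?",
    context="Pinecone is a managed vector database.",
)
print(prompt)
```

The resulting string would be sent as the input to the deployed Llama 2 endpoint; multi-turn conversations extend the same template with prior turns.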

In the second example, we'll see how to use SageMaker to deploy LLM instances hosting models such as BloomZ 7B1, Flan T5 XL, and Flan T5 UL2 to answer user questions with insightful responses.
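Once such a model is deployed, the application calls it through the SageMaker runtime. The sketch below builds a request body in the style of the JumpStart text-generation payload; the payload keys (`text_inputs`, `max_length`) and the endpoint name are assumptions that may differ per model, so check the model's documentation. The actual `boto3` invocation is shown in comments:

```python
# Sketch: building a request body for a deployed text-generation endpoint.
# The payload keys follow the JumpStart text-generation convention but are
# assumptions; verify them against the specific model's documentation.

import json

def build_request(question: str, max_length: int = 200) -> bytes:
    payload = {"text_inputs": question, "max_length": max_length}
    return json.dumps(payload).encode("utf-8")

body = build_request("Which instances can I use with managed spot training?")

# With a deployed endpoint, the request would be sent roughly like this:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="my-flan-t5-endpoint",  # hypothetical endpoint name
#     ContentType="application/json",
#     Body=body,
# )
# print(response["Body"].read())
print(json.loads(body))
```

In the RAG setup, `question` would be the augmented prompt built from the Pinecone retrieval step rather than the raw user question.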