## Organization
An organization is a group of one or more [projects](#project) that use the same billing. Organizations allow one or more [users](#user) to control billing and permissions for all of the projects belonging to the organization.
For more information, see [Understanding organizations](/guides/organizations/understanding-organizations).
## Project
A project belongs to exactly one [organization](#organization) and contains one or more [indexes](#index). Only [users](#user) who belong to the project can access its indexes. [API keys](#api-key) and [Assistants](#assistant) are project-specific.
For more information, see [Understanding projects](/guides/projects/understanding-projects).
## Index
There are two types of [serverless indexes](/guides/index-data/indexing-overview): dense and sparse.
### Dense index
Dense indexes store records that have one [dense vector](#dense-vector) each.
For more information, see [Use namespaces](/guides/index-data/indexing-overview#namespaces).
### Sparse index
Sparse indexes store records that have one [sparse vector](#sparse-vector) each.
## Record
A record is a basic unit of data and consists of a [record ID](#record-id), a [dense vector](#dense-vector) or a [sparse vector](#sparse-vector) (depending on the type of index), and optional [metadata](#metadata).
For more information, see [Upsert data](/guides/index-data/upsert-data).
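For example, here is a minimal sketch of upserting a single record with the Python SDK; the index name, vector values, and metadata are placeholders:

```Python Python theme={null}
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs-example")  # hypothetical dense index

index.upsert(
    vectors=[
        {
            "id": "doc1#chunk1",                           # record ID
            "values": [0.1, 0.2, 0.3],                     # dense vector (must match the index dimension)
            "metadata": {"source": "docs", "year": 2024},  # optional metadata
        }
    ],
    namespace="example-namespace",
)
```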
### Record ID
A record ID is a record's unique ID. [Use ID prefixes](/guides/index-data/data-modeling#use-structured-ids) that reflect the type of data you're storing.
### Dense vector
A dense vector, also referred to as a vector embedding or simply a vector, is a series of numbers that represent the meaning and relationships of data. Each number in a dense vector corresponds to a point in a multidimensional space. Vectors that are closer together in that space are semantically similar.
Dense vectors are stored in [dense indexes](#dense-index).
You use a dense embedding model to convert data to dense vectors. The embedding model can be external to Pinecone or [hosted on Pinecone infrastructure](/guides/index-data/create-an-index#embedding-models) and integrated with an index.
For more information about dense vectors, see [What are vector embeddings?](https://www.pinecone.io/learn/vector-embeddings/).
### Sparse vector
Sparse vectors are often used to represent documents or queries in a way that captures keyword information. Each dimension in a sparse vector typically represents a word from a dictionary, and the non-zero values represent the importance of these words in the document.
Sparse vectors have a large number of dimensions, but a small number of those values are non-zero. Because most values are zero, Pinecone stores sparse vectors efficiently by keeping only the non-zero values along with their corresponding indices.
Sparse vectors are stored in [sparse indexes](#sparse-index) and [hybrid indexes](/guides/search/hybrid-search#use-a-single-hybrid-index). To convert data to sparse vectors, use a sparse embedding model. The embedding model can be external to Pinecone or [hosted on Pinecone infrastructure](/guides/index-data/create-an-index#embedding-models) and integrated with an index.
For more information about sparse vectors, see [Sparse retrieval](https://www.pinecone.io/learn/sparse-retrieval/).
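As a purely illustrative sketch, a sparse vector can be represented as two parallel lists, one of dimension indices and one of the non-zero values at those dimensions (the numbers below are made up):

```Python Python theme={null}
# Only the non-zero dimensions are stored: "indices" identifies the dimensions
# (roughly, terms from a vocabulary) and "values" holds their weights.
sparse_vector = {
    "indices": [10, 45, 16000, 173202],
    "values": [0.91, 0.42, 0.13, 0.57],
}
```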
### Metadata
Metadata is additional information included in a record to provide more context and enable additional [filtering capabilities](/guides/index-data/indexing-overview#metadata). For example, the original text that was embedded can be stored in the metadata.
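For example, a query can be restricted to records whose metadata matches a filter. This is a minimal sketch, reusing the hypothetical index from the upsert example above:

```Python Python theme={null}
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs-example")  # hypothetical index

# Return the 3 most similar records whose "source" metadata field equals "docs".
results = index.query(
    namespace="example-namespace",
    vector=[0.1, 0.2, 0.3],  # query vector (placeholder; same dimension as the index)
    top_k=3,
    filter={"source": {"$eq": "docs"}},
    include_metadata=True,
)
```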
## Other concepts
Pinecone also includes the following concepts:
* [API key](#api-key)
* [User](#user)
* [Backup or collection](#backup-or-collection)
* [Pinecone Inference](#pinecone-inference)
### API key
An API key is a unique token that [authenticates](/reference/api/authentication) and authorizes access to the [Pinecone APIs](/reference/api/introduction). API keys are project-specific.
### User
A user is a member of organizations and projects. Users are assigned specific roles at the organization and project levels that determine the user's permissions in the [Pinecone console](https://app.pinecone.io).
For more information, see [Manage organization members](/guides/organizations/manage-organization-members) and [Manage project members](/guides/projects/manage-project-members).
### Backup or collection
A backup is a static copy of a serverless index. A collection is a static copy of a pod-based index.
Backups only consume storage. They are non-queryable representations of a set of records. You can create a backup from an index, and you can create a new index from that backup. The new index configuration can differ from the original source index: for example, it can have a different name. However, it must have the same number of dimensions and similarity metric as the source index.
For more information, see [Understanding backups](/guides/manage-data/backups-overview).
### Pinecone Inference
Pinecone Inference is an API service that provides access to [embedding models](/guides/index-data/create-an-index#embedding-models) and [reranking models](/guides/search/rerank-results#reranking-models) hosted on Pinecone's infrastructure.
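For example, a minimal sketch of generating embeddings with a Pinecone-hosted model through the Python SDK; the model choice and inputs are illustrative:

```Python Python theme={null}
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Embed two passages with a model hosted on Pinecone infrastructure.
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["Disease prevention", "Regular exercise improves overall health."],
    parameters={"input_type": "passage"},
)
```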
## Learn more
* [Vector database](https://www.pinecone.io/learn/vector-database/)
* [Pinecone APIs](/reference/api/introduction)
* [Approximate nearest neighbor (ANN) algorithms](https://www.pinecone.io/learn/a-developers-guide-to-ann-algorithms/)
* [Retrieval augmented generation (RAG)](https://www.pinecone.io/learn/retrieval-augmented-generation/)
* [Image search](https://www.pinecone.io/learn/series/image-search/)
* [Tokenization](https://www.pinecone.io/learn/tokenization/)
# Architecture
Source: https://docs.pinecone.io/guides/get-started/database-architecture
Learn how Pinecone's architecture enables fast, relevant vector search at any scale.
## Overview
Pinecone runs as a managed service on AWS, GCP, and Azure cloud platforms. When you send a request to Pinecone, it goes through an [API gateway](#api-gateway) that routes it to either a global [control plane](#control-plane) or a regional [data plane](#data-plane). All your vector data is stored in highly efficient, distributed [object storage](#object-storage).
## On-demand vs dedicated
On-demand indexes and dedicated read nodes are both built on Pinecone's serverless infrastructure. They use the same write path, storage layer, and data operations API.
However, an index backed by dedicated read nodes has isolated hardware for read operations (query, fetch, list), allowing these operations to run on dedicated query executors. This affects performance, cost, and how you scale:
| Feature | On-demand | Dedicated read nodes |
| :---------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Read infrastructure** | Multi-tenant compute resources shared across customers | Isolated, provisioned query executors dedicated to your index |
| **Read costs** | Pay per [read unit](/guides/manage-cost/understanding-cost#serverless-indexes) (1 RU per 1 GB of namespace size per query, minimum 0.25 RU) | Fixed hourly rate for read capacity based on node type, shards, and replicas |
| **Other costs** | [Storage](/guides/manage-cost/understanding-cost#storage) and [write](/guides/manage-cost/understanding-cost#write-units) costs based on usage | [Storage](/guides/manage-cost/understanding-cost#storage) and [write](/guides/manage-cost/understanding-cost#write-units) costs based on usage (same as on-demand) |
| **Caching** | Best-effort; frequently accessed data is cached, but cold queries fetch from object storage | Guaranteed; all index data always warm in memory and on local SSDs |
| **Read rate limits** | [2,000 RUs/second per index (adjustable)](/reference/api/database-limits#rate-limits) | No read rate limits (only bounded by CPU capacity) |
| **Scaling** | Automatic; Pinecone handles capacity | Manual; add [shards](#shards) for storage, add [replicas](#replicas) for throughput |
| **Best for** | Variable workloads, multi-tenant applications with many namespaces, low to moderate query rates | Sustained high query rates, large single-namespace workloads, predictable performance requirements |
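As a rough, illustrative sketch of the on-demand read-unit math in the table above (the workload numbers are hypothetical):

```Python Python theme={null}
# On-demand reads cost 1 RU per 1 GB of namespace size per query, with a 0.25 RU minimum.
namespace_size_gb = 8
queries_per_second = 50

rus_per_query = max(namespace_size_gb * 1.0, 0.25)
rus_per_second = rus_per_query * queries_per_second

print(rus_per_query, rus_per_second)  # 8.0 RUs per query, 400.0 RUs per second
# Compare rus_per_second against the default 2,000 RU/s per-index limit to judge
# whether dedicated read nodes are worth considering for the workload.
```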
## When to use dedicated read nodes
Dedicated read nodes are ideal for workloads with millions to billions of records and predictable query rates. They provide performance and cost benefits compared to on-demand for high-throughput workloads, and may be required when your workload exceeds on-demand rate limits.
There's no universal formula for choosing between on-demand and dedicated read nodes; performance and cost vary by workload (vector dimensionality, metadata filtering, and query patterns). Weigh the factors in the comparison above when making your decision.
In Pinecone, an [index](/guides/index-data/indexing-overview) is the highest-level organizational unit of data, where you define the dimension of vectors to be stored in the index and the measure of similarity to be used when querying the index.
Within an index, records are stored in [namespaces](/guides/index-data/indexing-overview#namespaces), and all [upserts](/guides/index-data/upsert-data), [queries](/guides/search/search-overview), and other [data plane operations](/reference/api/latest/data-plane) always target one namespace.
This structure makes it easy to implement multitenancy. For example, for an AI-powered SaaS application where you need to isolate the data of each customer, you would assign each customer to a namespace and target their writes and queries to that namespace, as sketched below.
In cases where you have different workload patterns (for example, RAG and semantic search), you would use a different index for each workload, with one namespace per customer in each index.
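A minimal sketch of the namespace-per-customer pattern with the Python SDK; the index name and customer ID are hypothetical:

```Python Python theme={null}
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("saas-app")   # hypothetical index
customer_id = "customer-123"   # one namespace per customer

# Writes and reads for this customer always target their namespace.
index.upsert(
    vectors=[{"id": "doc-1", "values": [0.1, 0.2, 0.3]}],  # placeholder vector
    namespace=customer_id,
)
results = index.query(
    namespace=customer_id,
    vector=[0.1, 0.2, 0.3],
    top_k=5,
)
```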
## Vector embedding
[Dense vectors](/guides/get-started/concepts#dense-vector) and [sparse vectors](/guides/get-started/concepts#sparse-vector) are the basic units of data in Pinecone and what Pinecone was specially designed to store and work with. Dense vectors represent the semantics of data such as text, images, and audio recordings, while sparse vectors represent documents or queries in a way that captures keyword information.
To transform data into vector format, you use an embedding model. You can either use Pinecone's integrated embedding models to convert your source data to vectors automatically, or you can use an external embedding model and bring your own vectors to Pinecone.
### Integrated embedding
1. [Create an index](/guides/index-data/create-an-index) that is integrated with one of Pinecone's [hosted embedding models](/guides/index-data/create-an-index#embedding-models).
2. [Upsert](/guides/index-data/upsert-data) your source text. Pinecone uses the integrated model to convert the text to vectors automatically.
3. [Search](/guides/search/search-overview) with a query text. Again, Pinecone uses the integrated model to convert the text to a vector automatically.
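A minimal sketch of this three-step flow, assuming a recent Python SDK with integrated-inference support; the index name, model, and field mapping are illustrative:

```Python Python theme={null}
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# 1. Create an index integrated with a hosted embedding model.
pc.create_index_for_model(
    name="integrated-example",
    cloud="aws",
    region="us-east-1",
    embed={"model": "llama-text-embed-v2", "field_map": {"text": "chunk_text"}},
)
index = pc.Index("integrated-example")

# 2. Upsert source text; Pinecone embeds it with the integrated model.
index.upsert_records(
    "example-namespace",
    [{"_id": "rec1", "chunk_text": "Apples are a great source of fiber."}],
)

# 3. Search with query text; Pinecone embeds the query the same way.
results = index.search(
    namespace="example-namespace",
    query={"top_k": 3, "inputs": {"text": "healthy snacks"}},
)
```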
The BYOC architecture employs a split model:
* **Data plane**: The data plane is responsible for storing and processing your records, executing queries, and interacting with object storage for index data. In a BYOC deployment, the data plane is hosted in your own AWS or GCP account within a dedicated VPC, ensuring that all data is stored and processed locally and does not leave your organizational boundaries. You use a [private endpoint](#configure-a-private-endpoint) (AWS PrivateLink or GCP Private Service Connect) as an additional measure to secure requests to your indexes.
* **Control plane**: The control plane is responsible for managing the index lifecycle as well as region-agnostic services such as user management, authentication, and billing. The control plane does not hold or process any records. In a BYOC deployment, the control plane is managed by Pinecone and hosted globally. Communication between the data plane and control plane is encrypted using TLS and employs role-based access control (RBAC) with minimal IAM permissions.
## Onboarding
The onboarding process for BYOC in AWS or GCP involves the following general stages:
Traffic is also encrypted in transit between the Pinecone backend and cloud infrastructure services, such as S3 and GCS. For more information, see [Google Cloud Platform](https://cloud.google.com/docs/security/encryption-in-transit) and [AWS security documentation](https://docs.aws.amazon.com/AmazonS3/userguide/UsingEncryption.html).
## Network security
### Private Endpoints for AWS PrivateLink
Use [Private Endpoints to connect to Amazon Web Services (AWS) PrivateLink](/guides/production/connect-to-aws-privatelink). This establishes private connectivity between your Pinecone serverless indexes and supported AWS services while keeping your VPC private from the public internet.
Private Endpoints are additive to other Pinecone security features: data is also [encrypted in transit](#encryption-in-transit), [encrypted at rest](#encryption-at-rest), and an [API key](#api-keys) is required to authenticate.
### Proxies
The following Pinecone SDKs support the use of proxies:
* [Python SDK](/reference/python-sdk#proxy-configuration)
* [Node.js SDK](/reference/node-sdk#proxy-configuration)
# Create a project
Source: https://docs.pinecone.io/guides/projects/create-a-project
Create a new Pinecone project in your organization.
This page shows you how to create a project.
If you are an [organization owner or user](/guides/organizations/understanding-organizations#organization-roles), you can create a project in your organization:
## Setup guide
The process of using a Bedrock knowledge base with Pinecone works as follows:
3. Click **Next**.
4. Enter a **Secret name** and **Description**.
5. Click **Next** to save your key.
6. On the **Configure rotation** page, keep the default options, and click **Next**.
7. Click **Store**.
8. Click on the new secret you created and save the secret ARN for a later step.
#### Set up S3
The knowledge base is going to draw on data saved in S3. For this example, we use a [sample of research papers](https://huggingface.co/datasets/jamescalam/ai-arxiv2-semantic-chunks) from Hugging Face. This data will be embedded and then saved in Pinecone.
1. Create a new general purpose bucket in [Amazon S3](https://console.aws.amazon.com/s3/home).
2. Once the bucket is created, upload a CSV file.
By inspecting the trace, we can see what chunks were used by the Agent and diagnose issues with responses.
## Related articles
* [Pinecone as a Knowledge Base for Amazon Bedrock](https://www.pinecone.io/blog/amazon-bedrock-integration/)
# Amazon SageMaker
Source: https://docs.pinecone.io/integrations/amazon-sagemaker
Create intelligent chatbots, generate content, build AI forms, and automate tasks — all from your WordPress dashboard.
Seamlessly integrate, transform, and load data into Pinecone from hundreds of systems, including databases, data warehouses, and SaaS products.
Integrate your enterprise data into Amazon Bedrock using Pinecone to build highly performant GenAI applications.
Integrate machine learning models seamlessly with a fully-managed service that enables easy deployment and scalability.
Focus on building applications powered by LLMs without the need to worry about the underlying infrastructure.
Integrate results from web scrapers or crawlers into a vector database to support RAG or semantic search over web content.
Process complex, unstructured documents with a purpose-built ETL system for RAG and GenAI applications.
Access Pinecone through our AWS Marketplace listing.
Connect a Box account to a Pinecone vector database.
Vector embedding, RAG, and semantic search at scale.
Leverage cutting-edge natural language processing tools for enhanced text understanding and generation in your applications.
Connect and process all of your data in real time with a cloud-native and complete data streaming platform.
Create end-to-end data flows that connect data sources to Pinecone.
Combine the power of a unified analytics platform with Pinecone for scalable data processing and AI insights.
Monitor and secure your applications by integrating with a cloud-scale monitoring service that provides real-time analytics.
Source, transform, and enrich data in a continuous, composable and customizable manner.
Source data from hundreds of systems and push data to Pinecone, for an always up-to-date view.
Build customized LLM apps with an open source, low-code tool for developing orchestration flow & AI agents.
Build, deploy, and manage complex workflows with a low-code platform for AI-assisted ML and LLM transformations.
Build and operationalize data and AI-driven solutions at scale.
Access Pinecone through our Google Cloud Marketplace listing.
Build AI-powered applications and agents.
Get personalized recommendations that enable you to retrieve relevant data and collaborate effectively with Copilot.
Implement an end-to-end search pipeline for efficient retrieval and question answering over large datasets.
Clearly visualize your execution traces and spans.
Deploy state-of-the-art machine learning models on scalable infrastructure, streamlining the path from prototype to production.
Streamline AI development with a low-code full-stack infrastructure tool for data, model, and pipeline orchestration.
Leverage powerful AI models to generate high-quality text embeddings, fine-tuned to both domain- and language-specific use cases.
Combine language models with chain-of-thought reasoning for advanced problem solving and decision support.
Access rich and high cardinal tracing for Pinecone API calls, ingestible into your observability tool of choice.
Leverage Llama for indexing and retrieving information at scale, improving data access and analysis.
Easily create and maintain data pipelines, build custom connectors for any source, and enjoy AI and high-code options to suit any need.
Access Pinecone through our Microsoft Marketplace listing.
Implement monitoring and integrate your Pinecone application with New Relic for performance analysis and insights.
Ingest data from 500+ connectors with Nexla's low-code/no-code AI integration platform.
Nuclia RAG-as-a-Service automatically indexes files and documents from both internal and external sources.
Harness value from the latest AI innovations by delivering efficient, reliable, and customizable AI systems for your apps.
Access powerful AI models like GPT for innovative applications and services, enhancing user experiences with AI capabilities.
Manage your Pinecone collections and indexes using any language of Pulumi Infrastructure as Code.
Connect existing data sources to Pinecone with a Kafka-compatible streaming data platform built for data-intensive applications.
Run Pinecone with Snowpark Container Services, designed to deploy, manage, and scale containerized applications within the Snowflake ecosystem.
A scalable, resilient, and secure messaging and event streaming platform.
Manage your infrastructure using configuration files for a consistent workflow.
Produce traces and metrics that can be viewed in any OpenTelemetry-based platform.
Gain insights into your machine learning models' decisions, improving interpretability and trustworthiness.
Create high-quality multimodal embeddings that capture the rich context and interactions between different modalities in videos.
Load data into Pinecone with a single click.
Use Pinecone as the long-term memory for your Vercel AI projects, and easily scale to support billions of data points.
A TypeScript-based, AI-agent framework for building AI applications with retrieval-augmented generation (RAG) capabilities.
Cutting-edge embedding models and rerankers for semantic search and RAG.
Zapier connects Pinecone to thousands of apps to help you automate your work. No code required.
# Upgrade your plan
Source: https://docs.pinecone.io/guides/assistant/admin/upgrade-billing-plan
Upgrade to a paid plan to access advanced features and limits.
This page describes how to upgrade from the free Starter plan to the [Standard or Enterprise plan](https://www.pinecone.io/pricing/), paying either with a credit/debit card or through a supported cloud marketplace.
The `ConnectPopup` function can be called with either the JavaScript library or script. The JavaScript library is the most commonly used method, but the script can be used in instances where you cannot build and use a custom library, like within the constraints of a content management system (CMS).
The function includes the following **required** configuration option:
* `integrationId`: The slug assigned to the integration. If `integrationId` is not passed, the widget will not render.
[Once you have created your `integrationId`](#create-an-integration-id), you can embed the **Connect** widget in multiple ways:
* [JavaScript](#javascript) library (`@pinecone-database/connect`) or script: Renders the widget in apps and websites.
* [Colab](#colab) (`pinecone-notebooks`): Renders the widget in Colab notebooks using Python.
Once you have created your integration, be sure to [attribute usage to your integration](/integrations/build-integration/attribute-usage-to-your-integration).
### JavaScript
To embed the **Connect to Pinecone** widget in your app or website using the [`@pinecone-database/connect` library](https://www.npmjs.com/package/@pinecone-database/connect), install the necessary dependencies:
```shell Shell theme={null}
# Install dependencies
npm i -S @pinecone-database/connect
```
You can use the JavaScript library to render the **Connect to Pinecone** widget and obtain the API key with the [`connectToPinecone` function](#connecttopinecone-function). It displays the widget and calls the provided callback function with the Pinecone API key, once the user completes the flow.
The function includes the following **required** configuration options:
* `integrationId`: The slug assigned to the integration. If `integrationId` is not passed, the widget will not render.
### Set up the environment
Start by installing the Cohere and Pinecone clients and HuggingFace *Datasets* for downloading the TREC dataset used in this guide:
```shell Shell theme={null}
pip install -U cohere pinecone datasets
```
### Create embeddings
Sign up for an API key at [Cohere](https://dashboard.cohere.com/api-keys) and then use it to initialize your connection.
```Python Python theme={null}
import cohere
co = cohere.Client("COHERE_API_KEY")  # replace with your Cohere API key
```
We click on **Create new endpoint**, choose a model repository (for example, the name of the model) and an endpoint name (this can be anything), and select a cloud environment. Before moving on, it is *very important* that we set the **Task** to **Sentence Embeddings** (found within the *Advanced configuration* settings).
Other important options include the *Instance Type*: by default this uses a CPU, which is cheaper but also slower; for faster processing we need a GPU instance. Finally, we set our privacy setting near the end of the page.
After setting our options, we can click **Create Endpoint** at the bottom of the page. This should take us to the next page, where we will see the current status of our endpoint.
Once the status has moved from **Building** to **Running** (this can take some time), we're ready to begin creating embeddings with it.
## Create embeddings
Each endpoint is given an **Endpoint URL**, which can be found on the endpoint **Overview** page. We need to assign this endpoint URL to the `endpoint` variable.
```Python Python theme={null}
endpoint = "<ENDPOINT_URL>"  # the Endpoint URL from the endpoint Overview page
```

```Python Python theme={null}
api_org = "<API_ORG_TOKEN>"  # your Hugging Face organization API token
```
Let's get started...
### Environment Setup
We start by installing the OpenAI and Pinecone clients. We will also need HuggingFace *Datasets* for downloading the TREC dataset that we will use in this guide.
```Bash Bash theme={null}
!pip install -qU \
pinecone[grpc]==7.3.0 \
openai==1.93.0 \
datasets==3.6.0
```
#### Creating Embeddings
To create embeddings, we must first initialize our connection to OpenAI Embeddings. Sign up for an API key at [OpenAI](https://beta.openai.com/signup).
```Python Python theme={null}
from openai import OpenAI
client = OpenAI(
api_key="OPENAI_API_KEY"
) # get API key from platform.openai.com
```
We can now create embeddings with the OpenAI v3 small embedding model like so:
```Python Python theme={null}
MODEL = "text-embedding-3-small"
res = client.embeddings.create(
input=[
"Sample document text goes here",
"there will be several phrases in each batch"
], model=MODEL
)
```
In `res` we should find a JSON-like object containing two 1536-dimensional embeddings; these are the vector representations of the two inputs provided above. To access the embeddings directly, we can write:
```Python Python theme={null}
# we can extract embeddings to a list
embeds = [record.embedding for record in res.data]
len(embeds)
```
We will use this logic when creating our embeddings for the **T**ext **RE**trieval **C**onference (TREC) question classification dataset later.
#### Initializing a Pinecone Index
Next, we initialize an index to store the vector embeddings. For this we need a Pinecone API key, [sign up for one here](https://app.pinecone.io).
```Python Python theme={null}
import time
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
pc = Pinecone(api_key="...")
spec = ServerlessSpec(cloud="aws", region="us-east-1")
index_name = 'semantic-search-openai'
# check if index already exists (it shouldn't if this is your first run)
if index_name not in pc.list_indexes().names():
    # if it does not exist, create the index
    pc.create_index(
        index_name,
        dimension=len(embeds[0]),  # dimensionality of text-embedding-3-small
        metric='dotproduct',
        spec=spec
    )

# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()
```
#### Populating the Index
With both OpenAI and Pinecone connections initialized, we can move onto populating the index. For this, we need the TREC dataset.
```Python Python theme={null}
from datasets import load_dataset
# load the first 1K rows of the TREC dataset
trec = load_dataset('trec', split='train[:1000]')
```
Then we create a vector embedding for each question using OpenAI (as demonstrated earlier), and `upsert` the ID, vector embedding, and original text for each phrase to Pinecone.
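One way to sketch that loop, reusing the `client`, `MODEL`, `trec`, and `index` objects defined above (the batch size is arbitrary):

```Python Python theme={null}
from tqdm.auto import tqdm  # progress bar (optional)

batch_size = 32  # how many questions to embed and upsert per request

for i in tqdm(range(0, len(trec['text']), batch_size)):
    # find the end of the current batch
    i_end = min(i + batch_size, len(trec['text']))
    lines_batch = trec['text'][i:i_end]
    ids_batch = [str(n) for n in range(i, i_end)]
    # create embeddings for the batch
    res = client.embeddings.create(input=lines_batch, model=MODEL)
    embeds = [record.embedding for record in res.data]
    # keep the original question text as metadata
    meta = [{'text': line} for line in lines_batch]
    # upsert (id, vector, metadata) tuples to Pinecone
    index.upsert(vectors=list(zip(ids_batch, embeds, meta)))
```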
In addition to the above, feedback functions also support the evaluation of ground truth agreement, sentiment, model agreement, language match, toxicity, and a full suite of moderation evaluations, including hate, violence and more. TruLens implements feedback functions as an extensible framework that can evaluate your custom needs as well.
During the development cycle, TruLens supports the iterative development of a wide range of LLM applications by wrapping your application to log cost, latency, key metadata and evaluations of each application run. This allows you to track and identify failure modes, pinpoint their root cause, and measure improvement across experiments.
### Why Pinecone?
Large language models alone have a hallucination problem. Several decades of machine learning research have optimized models, including modern LLMs, for generalization, while actively penalizing memorization. However, many of today's applications require factual, grounded answers. LLMs are also expensive to train and are provided by third-party APIs, which means the knowledge of an LLM is fixed. Retrieval-augmented generation (RAG) is a way to reliably ensure models are grounded, with Pinecone as the curated source of real-world information, long-term memory, application domain knowledge, or whitelisted data.
In the RAG paradigm, rather than just passing a user question directly to a language model, the system retrieves any documents that could be relevant in answering the question from the knowledge base, and then passes those documents (along with the original question) to the language model to generate the final response. The most popular method for RAG involves chaining together LLMs with vector databases, such as the widely used Pinecone vector DB.
In this process, a numerical vector (an embedding) is calculated for all documents, and those vectors are then stored in a database optimized for storing and querying vectors. Incoming queries are vectorized as well, typically using an encoder LLM to convert the query into an embedding. The query embedding is then matched via embedding similarity against the document embeddings in the vector database to retrieve the documents that are relevant to the query.
Pinecone makes it easy to build high-performance vector search applications, including retrieval-augmented question answering. Pinecone can easily handle very large scales of hundreds of millions and even billions of vector embeddings. Pinecone's large scale allows it to handle long-term memory or a large corpus of rich external and domain-appropriate data, so that the LLM component of a RAG application can focus on tasks like summarization, inference, and planning. This setup is optimal for developing a non-hallucinatory application.
In addition, Pinecone is fully managed, so it is easy to change configurations and components. Combined with the tracking and evaluation with TruLens, this is a powerful combination that enables fast iteration of your application.
### Using Pinecone and TruLens to improve LLM performance and reduce hallucination
To build an effective RAG-style LLM application, it is important to experiment with various configuration choices while setting up the vector database, and study their impact on performance metrics.
In this example, we explore the downstream impact of some of these configuration choices on response quality, cost and latency with a sample LLM application built with Pinecone as the vector DB. The evaluation and experiment tracking is done with the [TruLens](https://www.trulens.org/) open source library. TruLens offers an extensible set of [feedback functions](https://truera.com/ai-quality-education/generative-ai-and-llms/whats-missing-to-evaluate-foundation-models-at-scale/) to evaluate LLM apps and enables developers to easily track their LLM app experiments.
In each component of this application, different configuration choices can be made that can impact downstream performance. Some of these choices include the following:
**Constructing the Vector DB**
* Data preprocessing and selection
* Chunk size and chunk overlap
* Index distance metric
* Selection of embeddings
**Retrieval**
* Amount of context retrieved (top k)
* Query planning
**LLM**
* Prompting
* Model choice
* Model parameters (size, temperature, frequency penalty, model retries, etc.)
These configuration choices are useful to keep in mind when constructing your app. In general, there is no optimal choice for all use cases. Rather, we recommend that you experiment with and evaluate a variety of configurations to find the optimal selection as you are building your application.
#### Creating the index in Pinecone
Here we'll download a pre-embedded dataset from the `pinecone-datasets` library allowing us to skip the embedding and preprocessing steps.
```Python Python theme={null}
import pinecone_datasets
dataset = pinecone_datasets.load_dataset('wikipedia-simple-text-embedding-ada-002-100K')
dataset.head()
```
After downloading the data, we can initialize our Pinecone environment and create our first index. Here we face our first potentially important choice: the **distance metric** used for our index.
```Python Python theme={null}
pinecone.create_index(
name=index_name_v1,
metric='cosine', # We'll try each distance metric here.
dimension=1536 # 1536 dim of text-embedding-ada-002.
)
```
Then, we can upsert our documents into the index in batches.
```Python Python theme={null}
for batch in dataset.iter_documents(batch_size=100):
index.upsert(batch)
```
#### Build the vector store
Now that we've built our index, we can start using LangChain to initialize our vector store.
```Python Python theme={null}
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

embed = OpenAIEmbeddings(
    model='text-embedding-ada-002',
    openai_api_key=OPENAI_API_KEY
)

text_field = "text"

# Switch back to a normal index for LangChain.
index = pinecone.Index(index_name_v1)

vectorstore = Pinecone(
    index, embed.embed_query, text_field
)
```
In RAG, we take the query as a question that is to be answered by an LLM, but the LLM must answer the question based on the information it receives from the `vectorstore`.
#### Initialize our RAG application
To do this, we initialize a `RetrievalQA` as our app:
```Python Python theme={null}
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
# completion llm
llm = ChatOpenAI(
model_name='gpt-3.5-turbo',
temperature=0.0
)
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vectorstore.as_retriever()
)
```
#### TruLens for evaluation and tracking of LLM experiments
Once we've set up our app, we should put together our [feedback functions](https://truera.com/ai-quality-education/generative-ai-and-llms/whats-missing-to-evaluate-foundation-models-at-scale/). As a reminder, feedback functions are an extensible method for evaluating LLMs. Here we'll set up two feedback functions: `qs_relevance` and `qa_relevance`. They're defined as follows:
*QS Relevance: query-statement relevance is the average of relevance (0 to 1) for each context chunk returned by the semantic search.*
*QA Relevance: question-answer relevance is the relevance (again, 0 to 1) of the final answer to the original question.*
```Python Python theme={null}
# Imports main tools for eval
from trulens_eval import TruChain, Feedback, Tru, feedback, Select
import numpy as np
tru = Tru()
# OpenAI as feedback provider
openai = feedback.OpenAI()
# Question/answer relevance between overall question and answer.
qa_relevance = Feedback(openai.relevance).on_input_output()
# Question/statement relevance between question and each context chunk.
qs_relevance = (
    Feedback(openai.qs_relevance)
    .on_input()
    # See explanation below
    .on(Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content)
    .aggregate(np.mean)
)
```
Our use of selectors here also requires an explanation.
QA Relevance is the simpler of the two. Here, we are using `.on_input_output()` to specify that the feedback function should be applied on both the input and output of the application.
For QS Relevance, we use TruLens selectors to locate the context chunks retrieved by our application. Let's break it down into simple parts:
1. Argument Specification – The `on_input` which appears first is a convenient shorthand and states that the first argument to `qs_relevance` (the question) is to be the main input of the app.
2. Argument Specification – The `on(Select...)` line specifies where the statement argument to the implementation comes from. We want to evaluate the context chunks, which are an intermediate step of the LLM app. This form references the langchain app object call chain, which can be viewed from `tru.run_dashboard()`. This flexibility allows you to apply a feedback function to any intermediate step of your LLM app. Below is an example where TruLens displays how to select each piece of the context.
3. Aggregation specification – The last line, `aggregate(np.mean)`, specifies how feedback outputs are to be aggregated. This only applies to cases where the argument specification names more than one value for an input or output.
The result of these lines is that `qs_relevance` can now be run on apps/records and will automatically select the specified components of those apps/records.
To finish up, we just wrap our Retrieval QA app with TruLens along with a list of the feedback functions we will use for eval.
```Python Python theme={null}
# wrap with TruLens
truchain = TruChain(qa,
app_id='Chain1_WikipediaQA',
feedbacks=[qa_relevance, qs_relevance])
truchain("Which state is Washington D.C. in?")
```
After submitting a number of queries to our application, we can track our experiment and evaluations with the TruLens dashboard.
```Python Python theme={null}
tru.run_dashboard()
```
Here is a view of our first experiment:
#### Experiment with distance metrics
Now that we've walked through the process of building our tracked RAG application using cosine as the distance metric, all we have to do for the next two experiments is to rebuild the index with `euclidean` or `dotproduct` as the metric and follow the rest of the steps above as is.
Because we are using OpenAI embeddings, which are normalized to length 1, dot product and cosine distance are equivalent, and Euclidean distance will also yield the same ranking. See the OpenAI docs for more information. With the same document ranking, we should not expect a difference in response quality, but computation latency may vary across the metrics. Indeed, OpenAI advises that dot product computation may be a bit faster than cosine. We will be able to confirm this expected latency difference with TruLens.
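A quick numeric check of that equivalence (illustrative only, using random unit-length vectors):

```Python Python theme={null}
import numpy as np

rng = np.random.default_rng(0)
q, d = rng.normal(size=1536), rng.normal(size=1536)
q, d = q / np.linalg.norm(q), d / np.linalg.norm(d)  # unit length, like OpenAI embeddings

dot = q @ d                          # dot product
cosine = dot                         # cosine similarity equals the dot product for unit vectors
euclidean_sq = np.sum((q - d) ** 2)  # squared Euclidean distance
assert np.isclose(euclidean_sq, 2 - 2 * dot)  # monotone in the dot product, so rankings match
```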
```Python Python theme={null}
index_name_v2 = 'langchain-rag-euclidean'
pinecone.create_index(
name=index_name_v2,
metric='euclidean', # metric='dotproduct',
dimension=1536, # 1536 dim of text-embedding-ada-002
)
```
After doing so, we can view our evaluations for all three LLM apps sitting on top of the different indexes. All three apps are struggling with query-statement relevance. In other words, the context retrieved is only somewhat relevant to the original query.
**We can also see that both the Euclidean and dot-product metrics performed at a lower latency than cosine at roughly the same evaluation quality.**
### Problem: hallucination
Digging deeper into the Query Statement Relevance, we notice one problem in particular with a question about famous dental floss brands. The app responds correctly, but is not backed up by the context retrieved, which does not mention any specific brands.
#### Quickly evaluate app components with LangChain and TruLens
Using a less powerful model is a common way to reduce hallucination for some applications. We'll evaluate ada-001 in our next experiment for this purpose.
Changing different components of apps built with frameworks like LangChain is really easy. In this case we just need to call `text-ada-001` from the LangChain LLM store. Adding in easy evaluation with TruLens allows us to quickly iterate through different components to find our optimal app configuration.
```Python Python theme={null}
# completion llm
from langchain.llms import OpenAI
llm = OpenAI(
model_name='text-ada-001',
temperature=0
)
from langchain.chains import RetrievalQAWithSourcesChain
qa_with_sources = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vectorstore.as_retriever()
)
# wrap with TruLens
truchain = TruChain(qa_with_sources,
app_id='Chain4_WikipediaQA',
feedbacks=[qa_relevance, qs_relevance])
```
**However, this configuration with a less powerful model struggles to return a relevant answer given the context provided.**
For example, when asked “Which year was Hawaii's state song written?”, the app retrieves context that contains the correct answer but fails to respond with that answer, instead simply responding with the name of the song.
While our relevance function is not doing a great job here of differentiating which context chunks are relevant, we can see manually that only one of them (the 4th chunk) mentions the year the song was written. Narrowing our `top_k`, the number of context chunks retrieved by the semantic search, may help.
We can do so as follows:
```Python Python theme={null}
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vectorstore.as_retriever(top_k = 1)
)
```
The way the `top_k` is implemented in LangChain's RetrievalQA is that the documents are still retrieved by semantic search and only the `top_k` are passed to the LLM. Therefore, TruLens also captures all of the context chunks that are being retrieved. In order to calculate an accurate QS Relevance metric that matches what's being passed to the LLM, we only calculate the relevance of the top context chunk retrieved by slicing the `input_documents` passed into the TruLens Select function:
```Python Python theme={null}
qs_relevance = Feedback(openai.qs_relevance).on_input().on(
Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:1].page_content
).aggregate(np.mean)
```
Once we've done so, our final application has much improved `qs_relevance`, `qa_relevance` and latency!
With that change, our application is successfully retrieving the one piece of context it needs, and successfully forming an answer from that context.
Even better, the application now knows what it doesn't know:
### Summary
In conclusion, we note that exploring the downstream impact of some Pinecone configuration choices on response quality, cost and latency is an important part of the LLM app development process, ensuring that we make the choices that lead to the app performing the best. Overall, TruLens and Pinecone are the perfect combination for building reliable RAG-style applications. Pinecone provides a way to efficiently store and retrieve context used by LLM apps, and TruLens provides a way to track and evaluate each iteration of your application.
# Twelve Labs
Source: https://docs.pinecone.io/integrations/twelve-labs
### Launch week: Pinecone Local
Pinecone now offers Pinecone Local, an in-memory database emulator available as a Docker image. You can use Pinecone Local to [develop your applications locally](/guides/operations/local-development), or to [test your applications in CI/CD](/guides/production/automated-testing), without connecting to your Pinecone account, affecting production data, or incurring any usage or storage fees. Pinecone Local is in [public preview](/release-notes/feature-availability).
### Launch week: Dark mode
Dark mode is now out for Pinecone's website, docs, and console. You can change your theme at the top right of each site.
Pinecone Assistant is generally available (GA) for all users.
[Read more](https://www.pinecone.io/blog/pinecone-assistant-generally-available) about the release on our blog.
# Error: Cannot import name 'Pinecone' from 'pinecone'
Source: https://docs.pinecone.io/troubleshooting/error-cannot-import-name-pinecone
## Problem
When using an older version of the [Python SDK](https://github.com/pinecone-io/pinecone-python-client/blob/main/README.md) (earlier than 3.0.0), trying to import the `Pinecone` class raises the following error:
```console console theme={null}
ImportError: cannot import name 'Pinecone' from 'pinecone'
```
## Solution
Upgrade the SDK version and try again:
```Shell Shell theme={null}
# If you're interacting with Pinecone via HTTP requests, use:
pip install pinecone --upgrade
```
```Shell Shell theme={null}
# If you're interacting with Pinecone via gRPC, use:
pip install "pinecone[grpc]" --upgrade
```
# Error: Handshake read failed when connecting
Source: https://docs.pinecone.io/troubleshooting/error-handshake-read-failed
## Problem
When trying to connect to the Pinecone server, some users may receive an error message that says `Handshake read failed`, and their connection attempt fails. This error can prevent them from running queries against their Pinecone indexes.
## Solution
If you encounter this error message, it means that your computer is not properly connecting with the Pinecone server. The error is often due to a misconfiguration of your Pinecone client or API key. Here is a recommended solution:
1. Make sure your firewall is not blocking any traffic and your internet connection is working fine. If you are unsure about how to do this, please consult your IT team.
2. Check that you have set up the Pinecone client and API key correctly. Double-check that you have followed the instructions in our [documentation](/guides/get-started/quickstart) correctly.
3. If you are still having issues, try creating a new index on Pinecone and populating it with data by running another script on your computer. This will verify that your computer can access the Pinecone servers for some tasks.
4. If the error persists, you may need to check your code for any misconfigurations. Make sure you are setting up your Pinecone client correctly and passing the right parameters when running queries against your indexes.
5. If you are still unable to resolve the issue, you can reach out to Pinecone support for assistance. They will be able to help you diagnose and resolve the issue.
## Conclusion
If you encounter the `Handshake read failed` error when trying to connect to Pinecone server, there are several steps you can take to resolve the issue. First, double-check that you have set up the Pinecone client and API key correctly. Then, check for any misconfigurations in your code. If the error persists, [contact Pinecone Support](/troubleshooting/contact-support) for assistance.
# Export indexes
Source: https://docs.pinecone.io/troubleshooting/export-indexes
Pinecone does not support an export function. It is on our roadmap for the future, however.
In the meantime, we recommend keeping a copy of your source data in case you need to move from one project to another, in which case you'll need to reindex the data.
For backup purposes, we recommend that you take periodic backups. Please see [Back up indexes](/guides/manage-data/back-up-an-index) in our documentation for more details on doing so.
# How to work with Support
Source: https://docs.pinecone.io/troubleshooting/how-to-work-with-support
There are several best practices for working with Pinecone Support that can lead to faster resolutions and more relevant recommendations. Please note that Pinecone Support is reserved for users in organizations on the Standard or Enterprise plan. First-response SLAs only apply to tickets created by users in an organization subscribed to a [support plan](https://www.pinecone.io/pricing/?plans=support). To upgrade your support plan, go to [Manage your support plan](https://app.pinecone.io/organizations/-/settings/support/plans) in the console and select your desired plan.
## Utilize Pinecone AI Support
Our [support chatbot](https://app.pinecone.io/organizations/-/settings/support) is knowledgeable about our documentation, troubleshooting articles, website, and more. Many of your questions can be answered immediately using this resource. We also review all interactions with the support chatbot and constantly make improvements.
## Use the email associated with your Pinecone account
We map your account information to the tier of your organization to assign appropriate SLAs. If you open tickets using an email not associated with your Pinecone account, we will close your request and suggest alternative contact methods.
## Create tickets using the support portal
Instead of creating tickets via email, use the [Help center](https://app.pinecone.io/organizations/-/settings/support) in the Pinecone console to create tickets. The form allows you to provide helpful information such as severity and category. Furthermore, the conversation format will be much more digestible in the portal, especially when involving code snippets and other attachments.
## Select an appropriate severity
Pinecone Support reserves the right to change the ticket severity after our initial response and assessment of the case. Note that a Sev-1 ticket indicates that your production environment is completely unavailable, and a Sev-2 ticket indicates that your production environment has degraded performance. If your issue does not involve a production-level usage or application, please refrain from opening Sev-1 or Sev-2 tickets.
## Provide the exact names of impacted indexes and projects
When opening a ticket that involves specific resources in your organization, please specify the name of the impacted index(es) and project(s).
## Provide as detailed a description as possible
Please include code snippets, version specifications, and the full stack trace of error messages you encounter. Whenever possible, please include screenshots or screen recordings. The more information you provide, the more likely we can effectively assist you in our first response, and you can return to building with Pinecone.
# Serverless index creation error - max serverless indexes
Source: https://docs.pinecone.io/troubleshooting/index-creation-error-max-serverless
## Problem
Each project is limited to 20 serverless indexes. Trying to create more than 20 serverless indexes in a project raises the following `403 (FORBIDDEN)` error:
```console console theme={null}
This project already contains 20 serverless indexes, the maximum per project.
Delete any unused indexes and try again, or create a new project for more serverless indexes.
For additional help, please contact support@pinecone.io.
```
## Solution
[Delete any unused serverless indexes](/guides/manage-data/manage-indexes#delete-an-index) in the project and try again, or create a new project to hold additional serverless indexes.
Also consider using [namespaces](/guides/index-data/indexing-overview#namespaces) to partition vectors of the same dimensionality within a single index. Namespaces can help speed up queries as well as comply with [multitenancy](/guides/index-data/implement-multitenancy) requirements.
# Index creation error - missing spec parameter
Source: https://docs.pinecone.io/troubleshooting/index-creation-error-missing-spec
## Problem
Using the [new API](/reference/api), creating an index requires passing appropriate values into the `spec` parameter. Without this `spec` parameter, the `create_index` method raises the following error:
```console console theme={null}
TypeError: Pinecone.create_index() missing 1 required positional argument: 'spec'
```
## Solution
Set the `spec` parameter. For guidance on how to set this parameter, see [Create an index](/guides/index-data/create-an-index#create-a-serverless-index).
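For example, a minimal serverless index creation with the `spec` parameter set; the index name and dimension are placeholders:

```Python Python theme={null}
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

pc.create_index(
    name="docs-example",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```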
# Keep customer data separate in Pinecone
Source: https://docs.pinecone.io/troubleshooting/keep-customer-data-separate
Some use cases require vectors to be segmented by customer, either physically or logically. The table below describes three techniques to accomplish this and the pros and cons of each:
| **Techniques** | **Pros** | **Cons** |
| ----------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| **Separate Indexes**
## Data ingestion
When a [document is uploaded](/guides/assistant/manage-files), the assistant processes the content by chunking it into smaller parts and generating [vector embeddings](https://www.pinecone.io/learn/vector-embeddings-for-developers/) for each chunk. These embeddings are stored in an [index](/guides/index-data/indexing-overview), making them ready for retrieval.
## Data retrieval
During a [chat](/guides/assistant/chat-with-assistant), the assistant processes the message to formulate relevant search queries, which are used to query the index and identify the most relevant chunks from the uploaded content.
## Response generation
After retrieving these chunks, the assistant performs a ranking step to determine which information is most relevant. This [context](/guides/assistant/context-snippets-overview), along with the chat history and [assistant instructions](/guides/assistant/manage-assistants#add-instructions-to-an-assistant), is then used by a large language model (LLM) to generate responses that are informed by your documents.
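For reference, a minimal sketch of chatting with an assistant through the Python SDK, assuming the `pinecone-plugin-assistant` package is installed and an assistant named `example-assistant` already exists with files uploaded:

```Python Python theme={null}
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

# The assistant retrieves relevant chunks from the uploaded files and uses them,
# along with the chat history and instructions, to ground the LLM's response.
msg = Message(role="user", content="What does the onboarding document say about security?")
response = assistant.chat(messages=[msg])
print(response)  # includes the generated message and citations to source files
```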