Quickstart

This guide shows you how to set up and use Pinecone Database for high-performance semantic search.

To get started in your browser, use the Quickstart colab notebook.

If you’re new to Pinecone, sign up at app.pinecone.io and choose a free plan:

Starter plan: If you’re building a small or personal project, this is likely the right choice. You get free access to most features, but you’re limited to one cloud region and need to stay under Starter plan limits.
Standard plan trial: If you’re building for scale or commercial use, choose this option. You get 21 days and $300 in credits with access to Standard plan features and higher limits that let you test Pinecone at scale.

After signing up, you’ll receive an API key in the console. Save this key. You’ll need it to authenticate your requests to Pinecone.

2. Install an SDK

Pinecone SDKs provide convenient programmatic access to the Pinecone APIs. Install the SDK for your preferred language:

pip install pinecone

3. Create an index

In Pinecone, there are two types of indexes for storing vector data: Dense indexes store for semantic search, and sparse indexes store for lexical/keyword search. For this quickstart, create a dense index that is integrated with an embedding model hosted by Pinecone. With integrated models, you upsert and search with text and have Pinecone generate vectors automatically.

If you prefer to use external embedding models, see Bring your own vectors.

# Import the Pinecone library
from pinecone import Pinecone

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key="YOUR_API_KEY")

# Create a dense index with integrated embedding
index_name = "quickstart-py"
if not pc.has_index(index_name):
    pc.create_index_for_model(
        name=index_name,
        cloud="aws",
        region="us-east-1",
        embed={
            "model":"llama-text-embed-v2",
            "field_map":{"text": "chunk_text"}
        }
    )

4. Upsert text

Prepare a sample dataset of factual statements from different domains like history, physics, technology, and music. Model the data as as records with an ID, text, and category.

records = [
    { "_id": "rec1", "chunk_text": "The Eiffel Tower was completed in 1889 and stands in Paris, France.", "category": "history" },
    { "_id": "rec2", "chunk_text": "Photosynthesis allows plants to convert sunlight into energy.", "category": "science" },
    { "_id": "rec3", "chunk_text": "Albert Einstein developed the theory of relativity.", "category": "science" },
    { "_id": "rec4", "chunk_text": "The mitochondrion is often called the powerhouse of the cell.", "category": "biology" },
    { "_id": "rec5", "chunk_text": "Shakespeare wrote many famous plays, including Hamlet and Macbeth.", "category": "literature" },
    { "_id": "rec6", "chunk_text": "Water boils at 100°C under standard atmospheric pressure.", "category": "physics" },
    { "_id": "rec7", "chunk_text": "The Great Wall of China was built to protect against invasions.", "category": "history" },
    { "_id": "rec8", "chunk_text": "Honey never spoils due to its low moisture content and acidity.", "category": "food science" },
    { "_id": "rec9", "chunk_text": "The speed of light in a vacuum is approximately 299,792 km/s.", "category": "physics" },
    { "_id": "rec10", "chunk_text": "Newton's laws describe the motion of objects.", "category": "physics" },
    { "_id": "rec11", "chunk_text": "The human brain has approximately 86 billion neurons.", "category": "biology" },
    { "_id": "rec12", "chunk_text": "The Amazon Rainforest is one of the most biodiverse places on Earth.", "category": "geography" },
    { "_id": "rec13", "chunk_text": "Black holes have gravitational fields so strong that not even light can escape.", "category": "astronomy" },
    { "_id": "rec14", "chunk_text": "The periodic table organizes elements based on their atomic number.", "category": "chemistry" },
    { "_id": "rec15", "chunk_text": "Leonardo da Vinci painted the Mona Lisa.", "category": "art" },
    { "_id": "rec16", "chunk_text": "The internet revolutionized communication and information sharing.", "category": "technology" },
    { "_id": "rec17", "chunk_text": "The Pyramids of Giza are among the Seven Wonders of the Ancient World.", "category": "history" },
    { "_id": "rec18", "chunk_text": "Dogs have an incredible sense of smell, much stronger than humans.", "category": "biology" },
    { "_id": "rec19", "chunk_text": "The Pacific Ocean is the largest and deepest ocean on Earth.", "category": "geography" },
    { "_id": "rec20", "chunk_text": "Chess is a strategic game that originated in India.", "category": "games" },
    { "_id": "rec21", "chunk_text": "The Statue of Liberty was a gift from France to the United States.", "category": "history" },
    { "_id": "rec22", "chunk_text": "Coffee contains caffeine, a natural stimulant.", "category": "food science" },
    { "_id": "rec23", "chunk_text": "Thomas Edison invented the practical electric light bulb.", "category": "inventions" },
    { "_id": "rec24", "chunk_text": "The moon influences ocean tides due to gravitational pull.", "category": "astronomy" },
    { "_id": "rec25", "chunk_text": "DNA carries genetic information for all living organisms.", "category": "biology" },
    { "_id": "rec26", "chunk_text": "Rome was once the center of a vast empire.", "category": "history" },
    { "_id": "rec27", "chunk_text": "The Wright brothers pioneered human flight in 1903.", "category": "inventions" },
    { "_id": "rec28", "chunk_text": "Bananas are a good source of potassium.", "category": "nutrition" },
    { "_id": "rec29", "chunk_text": "The stock market fluctuates based on supply and demand.", "category": "economics" },
    { "_id": "rec30", "chunk_text": "A compass needle points toward the magnetic north pole.", "category": "navigation" },
    { "_id": "rec31", "chunk_text": "The universe is expanding, according to the Big Bang theory.", "category": "astronomy" },
    { "_id": "rec32", "chunk_text": "Elephants have excellent memory and strong social bonds.", "category": "biology" },
    { "_id": "rec33", "chunk_text": "The violin is a string instrument commonly used in orchestras.", "category": "music" },
    { "_id": "rec34", "chunk_text": "The heart pumps blood throughout the human body.", "category": "biology" },
    { "_id": "rec35", "chunk_text": "Ice cream melts when exposed to heat.", "category": "food science" },
    { "_id": "rec36", "chunk_text": "Solar panels convert sunlight into electricity.", "category": "technology" },
    { "_id": "rec37", "chunk_text": "The French Revolution began in 1789.", "category": "history" },
    { "_id": "rec38", "chunk_text": "The Taj Mahal is a mausoleum built by Emperor Shah Jahan.", "category": "history" },
    { "_id": "rec39", "chunk_text": "Rainbows are caused by light refracting through water droplets.", "category": "physics" },
    { "_id": "rec40", "chunk_text": "Mount Everest is the tallest mountain in the world.", "category": "geography" },
    { "_id": "rec41", "chunk_text": "Octopuses are highly intelligent marine creatures.", "category": "biology" },
    { "_id": "rec42", "chunk_text": "The speed of sound is around 343 meters per second in air.", "category": "physics" },
    { "_id": "rec43", "chunk_text": "Gravity keeps planets in orbit around the sun.", "category": "astronomy" },
    { "_id": "rec44", "chunk_text": "The Mediterranean diet is considered one of the healthiest in the world.", "category": "nutrition" },
    { "_id": "rec45", "chunk_text": "A haiku is a traditional Japanese poem with a 5-7-5 syllable structure.", "category": "literature" },
    { "_id": "rec46", "chunk_text": "The human body is made up of about 60% water.", "category": "biology" },
    { "_id": "rec47", "chunk_text": "The Industrial Revolution transformed manufacturing and transportation.", "category": "history" },
    { "_id": "rec48", "chunk_text": "Vincent van Gogh painted Starry Night.", "category": "art" },
    { "_id": "rec49", "chunk_text": "Airplanes fly due to the principles of lift and aerodynamics.", "category": "physics" },
    { "_id": "rec50", "chunk_text": "Renewable energy sources include wind, solar, and hydroelectric power.", "category": "energy" }
]

Upsert the sample dataset into a new namespace in your index. Because your index is integrated with an embedding model, you provide the textual statements and Pinecone converts them to dense vectors automatically.

# Target the index
dense_index = pc.Index(index_name)

# Upsert the records into a namespace
dense_index.upsert_records("example-namespace", records)

To control costs when ingesting large datasets (10,000,000+ records), use import instead of upsert.

Pinecone is eventually consistent, so there can be a slight delay before new or changed records are visible to queries. You can view index stats to check if the current vector count matches the number of vectors you upserted (50):

# Wait for the upserted vectors to be indexed
import time
time.sleep(10)

# View stats for the index
stats = dense_index.describe_index_stats()
print(stats)

The response looks like this:

{'dimension': 1024,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'example-namespace': {'vector_count': 50}},
 'total_vector_count': 50,
 'vector_type': 'dense'}

5. Semantic search

Search the dense index for ten records that are most semantically similar to the query, “Famous historical structures and monuments”. Again, because your index is integrated with an embedding model, you provide the query as text and Pinecone converts the text to a dense vector automatically.

# Define the query
query = "Famous historical structures and monuments"

# Search the dense index
results = dense_index.search(
    namespace="example-namespace",
    query={
        "top_k": 10,
        "inputs": {
            'text': query
        }
    }
)

# Print the results
for hit in results['result']['hits']:
        print(f"id: {hit['_id']:<5} | score: {round(hit['_score'], 2):<5} | category: {hit['fields']['category']:<10} | text: {hit['fields']['chunk_text']:<50}")

Notice that most of the results are about historical structures and monuments. However, a few unrelated statements are included as well and are ranked high in the list, for example, a statement about Shakespeare.

id: rec17 | score: 0.24  | category: history    | text: The Pyramids of Giza are among the Seven Wonders of the Ancient World.
id: rec38 | score: 0.19  | category: history    | text: The Taj Mahal is a mausoleum built by Emperor Shah Jahan.
id: rec5  | score: 0.19  | category: literature | text: Shakespeare wrote many famous plays, including Hamlet and Macbeth.
id: rec15 | score: 0.11  | category: art        | text: Leonardo da Vinci painted the Mona Lisa.          
id: rec50 | score: 0.1   | category: energy     | text: Renewable energy sources include wind, solar, and hydroelectric power.
id: rec26 | score: 0.09  | category: history    | text: Rome was once the center of a vast empire.        
id: rec47 | score: 0.08  | category: history    | text: The Industrial Revolution transformed manufacturing and transportation.
id: rec7  | score: 0.07  | category: history    | text: The Great Wall of China was built to protect against invasions.
id: rec1  | score: 0.07  | category: history    | text: The Eiffel Tower was completed in 1889 and stands in Paris, France.
id: rec3  | score: 0.07  | category: science    | text: Albert Einstein developed the theory of relativity.

6. Rerank results

To get a more accurate ranking, search again but this time rerank the initial results based on their relevance to the query.

# Search the dense index and rerank results
reranked_results = dense_index.search(
    namespace="example-namespace",
    query={
        "top_k": 10,
        "inputs": {
            'text': query
        }
    },
    rerank={
        "model": "bge-reranker-v2-m3",
        "top_n": 10,
        "rank_fields": ["chunk_text"]
    }   
)

# Print the reranked results
for hit in reranked_results['result']['hits']:
    print(f"id: {hit['_id']}, score: {round(hit['_score'], 2)}, text: {hit['fields']['chunk_text']}, category: {hit['fields']['category']}")

Notice that all of the most relevant results about historical structures and monuments are now ranked highest.

id: rec1  | score: 0.11  | category: history    | text: The Eiffel Tower was completed in 1889 and stands in Paris, France.
id: rec38 | score: 0.06  | category: history    | text: The Taj Mahal is a mausoleum built by Emperor Shah Jahan.
id: rec7  | score: 0.06  | category: history    | text: The Great Wall of China was built to protect against invasions.
id: rec17 | score: 0.02  | category: history    | text: The Pyramids of Giza are among the Seven Wonders of the Ancient World.
id: rec26 | score: 0.01  | category: history    | text: Rome was once the center of a vast empire.        
id: rec15 | score: 0.01  | category: art        | text: Leonardo da Vinci painted the Mona Lisa.          
id: rec5  | score: 0.0   | category: literature | text: Shakespeare wrote many famous plays, including Hamlet and Macbeth.
id: rec47 | score: 0.0   | category: history    | text: The Industrial Revolution transformed manufacturing and transportation.
id: rec50 | score: 0.0   | category: energy     | text: Renewable energy sources include wind, solar, and hydroelectric power.
id: rec3  | score: 0.0   | category: science    | text: Albert Einstein developed the theory of relativity.

7. Improve results

Reranking results is one of the most effective ways to improve search accuracy and relevance, but there are many other techniques to consider. For example:

Filtering by metadata: When records contain additional metadata, you can limit the search to records matching a filter expression.
Hybrid search: You can add lexical search to capture precise keyword matches (e.g., product SKUs, email addresses, domain-specific terms) in addition to semantic matches.
Chunking strategies: You can chunk your content in different ways to get better results. Consider factors like the length of the content, the complexity of queries, and how results will be used in your application.

8. Clean up

When you no longer need your example index, delete it as follows:

# Delete the index
pc.delete_index(index_name)

For production indexes, consider enabling deletion protection.

Next steps

Index data

Learn more about storing data in Pinecone

Search

Explore different forms of vector search.

Optimize

Find out how to improve performance

Get started

Index data

Search

Optimize

Manage data

Manage cost

Move to production

Admin

Operations

Using pods

2. Install an SDK

3. Create an index

4. Upsert text

5. Semantic search

6. Rerank results

7. Improve results

8. Clean up

Next steps

Index data

Search

Optimize

Get started

Index data

Search

Optimize

Manage data

Manage cost

Move to production

Admin

Operations

Using pods

​1. Sign up

​2. Install an SDK

​3. Create an index

​4. Upsert text

​5. Semantic search

​6. Rerank results

​7. Improve results

​8. Clean up

​Next steps

Index data

Search

Optimize

1. Sign up

2. Install an SDK

3. Create an index

4. Upsert text

5. Semantic search

6. Rerank results

7. Improve results

8. Clean up

Next steps