voyage-3-large | Voyage AI

METRIC

cosine, dot product

DIMENSION

1024 (default), 256, 512, 2048

MAX INPUT TOKENS

32000

TASK

embedding

Overview

voyage-3-large is the Voyage AI embedding model positioned for the best general-purpose and multilingual retrieval quality. See the Voyage documentation for an overview of all Voyage embedding models and rerankers.

The model is accessed through the Voyage Python client, which requires a Voyage API key. Register with Voyage AI to obtain one.

Using the model

Installation

!pip install -qU voyageai pinecone

Define Embedding Parameters

EMBEDDING_DIMENSION = 1024  # can choose between 1024 (default), 256, 512, and 2048
EMBEDDING_DTYPE = "float"   # can choose between "float" (default), "int8", "uint8", "binary", "ubinary"
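The two parameters above determine the shape of every vector the model returns, so it can help to validate them before creating an index. The sketch below is a hypothetical helper, not part of the Voyage or Pinecone clients. It checks the values against the options listed above and estimates how many numbers each embedding will contain. It assumes that the "binary" and "ubinary" types pack eight bits into each returned integer, so the returned list is one eighth of the nominal dimension; verify this against the Voyage documentation before relying on it.

```python
# Options as listed in the parameter comments above.
SUPPORTED_DIMENSIONS = {256, 512, 1024, 2048}
SUPPORTED_DTYPES = {"float", "int8", "uint8", "binary", "ubinary"}


def returned_vector_length(dimension: int, dtype: str) -> int:
    """Hypothetical helper: how many values to expect per embedding."""
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {dimension}")
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype}")
    # Assumption: binary types are bit-packed, 8 bits per returned integer.
    if dtype in ("binary", "ubinary"):
        return dimension // 8
    return dimension


print(returned_vector_length(1024, "float"))   # 1024
print(returned_vector_length(1024, "binary"))  # 128
```

The rest of this walkthrough uses "float", where the returned length equals EMBEDDING_DIMENSION and matches the index dimension directly.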

Create Index

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="API_KEY")

# Create Index
index_name = "voyage-3-large"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=EMBEDDING_DIMENSION,
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )

index = pc.Index(index_name)
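The index above uses the cosine metric, and the model card also lists dot product. As a general property, the two metrics return the same score whenever both vectors have unit length. The sketch below demonstrates this with plain Python on small made-up vectors. Whether the embeddings you store are actually unit length for your chosen dimension and dtype is an assumption to check on your own data, for example by computing the norm of a few returned vectors.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: list[float]) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: list[float]) -> list[float]:
    n = norm(a)
    return [x / n for x in a]


# Illustrative vectors only, not real embeddings.
u = normalize([3.0, 4.0])
v = normalize([1.0, 2.0])

# For unit-length vectors the dot product equals the cosine similarity.
print(math.isclose(dot(u, v), cosine(u, v)))
```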

Embed & Upsert

# Sample documents to embed
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
]

import voyageai

# Replace with your Voyage API key
VOYAGE_API_KEY = "VOYAGE_API_KEY"

vo = voyageai.Client(api_key=VOYAGE_API_KEY)

model_id = "voyage-3-large"

def embed(docs: list[str], input_type: str) -> list[list[float]]:
    """Embed a list of texts; input_type is "document" or "query"."""
    result = vo.embed(
        docs,
        model=model_id,
        input_type=input_type,
        output_dimension=EMBEDDING_DIMENSION,
        output_dtype=EMBEDDING_DTYPE,
    )
    return result.embeddings

# Use "document" input type for documents
embeddings = embed([d["text"] for d in data], input_type="document")

vectors = []
for d, e in zip(data, embeddings):
    vectors.append({
        "id": d['id'],
        "values": e,
        "metadata": {'text': d['text']}
    })

index.upsert(
    vectors=vectors,
    namespace="ns1"
)
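The example above embeds and upserts five documents in a single call. For larger datasets, a common pattern is to process the data in fixed-size batches so that each embedding request and each upsert stays small. The sketch below shows a generic chunking helper. The name `batched` and the batch size of 100 are illustrative assumptions, not limits documented on this page; check the current Voyage and Pinecone limits before choosing a value.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

BATCH_SIZE = 100  # assumed value for illustration only


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Usage with the objects defined earlier (commented out because it needs API keys):
# for batch in batched(data, BATCH_SIZE):
#     batch_embeddings = embed([d["text"] for d in batch], input_type="document")
#     index.upsert(
#         vectors=[
#             {"id": d["id"], "values": e, "metadata": {"text": d["text"]}}
#             for d, e in zip(batch, batch_embeddings)
#         ],
#         namespace="ns1",
#     )
```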

Query

query = "Tell me about the tech company known as Apple"

# Use "query" input type for queries
x = embed([query], input_type="query")

results = index.query(
    namespace="ns1",
    vector=x[0],
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)
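Printing the raw response is enough for a quick check, but most applications pull out the ID, score, and stored text of each match. The sketch below assumes the response exposes a "matches" list whose entries carry "id", "score", and "metadata" keys, which is how the query above is configured with include_metadata=True. The helper name `summarize_matches` is hypothetical, and the sample response uses made-up scores so the block runs without an API key.

```python
def summarize_matches(results, min_score: float = 0.0) -> list[tuple[str, float, str]]:
    """Return (id, score, text) for every match scoring at least `min_score`."""
    rows = []
    for match in results["matches"]:
        if match["score"] >= min_score:
            rows.append((match["id"], match["score"], match["metadata"]["text"]))
    return rows


# Mock response with illustrative scores, shaped like the query result above.
sample_results = {
    "matches": [
        {"id": "vec2", "score": 0.9, "metadata": {"text": "The tech company Apple is known for its innovative products like the iPhone."}},
        {"id": "vec1", "score": 0.4, "metadata": {"text": "Apple is a popular fruit known for its sweetness and crisp texture."}},
    ]
}

for match_id, score, text in summarize_matches(sample_results, min_score=0.5):
    print(f"{match_id}  {score:.2f}  {text}")
```

With a real response, pass `results` from the query above in place of `sample_results`.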
