embed-multilingual-v3.0 | Cohere

METRIC: cosine, dot product

DIMENSION: 1024

MAX INPUT TOKENS: 512

TASK: embedding
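
The two supported metrics are closely related. The following is a minimal sketch in plain Python, not taken from either vendor's documentation, showing that cosine similarity is the dot product of the two vectors after each is scaled to unit length. The helper names `dot`, `cosine`, and `normalize` are hypothetical, and the two-dimensional vectors are toy values rather than real embeddings.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    # Sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the two vector lengths.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(a: list[float]) -> list[float]:
    # Scale a vector to unit length.
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


# Toy vectors for illustration only.
a, b = [3.0, 4.0], [4.0, 3.0]
print(dot(a, b))     # 24.0
print(cosine(a, b))  # 0.96
# Dot product of the unit-length versions matches the cosine similarity
# up to floating-point rounding.
print(math.isclose(dot(normalize(a), normalize(b)), cosine(a, b)))
```

When choosing a metric for the index, the practical point is that dot product is sensitive to vector length while cosine is not.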

Overview

An easy-to-use multilingual text embedding model, best suited to search where short queries are expected to return medium-length passages of text (1-2 paragraphs).

Requests must set input_type="search_document" when embedding passages or documents, and input_type="search_query" when embedding queries. The Embed & Upsert and Query steps below show both cases.
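
One way to avoid mixing up the two input types is to wrap each in its own function. This is a minimal sketch, not part of the Cohere SDK: `embed_documents`, `embed_queries`, and `_embed` are hypothetical helper names, and `co` is assumed to be a client exposing `embed(texts=..., model=..., input_type=...)` and returning an object with an `embeddings` attribute, as `cohere.Client` does in the version pinned on this page.

```python
from typing import Any

MODEL = "embed-multilingual-v3.0"


def _embed(co: Any, texts: list[str], input_type: str) -> list[list[float]]:
    # Hypothetical helper: forwards to the client's embed call and returns the vectors.
    response = co.embed(texts=texts, model=MODEL, input_type=input_type)
    return response.embeddings


def embed_documents(co: Any, texts: list[str]) -> list[list[float]]:
    # Passages stored in the index use input_type="search_document".
    return _embed(co, texts, "search_document")


def embed_queries(co: Any, texts: list[str]) -> list[list[float]]:
    # Search queries use input_type="search_query".
    return _embed(co, texts, "search_query")
```

Callers then pick the function that matches what they are embedding, and the input type string is written in only one place.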

Using the model

Installation:

!pip install -qU cohere==4.34 pinecone

Create Index

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="API_KEY")

# Create Index
index_name = "cohere-embed-multilingual-v3"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

index = pc.Index(index_name)

Embed & Upsert

# Embed data
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
]

import cohere

co = cohere.Client("COHERE_API_KEY")


def embed(docs: list[str], input_type: str) -> list[list[float]]:
    # input_type is "search_document" for passages and "search_query" for queries
    doc_embeds = co.embed(
        texts=docs,
        input_type=input_type,
        model="embed-multilingual-v3.0"
    )
    return doc_embeds.embeddings

# when encoding documents / passages
embeddings = embed([d["text"] for d in data], input_type="search_document")

vectors = []
for d, e in zip(data, embeddings):
    vectors.append({
        "id": d['id'],
        "values": e,
        "metadata": {'text': d['text']}
    })

index.upsert(
    vectors=vectors,
    namespace="ns1"
)
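
The five records above fit comfortably in a single upsert call. For larger datasets it is common to send records in batches and to check vector lengths against the index dimension first. The following is a minimal sketch under those assumptions: `batched` and `check_dimensions` are hypothetical helpers, and the batch size of 100 is an illustrative value, not a documented limit.

```python
from typing import Iterator

EXPECTED_DIMENSION = 1024  # dimension of the index created earlier on this page


def batched(items: list, batch_size: int) -> Iterator[list]:
    # Yield consecutive slices of at most batch_size items.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def check_dimensions(vectors: list[dict], expected: int = EXPECTED_DIMENSION) -> None:
    # Fail early if any embedding does not match the index dimension.
    for v in vectors:
        if len(v["values"]) != expected:
            raise ValueError(
                f"{v['id']}: got {len(v['values'])} values, expected {expected}"
            )


# Usage with the objects defined on this page (index, vectors):
# check_dimensions(vectors)
# for batch in batched(vectors, batch_size=100):  # 100 is an illustrative size
#     index.upsert(vectors=batch, namespace="ns1")
```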

Query

query = "Tell me about the tech company known as Apple"

# when encoding a query
query_embeddings = embed([query], input_type="search_query")

results = index.query(
    namespace="ns1",
    vector=query_embeddings[0],
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)
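
Printing the raw response shows everything at once. To read the matches one by one, something like the following sketch may help. It assumes the response supports dictionary-style access with a "matches" list whose entries carry "id", "score", and the "metadata" stored at upsert time. `summarize_matches` is a hypothetical helper, and the `example` response is a hand-written stand-in with a made-up score, not real output.

```python
def summarize_matches(results) -> list[tuple[str, float, str]]:
    # Collect (id, score, text) for each match, in the order returned.
    rows = []
    for match in results["matches"]:
        rows.append((match["id"], match["score"], match["metadata"]["text"]))
    return rows


# Illustrative stand-in for a query response; the score is made up.
example = {
    "matches": [
        {
            "id": "vec2",
            "score": 0.5,
            "metadata": {
                "text": "The tech company Apple is known for its innovative products like the iPhone."
            },
        },
    ]
}

for vec_id, score, text in summarize_matches(example):
    print(f"{vec_id}  {score:.3f}  {text}")
```

With the query above, `summarize_matches(results)` would be called on the real response in place of `example`.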
