embed-english-light-v3.0 | Cohere

METRIC

cosine, dot product

DIMENSION

384

MAX INPUT TOKENS

512

TASK

embedding

Overview

Well suited to easy-to-use text embeddings where short queries are expected to return medium-length passages of text (1-2 paragraphs). Performance is close to that of the full-size Cohere embed-english-v3.0 model, while the lower-dimensional output embeddings reduce vector storage requirements.
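As a rough illustration of the storage savings, the sketch below compares the raw size of one vector at 384 dimensions with one at 1024 dimensions. The 1024-dimension figure for the full-size model and the 4-byte float32 encoding are assumptions not stated on this page, and real index storage also includes metadata and overhead.

```python
# Assumption: each vector component is stored as a 4-byte float32.
BYTES_PER_FLOAT32 = 4


def vector_bytes(dimension: int) -> int:
    """Raw size in bytes of one dense vector, excluding metadata and overhead."""
    return dimension * BYTES_PER_FLOAT32


light_bytes = vector_bytes(384)   # this model's output dimension
full_bytes = vector_bytes(1024)   # assumed dimension of the full-size model
savings = 1 - light_bytes / full_bytes

print(f"light: {light_bytes} B, full: {full_bytes} B, savings: {savings:.1%}")
```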

You must add input_type="search_document" to requests when embedding passages or documents, and input_type="search_query" when embedding queries. The code under "Using the model" shows both cases.
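One way to avoid mixing up the two values is to choose the input type in a single place. The helper below is a minimal sketch; the name input_type_for is hypothetical and not part of the Cohere SDK.

```python
# The two input_type values this page requires for search use cases.
SEARCH_DOCUMENT = "search_document"
SEARCH_QUERY = "search_query"


def input_type_for(is_query: bool) -> str:
    """Return the input_type to send, depending on what is being embedded.

    Hypothetical helper for illustration only; not a Cohere SDK function.
    """
    return SEARCH_QUERY if is_query else SEARCH_DOCUMENT


print(input_type_for(is_query=False))  # use when embedding passages for upsert
print(input_type_for(is_query=True))   # use when embedding a search query
```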

Using the model

Installation:

!pip install -qU cohere==4.34 pinecone

Create Index

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="API_KEY")

# Create Index
index_name = "cohere-embed-english-light-v3"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

index = pc.Index(index_name)

Embed & Upsert

# Sample data to embed
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
]

import cohere

co = cohere.Client("COHERE_API_KEY")


def embed(docs: list[str], input_type: str) -> list[list[float]]:
    # input_type must be "search_document" for passages or "search_query" for queries
    doc_embeds = co.embed(
        texts=docs,
        input_type=input_type,
        model="embed-english-light-v3.0"
    )
    return doc_embeds.embeddings

# when encoding documents / passages
embeddings = embed([d["text"] for d in data], input_type="search_document")


vectors = []
for d, e in zip(data, embeddings):
    vectors.append({
        "id": d['id'],
        "values": e,
        "metadata": {'text': d['text']}
    })

index.upsert(
    vectors=vectors,
    namespace="ns1"
)

Query

query = "Tell me about the tech company known as Apple"

# when encoding a query
x = embed([query], input_type="search_query")

results = index.query(
    namespace="ns1",
    vector=x[0],
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)
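To read the results instead of printing the raw response, each match can be reduced to its ID, score, and stored text. The sketch below assumes each match exposes id, score, and metadata keys, as requested with include_metadata=True. The sample matches and their scores are made up for illustration and are not real query output, and summarize_matches is a hypothetical helper.

```python
def summarize_matches(matches: list[dict]) -> list[str]:
    """Format each match as 'id (score): text'. Hypothetical helper."""
    lines = []
    for m in matches:
        # The text was stored under metadata["text"] at upsert time.
        text = m.get("metadata", {}).get("text", "")
        lines.append(f"{m['id']} ({m['score']:.2f}): {text}")
    return lines


# Made-up stand-in for results["matches"]; the scores are illustrative only.
sample_matches = [
    {"id": "vec2", "score": 0.61, "metadata": {"text": "The tech company Apple..."}},
    {"id": "vec4", "score": 0.58, "metadata": {"text": "Apple Inc. has revolutionized..."}},
]

for line in summarize_matches(sample_matches):
    print(line)
```

With a live index, the same function could be applied to the matches in the query response.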
