instructor-xl
Use the instructor-xl embedding model with Pinecone: specs and index setup. instructor-xl is an instruction-finetuned text embedding model. Given a natural-language instruction that describes the task, it generates text embeddings tailored to that task.
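Because the model is instruction-finetuned, each input is an `[instruction, text]` pair rather than bare text. Documents and queries usually get different instructions. The sketch below only builds those pairs in plain Python, without loading the model. It assumes a hypothetical per-record `instruction` key for overriding the default instruction. That key is an illustration, not a field that InstructorEmbedding or Pinecone defines.

```python
from typing import Dict, List

# Default instructions, copied from the example on this page.
DOC_INSTRUCTION = "Represent the following document for retrieval: "
QUERY_INSTRUCTION = "Represent this query for retrieving supporting documents: "


def build_pairs(records: List[Dict[str, str]], default_instruction: str) -> List[List[str]]:
    """Return [instruction, text] pairs, one per record.

    A record may carry its own "instruction" key (a hypothetical field used
    only for illustration); otherwise the default instruction is used.
    """
    return [[r.get("instruction", default_instruction), r["text"]] for r in records]


docs = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {
        "id": "vec2",
        "text": "The tech company Apple is known for its innovative products like the iPhone.",
        # Hypothetical per-record override for a domain-specific passage.
        "instruction": "Represent the technology news document for retrieval: ",
    },
]

doc_pairs = build_pairs(docs, DOC_INSTRUCTION)
query_pairs = build_pairs(
    [{"text": "Tell me about the tech company known as Apple"}], QUERY_INSTRUCTION
)

for pair in doc_pairs + query_pairs:
    print(pair)
```

Lists shaped this way are what the example passes to `model.encode(...)`. Keeping the instruction choice in one helper makes it easy to try different instructions per record.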
```python
# Notebook cell: install dependencies (drop the leading "!" when running in a shell)
!pip install transformers==4.20.0 InstructorEmbedding pinecone sentence-transformers

from pinecone import Pinecone, ServerlessSpec
from InstructorEmbedding import INSTRUCTOR

pc = Pinecone(api_key="API_KEY")  # replace with your Pinecone API key

# Create the index; instructor-xl produces 768-dimensional embeddings
index_name = "instructor-xl"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=768,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

index = pc.Index(index_name)

# Embed data
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
]

# Instructor pairs each passage with an instruction that describes the task
instruction = "Represent the following document for retrieval: "

model = INSTRUCTOR('hkunlp/instructor-xl')

# Align instructions with text data;
# you can also vary the instruction per record
instruction_embedding_pairs = [[instruction, d["text"]] for d in data]

embeddings = model.encode(instruction_embedding_pairs)

vectors = []
for d, e in zip(data, embeddings):
    vectors.append({
        "id": d['id'],
        "values": e.tolist(),  # convert the NumPy array to a plain list of floats
        "metadata": {'text': d['text']}
    })

index.upsert(
    vectors=vectors,
    namespace="ns1"
)

# Queries use their own retrieval instruction
query_instruction = "Represent this query for retrieving supporting documents: "
query = "Tell me about the tech company known as Apple"

x = model.encode([[query_instruction, query]])

results = index.query(
    namespace="ns1",
    vector=x[0].tolist(),
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)
```
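The index is created with `metric="cosine"`, and the query requests the `top_k=3` closest vectors. The following is a brute-force reference sketch of what that ranking means conceptually. It scores every stored vector by cosine similarity and keeps the best `k`. It is not how Pinecone implements search, and the toy 3-dimensional vectors and ids are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Cosine is undefined for a zero vector; score it as 0 in this sketch.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], vectors: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best (id, score) pairs."""
    scored = [(vid, cosine_similarity(query, v)) for vid, v in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real instructor-xl embeddings.
toy_index = {
    "fruit": [1.0, 0.0, 0.0],
    "tech": [0.0, 1.0, 0.0],
    "mixed": [0.6, 0.8, 0.0],
}
toy_query = [0.0, 1.0, 0.0]

for vid, score in top_k(toy_query, toy_index, k=2):
    print(f"{vid}: {score:.3f}")
```

The dimension check mirrors a real constraint: every vector upserted to or queried against the index must have exactly the dimension the index was created with.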