nvidia/llama-text-embed-v2 is a state-of-the-art embedding model available natively in Pinecone Inference. Developed by NVIDIA Research, it is built on the Llama 3.2 1B architecture and optimized for high retrieval quality with low-latency inference. Also known as llama-3_2-nv-embedqa-1b-v2, the model distills techniques from NVIDIA’s industry-leading NV-Embed-v2 (7B parameters) into an efficient, production-ready solution.
Retrieval quality: The model surpasses OpenAI’s text-embedding-3-large across multiple benchmarks, in some cases improving accuracy by more than 20%
Real-time queries: Predictable and consistent query speeds for responsive search with p99 latencies 12x faster than OpenAI Large
Multilingual: Supports 26 languages, including English, Spanish, Chinese, Hindi, Japanese, Korean, French, and German
Installation
pip install --upgrade pinecone
Create index
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Create a dense index with integrated inference
index_name = "llama-text-index"
pc.create_index_for_model(
    name=index_name,
    cloud="aws",
    region="us-east-1",
    embed={
        "model": "llama-text-embed-v2",
        "field_map": {
            "text": "text"  # Map the record field to be embedded
        }
    }
)

index = pc.Index(index_name)
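Index creation can take a few moments. A minimal readiness check, sketched here using the pc client and index_name from above, is to poll describe_index before upserting:

import time

# Block until the newly created index reports ready
while not pc.describe_index(index_name).status["ready"]:
    time.sleep(1)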
Embed & upsert
data =[{"id":"vec1","text":"Apple is a popular fruit known for its sweetness and crisp texture."},{"id":"vec2","text":"The tech company Apple is known for its innovative products like the iPhone."},{"id":"vec3","text":"Many people enjoy eating apples as a healthy snack."},{"id":"vec4","text":"Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},{"id":"vec5","text":"An apple a day keeps the doctor away, as the saying goes."},{"id":"vec6","text":"Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}]index.upsert_records( namespace="example-namespace", records=data)
Query
query_payload ={"inputs":{"text":"Tell me about the tech company known as Apple."},"top_k":3}results = index.search( namespace="example-namespace", query=query_payload)print(results)