llama-text-embed-v2 | NVIDIA

Hosted

METRIC: cosine

DIMENSION: 1024, 2048, 768, 512, 384

MAX INPUT TOKENS: 2048

TASK: embedding

PRICE: $0.16

Overview

nvidia/llama-text-embed-v2 is a state-of-the-art embedding model available natively in Pinecone Inference. Developed by NVIDIA Research, it is built on the Llama 3.2 1B architecture and optimized for high retrieval quality with low-latency inference. Also known as llama-3_2-nv-embedqa-1b-v2, the model distills techniques from NVIDIA’s industry-leading NV-2 (7B parameters) into an efficient, production-ready solution.
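The model's vectors are compared with the cosine metric, so it may help to see what that metric computes. The following is a minimal, standard-library-only sketch rather than Pinecone's implementation: the function name `cosine_similarity` and the toy 4-dimensional vectors are hypothetical stand-ins for real embeddings, which would have one of the supported dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings.
query = [0.1, 0.3, -0.2, 0.4]
doc_close = [0.2, 0.6, -0.4, 0.8]  # same direction as query, twice the length
doc_far = [-0.4, 0.2, 0.3, 0.1]    # orthogonal to query (dot product is zero)

if __name__ == "__main__":
    print(cosine_similarity(query, doc_close))
    print(cosine_similarity(query, doc_far))
```

Cosine similarity depends only on direction, not magnitude, which is why `doc_close` scores as a near-perfect match even though it is twice as long as `query`.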

Installation

Create Index

Embed & Upsert

Query
