ANN_DEEP1B_d96_angular | 9,990,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_DEEP1B_d96_angular | ANN | ANN benchmark (96) | None |
ANN_Fashion-MNIST_d784_euclidean | 60,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_Fashion-MNIST_d784_euclidean | ANN | ANN benchmark (784) | None |
ANN_GIST_d960_euclidean | 1,000,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_GIST_d960_euclidean | ANN | ANN benchmark (960) | None |
ANN_GloVe_d100_angular | 1,183,514 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_GloVe_d100_angular | ANN | ANN benchmark (100) | None |
ANN_GloVe_d200_angular | 1,183,514 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_GloVe_d200_angular | ANN | ANN benchmark (200) | None |
ANN_GloVe_d25_angular | 1,183,514 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_GloVe_d25_angular | ANN | ANN benchmark (25) | None |
ANN_GloVe_d50_angular | 1,183,514 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_GloVe_d50_angular | ANN | ANN benchmark (50) | None |
ANN_GloVe_d64_angular | 292,385 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_GloVe_d64_angular | ANN | ANN benchmark (65) | None |
ANN_MNIST_d784_euclidean | 60,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_MNIST_d784_euclidean | ANN | ANN benchmark (784) | None |
ANN_NYTimes_d256_angular | 290,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_NYTimes_d256_angular | ANN | ANN benchmark (256) | None |
ANN_SIFT1M_d128_euclidean | 1,000,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_SIFT1M_d128_euclidean | ANN | ANN benchmark (128) | None |
amazon_toys_quora_all-MiniLM-L6-bm25 | 10,000 | https://www.kaggle.com/datasets/PromptCloudHQ/toy-products-on-amazon | gs://pinecone-datasets-dev/amazon_toys_quora_all-MiniLM-L6-bm25 | QA | sentence-transformers/all-MiniLM-L6-v2 (384) | bm25 |
it-threat-data-test | 1,042,965 | https://cse-cic-ids2018.s3.ca-central-1.amazonaws.com/Processed%20Traffic%20Data%20for%20ML%20Algorithms/Thursday-22-02-2018_TrafficForML_CICFlowMeter.csv | | | it_threat_model.model (128) | None |
it-threat-data-train | 1,042,867 | https://cse-cic-ids2018.s3.ca-central-1.amazonaws.com/Processed%20Traffic%20Data%20for%20ML%20Algorithms/Thursday-22-02-2018_TrafficForML_CICFlowMeter.csv | | | it_threat_model.model (128) | None |
langchain-python-docs-text-embedding-ada-002 | 3,476 | https://huggingface.co/datasets/jamescalam/langchain-docs-23-06-27 | | | text-embedding-ada-002 (1536) | None |
movielens-user-ratings | 970,582 | https://huggingface.co/datasets/pinecone/movielens-recent-ratings | gs://pinecone-datasets-dev/movielens-user-ratings | classification | pinecone/movie-recommender-user-model (32) | None |
msmarco-v1-bm25-allMiniLML6V2 | 8,841,823 | | | | all-minilm-l6-v2 (384) | bm25-k0.9-b0.4 |
quora_all-MiniLM-L6-bm25-100K | 100,000 | https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs | gs://pinecone-datasets-dev/quora_all-MiniLM-L6-bm25 | similar questions | sentence-transformers/msmarco-MiniLM-L6-cos-v5 (384) | naver/splade-cocondenser-ensembledistil |
quora_all-MiniLM-L6-bm25 | 522,931 | https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs | gs://pinecone-datasets-dev/quora_all-MiniLM-L6-bm25 | similar questions | sentence-transformers/msmarco-MiniLM-L6-cos-v5 (384) | naver/splade-cocondenser-ensembledistil |
quora_all-MiniLM-L6-v2_Splade-100K | 100,000 | https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs | gs://pinecone-datasets-dev/quora_all-MiniLM-L6-v2_Splade | similar questions | sentence-transformers/msmarco-MiniLM-L6-cos-v5 (384) | naver/splade-cocondenser-ensembledistil |
quora_all-MiniLM-L6-v2_Splade | 522,931 | https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs | gs://pinecone-datasets-dev/quora_all-MiniLM-L6-v2_Splade | similar questions | sentence-transformers/msmarco-MiniLM-L6-cos-v5 (384) | naver/splade-cocondenser-ensembledistil |
squad-text-embedding-ada-002 | 18,891 | https://huggingface.co/datasets/squad | | | text-embedding-ada-002 (1536) | None |
wikipedia-simple-text-embedding-ada-002-100K | 100,000 | wikipedia | gs://pinecone-datasets-dev/wikipedia-simple-text-embedding-ada-002-100K | multiple | text-embedding-ada-002 (1536) | None |
wikipedia-simple-text-embedding-ada-002 | 283,945 | wikipedia | gs://pinecone-datasets-dev/wikipedia-simple-text-embedding-ada-002 | multiple | text-embedding-ada-002 (1536) | None |
youtube-transcripts-text-embedding-ada-002 | 38,950 | youtube | gs://pinecone-datasets-dev/youtube-transcripts-text-embedding-ada-002 | multiple | text-embedding-ada-002 (1536) | None |
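
The rows above follow a regular pattern: seven pipe-delimited cells, a comma-grouped document count, and a dense-model cell of the form `model name (dimension)`. The snippet below is a minimal sketch of reading one such row programmatically. The column names in `COLUMNS` and the helper `parse_row` are assumptions made for illustration, since the header row is not part of this section; they are not part of any library.

```python
import re

# Assumed column order (hypothetical names; the header row is not shown here).
COLUMNS = ["name", "documents", "source", "bucket", "task", "dense_model", "sparse_model"]


def parse_row(line: str) -> dict:
    """Split one pipe-delimited catalog row into a dict keyed by COLUMNS."""
    text = line.strip()
    # Rows in this table end with a trailing pipe; drop it before splitting.
    if text.endswith("|"):
        text = text[:-1]
    cells = [cell.strip() for cell in text.split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")

    row = dict(zip(COLUMNS, cells))

    # "1,183,514" -> 1183514
    row["documents"] = int(row["documents"].replace(",", ""))

    # "text-embedding-ada-002 (1536)" -> model name plus integer dimension.
    match = re.fullmatch(r"(.+?)\s*\((\d+)\)", row["dense_model"])
    if match:
        row["dense_model"] = match.group(1)
        row["dimension"] = int(match.group(2))
    else:
        row["dimension"] = None

    # The table writes a missing sparse model as the literal text "None".
    if row["sparse_model"] in ("", "None"):
        row["sparse_model"] = None
    return row


example = (
    "ANN_SIFT1M_d128_euclidean | 1,000,000 | https://github.com/erikbern/ann-benchmarks | "
    "gs://pinecone-datasets-dev/ANN_SIFT1M_d128_euclidean | ANN | ANN benchmark (128) | None |"
)
parsed = parse_row(example)
print(parsed["name"], parsed["documents"], parsed["dimension"])
```

Empty cells are kept as empty strings, so rows without a listed bucket or task still parse to seven fields.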