Facial Similarity Search


In this notebook, we will demonstrate how to use Pinecone to build an image-based vector search application to discover people with similar facial features. We will:

  1. Extract faces from a celebrity image dataset
  2. Convert the faces to embeddings and store them in a Pinecone index (alongside metadata related to the celebrities)
  3. Query the Pinecone index with an image of a person and find the most similar celebrities

Install Dependencies

!pip install datasets "pinecone-client[grpc]<3" facenet-pytorch requests Pillow

Load Dataset

We will use a dataset containing photos of roughly 116K of the most popular people on The Movie Database (TMDB). The dataset can be loaded from Hugging Face as follows:

from datasets import load_dataset

# load the dataset
celeb_faces = load_dataset("ashraq/tmdb-people-image", split="train")
celeb_faces
Dataset({
    features: ['adult', 'also_known_as', 'biography', 'birthday', 'deathday', 'gender', 'homepage', 'id', 'imdb_id', 'known_for_department', 'name', 'place_of_birth', 'popularity', 'profile_path', 'image'],
    num_rows: 116404
})

The dataset contains an image of each person along with some metadata. Let's take a look:

celeb = celeb_faces[10]
celeb
{'adult': False,
 'also_known_as': "['Thomas Stanley Holland', 'Том Холланд', 'トム・ホランド', '톰 홀랜드', 'توم هولاند', 'ทอม ฮอลแลนด์', '汤姆·赫兰德', 'Τομ Χόλαντ', 'Том Голланд', '湯姆·霍蘭德', 'טום הולנד', 'תומאס סטנלי הולנד', 'Nhện Đệ Tam', 'ტომ ჰოლანდი']",
 'biography': 'Thomas "Tom" Stanley Holland is an English actor and dancer. He is best known for playing Peter Parker / Spider-Man in the Marvel Cinematic Universe and has appeared as the character in six films: Captain America: Civil War (2016), Spider-Man: Homecoming (2017), Avengers: Infinity War (2018), Avengers: Endgame (2019), Spider-Man: Far From Home (2019), and Spider-Man: No Way Home (2021). He is also known for playing the title role in Billy Elliot the Musical at the Victoria Palace Theatre, London, as well as for starring in the 2012 film The Impossible.',
 'birthday': '1996-06-01',
 'deathday': None,
 'gender': 2,
 'homepage': None,
 'id': 1136406,
 'imdb_id': 'nm4043618',
 'known_for_department': 'Acting',
 'name': 'Tom Holland',
 'place_of_birth': 'Surrey, England, UK',
 'popularity': 104.302,
 'profile_path': 'bBRlrpJm9XkNSg0YT5LCaxqoFMX.jpg',
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=421x632 at 0x7FA4FC04CB50>}
celeb["image"].resize((200, 300))


We do not need all of these metadata fields, so we will remove the ones we do not need and convert the rest into a pandas dataframe.

# remove metadata fields not needed, convert into a pandas dataframe
metadata = celeb_faces.remove_columns(['adult', 'also_known_as', 'biography', 'deathday', 'gender', 'homepage', 'id', 'imdb_id', 'known_for_department', 'image']).to_pandas()
# replace any missing values with the string "None" (a placeholder, not Python's None)
metadata = metadata.fillna("None")
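
The string placeholder is used because, to our understanding, Pinecone metadata does not accept null values. A possible alternative, not used in this notebook, is to drop missing keys from each record instead of filling them. The sketch below works on plain dictionaries; `drop_missing` is a hypothetical helper and the record is made up for illustration.

```python
import math


def drop_missing(record):
    """Return a copy of a metadata record without None or NaN values."""
    cleaned = {}
    for key, value in record.items():
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        cleaned[key] = value
    return cleaned


# made-up record with the same keys as one row of the metadata dataframe
record = {"name": "Example Person", "birthday": None, "popularity": 1.5}
print(drop_missing(record))
```

Records cleaned this way simply lack the missing keys, so any code that reads them should use `record.get("birthday")` rather than indexing directly.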

Embedding Model

We will use two models: one for extracting faces and another for generating vector embeddings of the faces. We focus on faces only because full images would introduce too much noise and degrade the search results.

For face extraction, we will use MTCNN, a popular choice because it detects and aligns faces accurately despite variations in pose and appearance. The facenet-pytorch package provides a PyTorch implementation of MTCNN. Since the images in our dataset are already in PIL format, we can directly test the MTCNN model, which expects PIL image objects as inputs, as shown below:

from facenet_pytorch import MTCNN

# initialize the MTCNN model
mtcnn = MTCNN()
# create a copy of the face
img = celeb["image"].copy()
# detect face and get coordinates of the face with probability
boxes, prob = mtcnn.detect(img)
boxes, prob
(array([[ 91.4824  , 112.335335, 316.80338 , 409.37723 ]], dtype=float32),
 array([0.9999924], dtype=float32))

The detect method in MTCNN returns the bounding-box coordinates of each detected face along with the detection probability, which in this case is above 0.9999. Let's draw a rectangle on the image using these coordinates to see if it correctly detected the face.

from PIL import Image, ImageDraw

# draw a rectangle on the image using coordinates returned by the MTCNN model
draw = ImageDraw.Draw(img)
draw.rectangle(boxes.reshape((2,2)), width=3)
# resize the image to display a smaller size
img.resize((200, 290))
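
The probability returned by detect can also be used to discard weak detections before any further processing. Here is a minimal sketch, assuming that detect returns None for the boxes when no face is found; `confident_boxes` and the 0.95 threshold are hypothetical choices, not part of the notebook.

```python
def confident_boxes(boxes, probs, threshold=0.95):
    """Keep only the boxes whose detection probability meets the threshold."""
    if boxes is None:
        # assumption: no face was detected in the image
        return []
    kept = []
    for box, prob in zip(boxes, probs):
        if prob is not None and prob >= threshold:
            kept.append([float(v) for v in box])
    return kept


# values copied from the detection output shown above
boxes = [[91.4824, 112.335335, 316.80338, 409.37723]]
probs = [0.9999924]
print(len(confident_boxes(boxes, probs)))  # 1
```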


As we can see, the model has successfully identified the face. To extract the face, we can crop the image to the area within the rectangle, using either OpenCV or another package. Alternatively, the facenet-pytorch package has a function that does this for us and returns the result as PyTorch tensors that can be used directly as input for the embedding model. This can be done as follows:

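# pass the original image (or a batch of images) directly through the mtcnn model;
# we use celeb["image"] rather than img because img now has a rectangle drawn on it
face = mtcnn(celeb["image"])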
face.shape
torch.Size([3, 160, 160])
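
For the manual cropping route mentioned above, the float coordinates returned by detect first need to be turned into an integer box that stays inside the image. The sketch below shows one way to do that; `box_to_crop` and its `margin` parameter are hypothetical names introduced for illustration.

```python
import math


def box_to_crop(box, image_size, margin=0):
    """Convert a float (x1, y1, x2, y2) box to an integer crop box inside the image."""
    x1, y1, x2, y2 = box
    width, height = image_size
    left = max(0, math.floor(x1 - margin))
    top = max(0, math.floor(y1 - margin))
    right = min(width, math.ceil(x2 + margin))
    bottom = min(height, math.ceil(y2 + margin))
    return (left, top, right, bottom)


# box and image size taken from the outputs shown earlier
crop_box = box_to_crop([91.4824, 112.335335, 316.80338, 409.37723], (421, 632))
print(crop_box)  # (91, 112, 317, 410)
```

The resulting tuple can be passed to PIL, for example `celeb["image"].crop(crop_box)`.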

To generate embeddings, we will use Inception-ResNet v1, a deep learning model for facial recognition, with weights pretrained on the VGGFace2 dataset, which includes more than 3 million images of over 9000 people. The model can be loaded and used as follows:

from facenet_pytorch import InceptionResnetV1
import torch

# initialize the InceptionResnetV1 model pretrained on VGGFace2
resnet = InceptionResnetV1(pretrained="vggface2").eval()
# generate embedding for the face extracted using mtcnn above
embedding = resnet(torch.stack([face]))
embedding.shape
torch.Size([1, 512])
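
To our knowledge, the embeddings returned by InceptionResnetV1 are L2-normalized, in which case the cosine similarity between two embeddings equals their dot product. The sketch below illustrates that relationship with toy 2-dimensional vectors in plain Python; the helper names are hypothetical and the vectors are not real face embeddings.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
# for unit-length vectors the dot product is the cosine similarity
dot = sum(x * y for x, y in zip(a, b))
print(round(cosine_similarity(a, b), 6), round(dot, 6))
```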

We can now generate a vector embedding for a face. Let's write a pipeline to easily do all of this in batches.

import numpy as np


class FacenetEmbedder:
    def __init__(self):
        # set device to use GPU if available
        self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
        # initialize MTCNN model
        self.mtcnn = MTCNN(device=self.device)
        # initialize the InceptionResnetV1 model pretrained on VGGFace2
        self.resnet = InceptionResnetV1(pretrained='vggface2', device=self.device).eval()

    def detect_face(self, batch):
        # get coordinates of the face
        faces = self.mtcnn.detect(batch)
        return faces

    def encode(self, batch):
        # pass the batch of images directly through mtcnn model
        face_batch = self.mtcnn(batch)
        # remove any images that do not contain a face; the output can
        # therefore be shorter than the input batch
        face_batch = [i for i in face_batch if i is not None]
        # concatenate face batch to form a single tensor
        aligned = torch.stack(face_batch)
        # if using gpu move the input batch to gpu
        if self.device.type == "cuda": 
            aligned = aligned.to(self.device)
        # generate embedding
        embeddings = self.resnet(aligned).detach().cpu()
        return embeddings.tolist()
# initialize the embedding pipeline
facenet = FacenetEmbedder()
# test the pipeline using a small image batch
batch = celeb_faces[10:20]["image"]
len(facenet.encode(batch))
10

We can now simply call the encode method of FacenetEmbedder with a batch of PIL images, and it will extract the faces and generate embeddings for us. Keep in mind that batch encoding only works if all the images in the batch have the same shape. We can use the following function to convert a batch of PIL images to RGB and resize them to a common size so that batch encoding always works.

def reshape(batch):
    batch = [image.convert("RGB").resize((421, 632)) for image in batch]
    return batch
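
Note that resizing every image to (421, 632) stretches images whose aspect ratio differs from the target, which may distort faces. A possible alternative, not used in this notebook, is to scale each image to fit inside the target size and pad the remainder. The sketch below only computes the scaled size and padding offsets; `fit_within` is a hypothetical helper.

```python
def fit_within(size, target):
    """Scale `size` to fit inside `target` without changing the aspect ratio.

    Returns the scaled (width, height) and the (left, top) padding needed to
    center the scaled image on a canvas of the target size.
    """
    width, height = size
    target_w, target_h = target
    scale = min(target_w / width, target_h / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return (new_w, new_h), (pad_left, pad_top)


print(fit_within((842, 632), (421, 632)))  # ((421, 316), (0, 158))
```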

Initialize Pinecone Index

Now we need to set up the Pinecone index, which stores vector representations of our images that can be retrieved using the embedding of another image (called the query vector). Before we can do this, we have to establish a connection to Pinecone using an API key. You can find your API key and environment in the Pinecone console under API Keys. The connection is initialized as follows:

import pinecone

# connect to pinecone environment
pinecone.init(
    api_key="YOUR_API_KEY",
    environment="YOUR_ENVIRONMENT"
)

Now, we can create our vector index and name it "tmdb-people" (although you can choose any name you like). We set the metric to cosine and the dimension to 512 because the embedding model produces 512-dimensional vectors, which we compare by cosine similarity.

index_name = "tmdb-people"

# check if the tmdb-people index exists
if index_name not in pinecone.list_indexes():
    # create the index if it does not exist
    pinecone.create_index(
        index_name,
        dimension=512,
        metric="cosine"
    )

# connect to tmdb-people index we created
index = pinecone.GRPCIndex(index_name)
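
Conceptually, querying an index with the cosine metric returns the stored vectors that have the highest cosine similarity to the query vector. The brute-force sketch below illustrates that idea on toy 2-dimensional vectors. It is not how Pinecone is implemented, and `brute_force_query` is a hypothetical name.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_query(query, vectors, top_k=3):
    """Return the top_k (id, score) pairs ranked by cosine similarity."""
    scored = ((vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items())
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# toy 2-dimensional vectors standing in for 512-dimensional face embeddings
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
matches = brute_force_query([1.0, 0.1], vectors, top_k=2)
print([vec_id for vec_id, _ in matches])  # ['a', 'c']
```

Scanning every vector like this scales linearly with the number of stored vectors, which is why a dedicated vector index is useful for a dataset of this size.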

Generate Embeddings and Upsert

Next, we need to generate embeddings for the celebrity faces and upload them to the Pinecone index. To do this efficiently, we will process the images in batches. For each celebrity in the dataset, we need to provide Pinecone with a unique ID, the corresponding embedding, and metadata. The metadata is a collection of information about the celebrity: name, profile image path, date of birth, place of birth, and popularity.

from tqdm.auto import tqdm

# we will use batches of 64
batch_size = 64

for i in tqdm(range(0, len(celeb_faces), batch_size)):
    # find end of batch
    i_end = min(i+batch_size, len(celeb_faces))
    # extract batch
    batch = celeb_faces[i:i_end]["image"]
    # reshape the images to ensure they all have same shape
    batch = reshape(batch)
    # generate embeddings for batch
    emb = facenet.encode(batch)
    # create unique IDs
    ids = [f"{idx}" for idx in range(i, i_end)]
    # add metadata
    meta = metadata[i:i_end].to_dict(orient="records")
    # add all to upsert list
    to_upsert = list(zip(ids, emb, meta))
    # upsert/insert these records to pinecone
    _ = index.upsert(vectors=to_upsert)

# check that we have all vectors in index
index.describe_index_stats()

We have successfully added everything we need to the Pinecone index.
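
One caveat with the upsert loop: encode drops images in which no face is detected, so the list of embeddings can be shorter than the lists of IDs and metadata, and zip would then pair embeddings with the wrong records. Below is a sketch of one way to keep the three lists aligned, assuming the per-image list of faces (with None for misses) is obtained from `facenet.mtcnn(batch)` before embedding. `align_records` is a hypothetical helper and the inputs are placeholders.

```python
def align_records(ids, faces, meta):
    """Keep ids, faces, and metadata only for images where a face was found."""
    kept = [(i, f, m) for i, f, m in zip(ids, faces, meta) if f is not None]
    if not kept:
        return [], [], []
    kept_ids, kept_faces, kept_meta = zip(*kept)
    return list(kept_ids), list(kept_faces), list(kept_meta)


# strings stand in for the face tensors; None marks an image without a face
ids = ["0", "1", "2"]
faces = ["face-0", None, "face-2"]
meta = [{"name": "A"}, {"name": "B"}, {"name": "C"}]
print(align_records(ids, faces, meta))
```

The filtered faces could then be stacked and embedded, and the filtered IDs and metadata zipped with the resulting embeddings. The empty-list case also needs handling, since stacking an empty list of tensors raises an error.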

Find Similar Celebrities

Now we can query the Pinecone index with the embedding of a face and quickly get the celebrities that are most similar. First, let's write helper functions to query Pinecone and display the results.

from IPython.display import HTML


def display_result(metadata):
    figures = []
    for m in metadata:
        figures.append(f'''
            <figure style="margin: 5px !important;">
                <img src="https://image.tmdb.org/t/p/h632/{m["profile_path"]}" style="width: 190px; height: 240px; border-radius: 10px;" >
                <figcaption>{m["name"]}</figcaption>
            </figure>
        ''')
    return HTML(data=f'''
        <div style="display: flex; flex-flow: row wrap; text-align: center;">
        {''.join(figures)}
        </div>
    ''')
def find_similar_faces(face, top_k=6):
    # pass the image through the embedding pipeline
    emb = facenet.encode([face])
    # query pinecone with the face embedding, returning the top_k closest matches
    result = index.query(vector=emb[0], top_k=top_k, include_metadata=True)
    # extract metadata from the search results and display results
    r = [x["metadata"] for x in result["matches"]]
    return display_result(r)
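
Each match in the query result also carries a similarity score next to its ID and metadata, which can be used to filter out weak matches before displaying them. The sketch below uses a made-up response with the same shape as a query result; `names_above` and the 0.5 threshold are hypothetical and would need tuning on real data.

```python
def names_above(result, min_score=0.5):
    """Return (name, score) pairs for matches whose score meets min_score."""
    pairs = []
    for match in result["matches"]:
        if match["score"] >= min_score:
            pairs.append((match["metadata"]["name"], match["score"]))
    return pairs


# made-up response for illustration; the names and scores are not real results
result = {
    "matches": [
        {"id": "7", "score": 0.91, "metadata": {"name": "Person A"}},
        {"id": "3", "score": 0.42, "metadata": {"name": "Person B"}},
    ]
}
print(names_above(result))  # [('Person A', 0.91)]
```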

Let's run some test queries using celebrity images from the dataset to find other celebrities who look alike.

celeb = celeb_faces[40]["image"]
celeb.resize((190,240))


find_similar_faces(celeb)
Claudia Harrison
Sonya Walger
Bettina Mittendorfer
Julianne Nicholson
Aglaia Szyszkowitz
Hannah Barefoot

The search results look good: we can definitely see some celebrities with similar facial features. Let's run more queries.

celeb = celeb_faces[35]["image"]
find_similar_faces(celeb)
Chris Hemsworth
Liam Hemsworth
Ed Hendrik
Chris Reid
Kenny Doughty
Christopher Russell
celeb = celeb_faces[1]["image"]
find_similar_faces(celeb)
Seung Ha
Shara Lin
Yura
Napasorn Weerayuttwilai
Go Joon-hee
An Danwei
celeb = celeb_faces[12]["image"]
find_similar_faces(celeb)
Jason Statham
Vladimir Raiman
Brendan Kelly
Scott C. Brown
Michael Chiklis
Huw Garmon
celeb = celeb_faces[17]["image"]
find_similar_faces(celeb)
Joey King
Elva Trill
Rosalind Halstead
Svetlana Svetlichnaya
Megan Parkinson
Clara Ponsot
celeb = celeb_faces[29]["image"]
find_similar_faces(celeb)
Luke Grimes
Philip Ettinger
Nick Thune
Vicente Alves do Ó
Thomas McDonell
Irhad Mutic
celeb = celeb_faces[64]["image"]
find_similar_faces(celeb)
Jeffrey Dean Morgan
Pompeu José
Özcan Varaylı
Selahattin Taşdöğen
Vagelis Rokos
Darrell D'Silva

These search results look good as well. To further test our system, let's try images that are not in the dataset. The following function, get_image, loads an image from a URL as a PIL object:

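from io import BytesIO

from PIL import Image
import requests

def get_image(url):
    # download the image and fail early on HTTP errors
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # decode the bytes into a PIL image and convert to RGB for the face models
    return Image.open(BytesIO(response.content)).convert("RGB")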
  
url = "https://live.staticflickr.com/7442/9509564504_21d2dc42e1_z.jpg"
# load the image as PIL object from url
celeb = get_image(url)
celeb.resize((190,240))


find_similar_faces(celeb)
Jennifer Lawrence
Aislyn Watson
Carrie Underwood
Carissa Capobianco
Kimberley Klaver
Esti Ginzburg
url = "https://live.staticflickr.com/3563/3304692615_bc67db2606_z.jpg"
# load the image as PIL object from url
celeb = get_image(url)
celeb.resize((190,240))


find_similar_faces(celeb)
Brad Pitt
Peter M. Lenkov
Michał Lewandowski
Luke Arnold
David Berry
Marco Quaglia

As we can see, in both cases the top match is the celebrity in the picture, and the remaining results are celebrities with similar facial features.

Example Application

Are you curious if you share a resemblance with a famous celebrity? Try this demo app, which has been built based on this notebook, to find out.