Auto-Evaluation of Metadata Filtering

Introduction

Q+A systems often use a two-step approach: retrieve relevant text chunks and then synthesize them into an answer. There are many ways to approach this. For example, we recently discussed the Retriever-Less option (at the bottom of the diagram below), highlighting the Anthropic 100k context window model. Metadata filtering is an alternative approach that uses metadata tags to pre-filter chunks in a VectorDB on user-defined criteria before semantic search.

[Diagram: approaches to retrieval for Q+A, with the Retriever-Less option at the bottom]

Motivation

I previously built a QA app based on the Lex Fridman podcast. This uses semantic search on Pinecone. However, it failed in cases where a user wanted to retrieve information about a specific episode (e.g., summarize episode 53) or where a guest had appeared multiple times and a user wanted information from a particular episode (e.g., what did Elon say in episode 252).

In these cases, semantic search will look for the concept "episode 53" in the chunks, but instead we simply want to filter the chunks for episode 53 and then perform semantic search to extract those that best summarize the episode. Metadata filtering does this, so long as 1) we have a metadata field for episode number and 2) we can extract from the query the value (e.g., 53 or 252) that we want to filter on. The LangChain SelfQueryRetriever does the latter (see docs), splitting the user input into a semantic query and a metadata filter (for Pinecone or Chroma).
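To make the filter-then-search idea concrete, here is a minimal sketch in plain Python. It is not how Pinecone or SelfQueryRetriever are implemented; the chunk texts, the 2-d "embeddings", the `episode` field name, and the `filtered_search` helper are all made up for illustration.

```python
import math

# Toy chunks with hypothetical 2-d embeddings and an "episode" metadata tag.
CHUNKS = [
    {"text": "Episode 53 opens with a discussion of robotics.", "episode": 53, "vec": [0.9, 0.1]},
    {"text": "Later in episode 53 the guest talks about education.", "episode": 53, "vec": [0.2, 0.8]},
    {"text": "Episode 252 covers rockets and robotics.", "episode": 252, "vec": [0.95, 0.05]},
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query_vec, metadata_filter, chunks, k=2):
    """Keep only chunks whose metadata matches, then rank the survivors by similarity."""
    candidates = [
        c for c in chunks
        if all(c.get(key) == value for key, value in metadata_filter.items())
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]

query_vec = [1.0, 0.0]  # stand-in for the embedded semantic query

# Without a filter, the closest chunk comes from the wrong episode.
unfiltered = filtered_search(query_vec, {}, CHUNKS, k=1)

# With the filter, only episode 53 chunks are considered.
filtered = filtered_search(query_vec, {"episode": 53}, CHUNKS, k=1)
```

In this toy data the unfiltered search returns the episode 252 chunk because it is slightly closer to the query vector, while the filtered search can only return chunks tagged with episode 53.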

Evaluation

We previously introduced auto-evaluator, an open-source tool for grading LLM question-answer chains. Here, we extend auto-evaluator with a lightweight Streamlit app that can connect to any existing Pinecone index. We add the ability to test metadata filtering using SelfQueryRetriever as well as some other approaches that we’ve found to be useful, as discussed below.

[Video: ret_trim.mov]

Testing

SelfQueryRetriever works well in many cases. For example, given this test case:

[Image: test case]

The query can be nicely broken up into a semantic query and a metadata filter:

semantic query: "prompt injection"
metadata filter: "webinar_name=agents in production"
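As a rough illustration of what this split might look like in code, the sketch below holds the two parts in a small container and converts the filter into the `$eq` form used by Pinecone's metadata filter syntax. The `SplitQuery` class and its method are hypothetical names for this example, not part of LangChain or auto-evaluator.

```python
from dataclasses import dataclass, field

@dataclass
class SplitQuery:
    """Hypothetical container for the two parts of a self-query."""
    semantic_query: str
    metadata_filter: dict = field(default_factory=dict)

    def to_pinecone_filter(self) -> dict:
        # Express each key/value pair as an equality condition.
        return {key: {"$eq": value} for key, value in self.metadata_filter.items()}

split = SplitQuery(
    semantic_query="prompt injection",
    metadata_filter={"webinar_name": "agents in production"},
)
pinecone_filter = split.to_pinecone_filter()
```

The semantic query would then be embedded and searched as usual, with the filter dictionary passed alongside it to restrict the candidate chunks.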

But sometimes the metadata filter is not obvious from the natural language in the question. For example, my Lex-GPT app used an episode ID tag derived from my initial scrape of the Karpathy transcriptions, e.g., I have “0252” for episode 252. This means that the retriever will need to perform this translation step, as shown in the diagram below.
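The translation step itself is simple once the episode number is known. The sketch below uses a regular expression as a stand-in for the LLM that does the extraction in practice, and assumes a four-character zero-padded tag stored under a hypothetical `episode_id` field; the `episode_filter` helper is likewise made up for illustration.

```python
import re

def episode_filter(query: str, width: int = 4):
    """Return a metadata filter for the episode mentioned in the query, or None.

    The regex is a simplified stand-in for LLM-based extraction, and
    "episode_id" is an assumed field name.
    """
    match = re.search(r"\bepisode\s+(\d+)\b", query, flags=re.IGNORECASE)
    if match is None:
        return None
    # Translate the natural-language number into the stored tag, e.g. 252 -> "0252".
    return {"episode_id": match.group(1).zfill(width)}

elon_filter = episode_filter("what did Elon say in episode 252")
summary_filter = episode_filter("summarize episode 53")
no_filter = episode_filter("what is prompt injection")
```

A query with no episode mention yields no filter, so it would fall back to plain semantic search.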