Q+A systems often use a two-step approach: retrieve relevant text chunks and then synthesize them into an answer. There are many ways to approach this. For example, we recently discussed the Retriever-Less option (at the bottom of the diagram below), highlighting the Anthropic 100k context window model. Metadata filtering is an alternative approach that pre-filters chunks in a VectorDB based on user-defined criteria, using metadata tags prior to semantic search.
I previously built a QA app based on the Lex Fridman podcast, which uses semantic search on Pinecone. However, it failed in cases where a user wanted to retrieve information about a specific episode (e.g., summarize episode 53) or where a guest had appeared multiple times and the user wanted information from a particular episode (e.g., what did Elon say in episode 252).
In these cases, semantic search will look for the concept of episode 53 in the chunks, but what we actually want is to filter the chunks for episode 53 and then perform semantic search to extract those that best summarize the episode. Metadata filtering does this, so long as 1) we have a metadata filter for episode number and 2) we can extract from the query the value we want to filter on (e.g., 53 or 252). The LangChain SelfQueryRetriever does the latter (see docs), splitting the user input into a semantic query and a metadata filter (for Pinecone or Chroma).
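The "filter, then search" idea can be sketched in plain Python. The chunks, metadata schema, and scoring below are made up for illustration; in practice the filter runs inside the VectorDB (e.g., Pinecone) and the ranking uses embedding similarity, not keyword overlap:

```python
# Toy corpus: each chunk carries an episode_id metadata tag (values here
# are hypothetical).
chunks = [
    {"text": "Today we discuss AGI timelines.", "episode_id": "0053"},
    {"text": "A wide-ranging conversation on rockets.", "episode_id": "0252"},
    {"text": "More on AGI and safety.", "episode_id": "0053"},
]

def retrieve(query: str, episode_id: str, k: int = 2):
    # Step 1: metadata filter -- keep only chunks from the requested episode.
    filtered = [c for c in chunks if c["episode_id"] == episode_id]
    # Step 2: "semantic" search over the filtered set (keyword overlap
    # stands in for embedding similarity in this sketch).
    q_words = set(query.lower().split())
    scored = sorted(
        filtered,
        key=lambda c: len(q_words & set(c["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

results = retrieve("summarize the AGI discussion", episode_id="0053")
```

Every returned chunk is guaranteed to come from episode 53, and semantic ranking only has to discriminate within that subset.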
We previously introduced auto-evaluator, an open-source tool for grading LLM question-answer chains. Here, we extend auto-evaluator with a lightweight Streamlit app that can connect to any existing Pinecone index. We add the ability to test metadata filtering using SelfQueryRetriever, as well as some other approaches that we've found to be useful, as discussed below.
SelfQueryRetriever works well in many cases. For example, given this test case:
The query can be nicely broken up into semantic query and metadata filter:
semantic query: "prompt injection"
metadata filter: "webinar_name=agents in production"
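Wiring this up looks roughly like the following sketch. The field name `webinar_name`, the description strings, and the vectorstore are illustrative assumptions; see the LangChain docs for the exact API:

```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever

# Illustrative metadata schema -- the field name and description
# are assumptions for this example.
metadata_field_info = [
    AttributeInfo(
        name="webinar_name",
        description="Name of the webinar the chunk is drawn from",
        type="string",
    ),
]

llm = ChatOpenAI(temperature=0)
# `vectorstore` is assumed to be an existing Pinecone (or Chroma)
# vectorstore populated with webinar transcript chunks.
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_contents="Webinar transcripts",
    metadata_field_info=metadata_field_info,
    verbose=True,
)
docs = retriever.get_relevant_documents(
    "What was said about prompt injection in the agents in production webinar?"
)
```

Under the hood, the LLM rewrites the question into the semantic query ("prompt injection") plus a structured filter on `webinar_name`, which the vectorstore applies before similarity search.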
But sometimes the metadata filter is not obvious from the natural language in the question. For example, my Lex-GPT app used an episode ID tag derived from my initial scrape of the Karpathy transcriptions, e.g., "0252" for episode 252. This means the retriever needs to perform this translation step, as shown in the diagram below.
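For this particular tagging scheme, the translation step is just zero-padding the episode number extracted from the query (the helper name is mine, not from the app):

```python
def to_episode_id(episode_number: int) -> str:
    """Translate a plain episode number into the zero-padded
    four-digit ID used as the metadata tag (e.g., 252 -> "0252")."""
    return f"{episode_number:04d}"

print(to_episode_id(252))  # "0252"
print(to_episode_id(53))   # "0053"
```

The retriever has to apply this mapping before building the metadata filter, since a filter on the raw value 252 would match nothing in the index.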