Score normalization can be messy. Min-max normalization breaks down when there are major outliers (one extreme score squashes every other score toward 0) or when the score sets you're combining follow different distributions.
What if we could skip scores entirely? That's what Reciprocal Rank Fusion (RRF) does!
```python
def rrf_score(rank, k=60):
    # rank is 1-based: the top result has rank 1
    return 1 / (k + rank)
```
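RRF uses rankings instead of scores. A rank is just a document's position in a sorted result list, so instead of each document's raw score, we use its numbered rank: 1 for the top result, 2 for the next, and so on.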
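Here's a minimal sketch of turning score-based results into ranks, assuming each search returns a mapping of document ID to score. The helper name `scores_to_ranks` and the sample IDs and scores are hypothetical, made up only for illustration:

```python
# Hypothetical score-based results (document ID -> score), made up for illustration.
keyword_scores = {"doc_a": 12.3, "doc_b": 8.7, "doc_c": 4.1}
semantic_scores = {"doc_b": 0.91, "doc_a": 0.85, "doc_d": 0.62}


def scores_to_ranks(scores: dict[str, float]) -> dict[str, int]:
    """Map each document ID to its 1-based position when sorted by score, highest first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: rank for rank, doc_id in enumerate(ordered, start=1)}


print(scores_to_ranks(keyword_scores))   # {'doc_a': 1, 'doc_b': 2, 'doc_c': 3}
print(scores_to_ranks(semantic_scores))  # {'doc_b': 1, 'doc_a': 2, 'doc_d': 3}
```

The two lists use very different score scales, but once converted, both are just 1, 2, 3, so the scale problem from min-max normalization disappears.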
Say we run a query through both keyword and semantic search, and each returns its top three results sorted by score.
We can use the rrf_score function to convert each result's rank into an RRF score:
Keyword search results:
1 / (60 + 1) = 0.0164
1 / (60 + 2) = 0.0161
1 / (60 + 3) = 0.0159

Semantic search results:
1 / (60 + 1) = 0.0164
1 / (60 + 2) = 0.0161
1 / (60 + 3) = 0.0159

To create a hybrid ranking, we can just sum the RRF scores for each document across both result sets:
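0.0164 + 0.0161 = 0.0325 (ranked 1st in one list and 2nd in the other)
0.0164 + 0.0159 = 0.0323 (ranked 1st in one list and 3rd in the other)
0.0161 (ranked 2nd in only one list)
0.0159 (ranked 3rd in only one list)

In this example, the two documents found by both searches accumulate two scores and come out on top.

The k parameter (a constant) controls how much more weight higher-ranked results get compared with lower-ranked ones.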
- Smaller values like k = 20 give more weight to top-ranked results, creating a steep drop-off in scores.
- Larger values like k = 100 create a more gradual decline, giving lower-ranked results more influence.

A good "default" value for k is around 60; it tends to work well across many datasets and queries.
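To see the effect of k, this small sketch reuses the `rrf_score` function from above and compares the rank-1 score with the rank-10 score for a few values of k. The `top_to_tenth_ratio` helper is a hypothetical name introduced only for this illustration:

```python
def rrf_score(rank, k=60):
    # rank is 1-based: the top result has rank 1
    return 1 / (k + rank)


def top_to_tenth_ratio(k):
    """How many times larger the rank-1 score is than the rank-10 score."""
    return rrf_score(1, k) / rrf_score(10, k)


for k in (20, 60, 100):
    print(f"k={k}: rank 1 = {rrf_score(1, k):.4f}, rank 10 = {rrf_score(10, k):.4f}, "
          f"ratio = {top_to_tenth_ratio(k):.2f}")
# k=20: rank 1 = 0.0476, rank 10 = 0.0333, ratio = 1.43
# k=60: rank 1 = 0.0164, rank 10 = 0.0143, ratio = 1.15
# k=100: rank 1 = 0.0099, rank 10 = 0.0091, ratio = 1.09
```

With k = 20 the top result is worth about 1.43 times the tenth; with k = 100 the gap shrinks to about 1.09, which is the "more gradual decline" described above.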
Build hybrid search using Reciprocal Rank Fusion. Example output:
1. Paddington
RRF Score: 0.033
BM25 Rank: 1, Semantic Rank: 1
Deep in the rainforests of Peru, a young bear lives peacefully with his Aunt Lucy and Uncle Pastuzo,...
2. The Indian in the Cupboard
RRF Score: 0.031
BM25 Rank: 2, Semantic Rank: 8
On his ninth birthday, Omri receives an old cupboard from his brother Gillon (Vincent Kartheiser) an...
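Below is a minimal sketch of the fusion step, assuming each search returns document IDs ordered best-first. The function name `rrf_fuse`, its parameters, and the placeholder IDs in the demo are hypothetical; this is not the course's reference solution, and it leaves out the CLI and document lookup. A document missing from one list simply gets no score from that list:

```python
from collections import defaultdict
from typing import Optional


def rrf_score(rank: int, k: int = 60) -> float:
    # rank is 1-based: the top result has rank 1
    return 1 / (k + rank)


def rrf_fuse(
    bm25_ids: list[str], semantic_ids: list[str], k: int = 60
) -> list[tuple[str, float, Optional[int], Optional[int]]]:
    """Fuse two best-first lists of document IDs with Reciprocal Rank Fusion.

    Returns (doc_id, rrf_score, bm25_rank, semantic_rank) tuples sorted by
    descending RRF score. A rank is None when the document is absent from
    that list. Assumes IDs are unique within each list.
    """
    bm25_ranks = {doc_id: rank for rank, doc_id in enumerate(bm25_ids, start=1)}
    semantic_ranks = {doc_id: rank for rank, doc_id in enumerate(semantic_ids, start=1)}

    # Sum each document's RRF scores across both result lists.
    scores: dict[str, float] = defaultdict(float)
    for ranks in (bm25_ranks, semantic_ranks):
        for doc_id, rank in ranks.items():
            scores[doc_id] += rrf_score(rank, k)

    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (doc_id, score, bm25_ranks.get(doc_id), semantic_ranks.get(doc_id))
        for doc_id, score in fused
    ]


# Placeholder IDs arranged to mirror the rank pattern in the example output.
bm25_results = ["doc_1", "doc_2", "doc_3"]
semantic_results = ["doc_1", "doc_4", "doc_5", "doc_6", "doc_7", "doc_8", "doc_9", "doc_2"]

for position, (doc_id, score, bm25_rank, semantic_rank) in enumerate(
    rrf_fuse(bm25_results, semantic_results)[:2], start=1
):
    print(f"{position}. {doc_id}")
    print(f"RRF Score: {score:.3f}")
    print(f"BM25 Rank: {bm25_rank}, Semantic Rank: {semantic_rank}")
```

Running it prints the same two-result layout as the example: doc_1 scores 1/61 + 1/61 ≈ 0.033 with ranks 1 and 1, and doc_2 scores 1/62 + 1/68 ≈ 0.031 with ranks 2 and 8.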
Run and submit the CLI tests.