HyperAI

Reciprocal Rank Fusion(RRF)

Reciprocal Rank Fusion (RRF) is an algorithm that evaluates the search scores of multiple previously ranked results to produce a unified result set.It is a method for combining multiple result sets with different relevance metrics into a single result set. RRF does not require tuning, and different relevance metrics do not have to be correlated with each other to obtain high-quality results. The technique was developed in collaboration between the University of Waterloo (Canada) and Google, and according to its authors, "produces better results than any single system, and better than standard re-ranking methods".

RRF Process

The basic principle of RRF is that documents that consistently appear in the top positions across different search strategies are likely to be more relevant and should therefore receive a higher ranking in the merged results.

Here is a simplified breakdown of the RRF process:

  1. Collect ranked search results from multiple simultaneous queries.
  2. Assign a reciprocal rank score to each result in the ranked list. The process of RRF generates a new search score for each match in each result set. For each document in the search results, the algorithm assigns a reciprocal rank score based on its position in the list. This score is calculated as 1/(rank + k), where "rank" is the position of the document in the list and "k" is a constant. Empirical observations show that "k" performs best when set to a small value (such as 60). Note that this "k" value is a constant in the RRF algorithm and is completely different from "k", which adjusts the number of nearest neighbors.
  3. Combined scores. The algorithm adds the reciprocal ranking scores obtained from each search strategy for each document, producing a combined score for each document.
  4. The algorithm ranks the documents based on the combined scores and arranges them accordingly. The resulting list constitutes the fused ranking.

The Reciprocal Rank Fusion (RRF) process can be described using a flowchart:

Figure 1:Reciprocal Rank Fusion (RRF) flow chart. The figure illustrates the steps involved in the RRF ranking process.

Implementing RRF

RRF uses the following formula to determine the ranking score for each document:

score = 0.0
for q in queries:
    if d in result(q):
        score += 1.0 / ( k + rank( result(q), d ) )
return score

# where
# k is a ranking constant
# q is a query in the set of queries
# d is a document in the result set of q
# result(q) is the result set of q
# rank( result(q), d ) is d's rank within the result(q) starting from 1

(Code from Elasticsearch documentation

References

【1】https://safjan.com/implementing-rank-fusion-in-python/

【2】https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html