Initialize Ratings: Start by giving each model an initial rating. A common starting point is 1000 for each model.
Process Each Pairwise Result:
- For each comparison result (e.g., Result 1), update the ratings of the two models involved based on the outcome.
- The Elo rating update formula is:

  $$R'_A = R_A + K\,(S_A - E_A)$$

  where:
  - $R'_A$ is the new rating of model $A$.
  - $R_A$ is its old rating.
  - $K$ is the K-factor, which sets the maximum possible adjustment per comparison.
  - $S_A$ is the actual score: 1 if the model wins, 0 if it loses, and 0.5 for a draw.
  - $E_A$ is the expected score based on the current ratings of the two models.
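A minimal Python sketch of the single-model update rule $R' = R + K(S - E)$. The function name `elo_update` is hypothetical, and the default K-factor of 32 is an assumed, commonly used value rather than one given above:

```python
def elo_update(rating: float, score: float, expected: float, k: float = 32.0) -> float:
    """Return the new rating R' = R + K * (S - E).

    rating   -- the model's current rating R
    score    -- actual score S: 1.0 for a win, 0.5 for a draw, 0.0 for a loss
    expected -- expected score E computed from the current ratings
    k        -- K-factor (32 is an assumed default, not a prescribed value)
    """
    if score not in (0.0, 0.5, 1.0):
        raise ValueError("score must be 1.0 (win), 0.5 (draw), or 0.0 (loss)")
    return rating + k * (score - expected)


# Two equally rated models each expect 0.5, so the result moves each rating by K/2.
print(elo_update(1000.0, 1.0, 0.5))  # 1016.0
print(elo_update(1000.0, 0.0, 0.5))  # 984.0
```

Because $S - E$ always lies strictly between -1 and 1, a single update can never move a rating by more than $K$.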
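Calculate Expected Score:
- The expected score of model $A$ against model $B$ is

  $$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}},$$

  and, similarly,

  $$E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}.$$

  The two expected scores always sum to 1.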
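The expected-score formula with the conventional Elo scale factor of 400, $E_A = 1/(1 + 10^{(R_B - R_A)/400})$, can be sketched in Python as follows (the function name `expected_score` is hypothetical):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of model A against model B: 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


# A model rated 200 points higher is expected to score about 0.76 against its opponent.
e_a = expected_score(1200.0, 1000.0)
e_b = expected_score(1000.0, 1200.0)
print(round(e_a, 3), round(e_b, 3))  # 0.76 0.24
print(round(e_a + e_b, 12))          # 1.0
```

Equal ratings give exactly 0.5 for each side, and every 400-point gap multiplies the stronger model's odds of winning by 10.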
Update Ratings for Each Comparison:
- For each comparison, calculate the expected scores of both models, then update both ratings based on the actual outcome. Both expected scores should be computed from the ratings as they stood before that comparison.
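A sketch of one complete comparison, updating both models at once. The helper `update_pair` is hypothetical, and K = 32 is again an assumed default; model B's actual score is taken as $1 - S_A$:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard Elo formula (scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update_pair(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one comparison.

    score_a is A's actual score (1.0 win, 0.5 draw, 0.0 loss); B's score is 1 - score_a.
    Both expected scores are computed from the ratings *before* this update.
    """
    e_a = expected_score(r_a, r_b)
    e_b = expected_score(r_b, r_a)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - e_b)
    return new_a, new_b


print(update_pair(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Since the two expected scores sum to 1 and the two actual scores sum to 1, whatever one model gains the other loses when both share the same K, so the total rating stays constant (up to floating-point rounding).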
Rank Models Based on Final Ratings: After all pairwise results have been processed, sort the models by final rating in descending order: the highest-rated model ranks first, the second highest ranks second, and so on.
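Putting the steps together, here is a minimal end-to-end sketch under stated assumptions: results are hypothetical `(model_a, model_b, score_a)` tuples, the model names are placeholders, every model starts at 1000, and K = 32. The function `rank_models` is illustrative, not an established API:

```python
from collections.abc import Iterable


def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard Elo formula (scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def rank_models(
    models: Iterable[str],
    results: Iterable[tuple[str, str, float]],
    initial: float = 1000.0,
    k: float = 32.0,
) -> list[tuple[str, float]]:
    """Run sequential Elo updates over pairwise results and rank by final rating.

    Each result is (model_a, model_b, score_a) with score_a in {1.0, 0.5, 0.0}.
    """
    ratings = {m: initial for m in models}           # initialize every model
    for a, b, score_a in results:                    # process results in the given order
        e_a = expected_score(ratings[a], ratings[b])
        e_b = 1.0 - e_a                              # the two expectations sum to 1
        ratings[a] += k * (score_a - e_a)
        ratings[b] += k * ((1.0 - score_a) - e_b)
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical models and outcomes, for illustration only.
results = [
    ("model_x", "model_y", 1.0),   # x beats y
    ("model_y", "model_z", 0.5),   # y draws with z
    ("model_x", "model_z", 1.0),   # x beats z
]
ranking = rank_models(["model_x", "model_y", "model_z"], results)
for rank, (name, rating) in enumerate(ranking, start=1):
    print(rank, name, round(rating, 1))
```

Because each update starts from the ratings left by the previous one, processing the same results in a different order can produce different final ratings.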