Suggestion: display Elo ratings of opponents in replays

This seems to demonstrate that the current match-making system completely ignores ELO when selecting competitors. I’d love to see that changed!

To ask @KauffK’s question in another way …

Since it’s been said that wins against opponents more than 400 ELO away will not be of any value, can you change it so that we ONLY match against opponents within 400 ELO either way?

We can confirm that matchmaking has been matching people with significantly different Elos, which is not something we intended. The Elo system itself had a few issues that led to a large number of players having extremely low Elo scores; we have already patched these. I fixed Elo last week and we deployed it recently, and I am now going to investigate matchmaking, something I have wanted to do for some time.

Matchmaking is intended to match players of similar Elo first, but because algos don’t change their strategies, and usually don’t have enough randomness to lead to a significant difference in winrate between the same two algos rematching, we do not allow ranked rematches between algos. This leads to a host of problems, most prominently top algos having nobody left to play at their level after just a few dozen matches, forcing them to play against low-level players.

The problems are as follows:

  • Unfair to top players, who gain nothing on winning and lose Elo on losing
  • Wastes games for low level players, makes it take longer for them to reach their correct Elo
  • Wastes games/server time playing a game with a very high chance of not changing anyone’s Elo
  • No Elo mobility at high levels

The issue is caused by a design flaw in the matchmaking system, rather than a bug. I am going to pitch the following changes to the team:

  • Never match two algos in a match where the favored player’s Elo is too high to gain 1 point from winning
  • Allow algos to rematch after a certain amount of time or after a significant change in the Elo of one of the algos (a 25-50 point change, maybe) to ensure mobility.
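The two proposed rules could be sketched roughly as below. This is only an illustration using the textbook Elo expected-score formula; the K-factor of 32, the one-day cooldown, the 25-point threshold, and all function names are assumptions, not the site's actual values or code.

```python
def expected_score(rating, opponent_rating):
    """Textbook Elo expected score for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def favorite_gain(rating_a, rating_b, k=32.0):
    """Elo the higher-rated algo would gain by winning (k is an assumed K-factor)."""
    favorite, underdog = max(rating_a, rating_b), min(rating_a, rating_b)
    return k * (1.0 - expected_score(favorite, underdog))


def match_allowed(rating_a, rating_b, k=32.0, min_gain=1.0):
    """Rule 1: skip matches where the favorite cannot gain at least `min_gain`."""
    return favorite_gain(rating_a, rating_b, k) >= min_gain


def rematch_allowed(seconds_since_last, elo_change,
                    cooldown=86400.0, min_change=25.0):
    """Rule 2: allow a rematch after a cooldown or a large enough Elo change.

    `cooldown` (seconds) and `min_change` are hypothetical tuning values.
    """
    return seconds_since_last >= cooldown or abs(elo_change) >= min_change
```

With these assumed values, an even match is always allowed, while a 1000-point gap is rejected because the favorite would gain well under 1 point.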

I am going to do additional research into matchmaking and systems where rematches are not ideal as well. Let me know if anyone has ideas. Hopefully, we can come up with a new system within a few days and have it implemented, tested and deployed before the end of next week.


You could use a Game Ladder Ranking System, which is similar to Elo but includes a limit as to how high up the leaderboard a player (or algo in this case) can challenge. I’d argue that this would solve the second and third problems, although the last problem may still present itself. I feel like the first problem would also be taken care of, as losing could only result in a potential drop as large as the challenge distance, while winning reduces the number of algos that could challenge you.

As for the problems mentioned in the Wikipedia article, these should be avoidable, as the challenge system could be made to prevent any one algo from challenging too frequently or too infrequently. Furthermore, while a game ladder ranking system lacks the numerical ranking of the Elo system, one could rank the algos in terms of height, with the highest algo being first place, the second highest second, and so on and so forth.

However, there do exist some downsides with using a game ladder ranking system. The most obvious one is that three or more algos could be locked in a rock-paper-scissors rotation, although as far as I know the Elo ranking system would also be susceptible to this, as long as rematches are occasionally held. Having to convert from the current Elo scores to a game ladder system could also pose a problem, although preserving the ordering of all of the algos might be the easiest and best way to implement it. Determining the sizes of the “rungs” of the game ladder might also be difficult, as making them too large leads to problems two and three, while making them too small leads to problem four.

Nonetheless, I feel like a game ladder ranking system offers several benefits not found in the current system, although adding a cap on how large the Elo difference can be between two algos in a match would essentially emulate this and, thus, also serve as an improvement to the current system.

I won’t be responding to individual suggestions, as having a discussion about each of them would be very time consuming. Any ideas that are proposed here, like the proposal from @Grimm, will be investigated by me and discussed internally by the team.

I try to keep our users informed on our priorities and what we are working on, so I’ll let you all know what direction we are headed when we make a decision.


Thanks @RegularRyan for the insight! I found this particularly interesting:

It makes complete sense why this decision was made, but I can see how it has stagnated the top players. Now, given what you say just a little further on:

I’m encouraged that there will be a solution in the works.
My first instinct is that 25-50 point change might not be enough to justify a rematch? Let’s say a hot, new algo hits the board at 1500 ELO.

Truth_of_Cthaeh sitting at 1950 ELO won’t see it until it wins a bit, but then suddenly has a new, eligible opponent and plays it early on. 1950 beating 1550 won’t result in much gain by Truth_of_Cthaeh for good reason. But assuming the new algo ends up settling in the top 20 with 1700+ elo, had Cthaeh battled it as 1950 -vs- 1700, it would have rightly gained more points for the win. Now given the deterministic behavior, is having Cthaeh gain ELO from playing against it at 1550, 1600, 1650, 1700 really the right solution? That seems like it would result in continuing to push the upper echelon further out of reach for newer algos.

I know one of your goals is to reward strong algos that have been around for a while, but when the stronger Truth_of_Cthaeh (2) finally gets uploaded, is the original (inferior) one too far out of reach for #2 to ever catch up?

Edited to add: I’ve honestly been impressed with your handling of things so far, so I trust the solution you come up with will be a great one. And I’m having a blast participating in this competition, so thanks for all your hard work and responsiveness!


Thanks n-sanders, I try my best! We will take these concerns into account during our matchmaking redesign/fix.

Sorry my brain keeps spinning on this …

Maybe having an accumulated elo gain from a given match up would solve the repeat play problem?

If Truth_of_Cthaeh would get +10 elo for beating an algo at 1550, but it would have gotten +12 elo for beating that same algo at 1600, maybe rematches just accumulate elo?

  1. Truth_of_Cthaeh win vs MyAlgo at 1550 elo = +10 elo
  2. Truth_of_Cthaeh win vs MyAlgo at 1600 elo = +2 elo
  3. Truth_of_Cthaeh win vs MyAlgo at 1650 elo = +2 elo
  4. Truth_of_Cthaeh win vs MyAlgo at 1700 elo = +2 elo

This way Truth_of_Cthaeh still gets the “full” +16 elo it would have earned had it waited to play at 1950 vs 1700, but doesn’t get 52 total elo for beating the same algo 4 times in a row.
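A minimal sketch of this “accumulated gain” idea, assuming the matchmaker can remember how much elo each winner has already been credited against each opponent. The class and method names are hypothetical, and the gains of 10, 12, 14 and 16 are the illustrative numbers from this post, not values computed by the real system.

```python
class RematchCredit:
    """Tracks elo already awarded per (winner, loser) pair."""

    def __init__(self):
        self._credited = {}  # (winner, loser) -> total gain awarded so far

    def award(self, winner, loser, full_gain):
        """Return only the part of `full_gain` not already credited for this pair."""
        key = (winner, loser)
        already = self._credited.get(key, 0)
        extra = max(0, full_gain - already)
        self._credited[key] = already + extra
        return extra


credit = RematchCredit()
# Full gains the winner would earn if each match were its first against MyAlgo.
full_gains = [10, 12, 14, 16]
awards = [credit.award("Truth_of_Cthaeh", "MyAlgo", g) for g in full_gains]
print(awards, sum(awards))  # [10, 2, 2, 2] 16
```

Because only the difference is paid out, the total never exceeds the largest single gain the winner could have earned against that opponent.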


Hey everyone, an update on this.
We have identified 3 quick improvements we will be making very soon:

  • Algos will not play matches where it is possible to gain 0 Elo
    – Self-explanatory. These matches are bad for many reasons

  • Matching logic adjustment
    – Currently, we seek a match for the ‘most stagnant’ algo, the one that has not been matched in a while, which makes sense at first glance
    – The problem is that top algos naturally play fewer matches for a number of reasons (far from new, ‘bursted’ algos, fewer algos around their level)
    – These algos will always look for ‘up and coming’ algos and fight them ASAP, and gain less Elo for beating these newcomers before the newcomers have a chance to gain Elo.
    – This is causing other biases in the system and makes certain matches more likely than others
    – For now, we will choose random algos instead of the most stagnant algo

  • Bugfix
    – There is an issue where matches are being made faster than they are being played, because we are making more matches than we expected. This causes a few issues, like many algos being unavailable for matches.
    – This is a major reason top algos are playing against ‘crashbots’ with literally negative Elo: all algos between them in Elo are already scheduled for matches.

These changes will be made within the week, ideally. Larger changes will come later.
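Taken together, the three changes might look something like the sketch below: filter out opponents that are already scheduled or too far away in Elo, then pick one at random instead of the most stagnant one. The `Algo` type, the `scheduled` flag, and the 400-point `max_gap` are assumptions for illustration only, not the actual matchmaking code.

```python
import random
from dataclasses import dataclass


@dataclass
class Algo:
    name: str
    elo: float
    scheduled: bool = False  # hypothetical flag: already queued for a match


def pick_opponent(algo, pool, max_gap=400.0, rng=random):
    """Pick a random eligible opponent, or None if nobody qualifies.

    Eligible means: a different algo, not already scheduled, and within
    `max_gap` Elo (an assumed stand-in for "cannot gain 0 Elo").
    """
    candidates = [
        other for other in pool
        if other is not algo
        and not other.scheduled
        and abs(other.elo - algo.elo) <= max_gap
    ]
    return rng.choice(candidates) if candidates else None
```

Returning `None` rather than falling back to a distant opponent is one possible design choice; it leaves the algo waiting until a suitable opponent frees up.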


The largest problem I see with rematches (which has already been mentioned) is the deterministic nature of algos. A possible way to increase the event space would be to slightly modify the game. For example, the map could be slightly different each time like maybe a few tiles are unusable or certain tiles make units move slower/faster, etc. These would obviously be affecting both players (eg the same tiles would be blocked for both players) with the intent not being to drastically alter the game, but give just a slight difference in games so that a rematch gives new information on the quality of an algo.

I tried to think of something that would not affect the core game strategy much, but still change the outcomes (without making success just random).

@Isaac No, please not.
I agree that rematches are not really necessary, but modifying the game is something that seems totally over the top and I feel like it is not attacking any of the problems mentioned because this would just lead to a need for more games.

Think of it this way: Now, because most algos are deterministic, one match between every algo would be sufficient to accurately determine elo. If you modify the game, you would need to match every algo against each other for every variation of the game to have an accurate score.

Before: number_of_matches_needed_for_accurate_elo = number_of_other_algos
With your idea: matches_needed = other_algos * variations_of_game

This would lead to even longer times to get an accurate elo. We are already waiting days until an algo reaches the top algos, which will hopefully be improved with the improvements mentioned by @RegularRyan, but would be increased by your idea.

I also hope that algos below 1500 elo get to play fewer matches than those above because it is better for every user if they get feedback from their better algos, not to mention those that try to compete at the top.

Definitely not excluding them because matches like this occurrence are valuable experiences.


I agree that deterministic works well for this competition, but there is still less elo to gain if you play an algo while it is still “rising” to its stable place in the standings. If your algo is always going to beat my algo, then you want to play it when I’m ranked at my highest elo to maximize the elo you gain from it (since we already know the elo earned is dependent on the opponent’s elo at the time you play them).

That’s where rematches (assuming something like the “accumulated” elo gain is used) can be helpful for accurate elo in the end.


Great that you try to pick those low hanging fruits first. I have a question about your first point about matches where it is possible to gain 0 Elo. Are those where one (or both) player has a negative Elo?

@Janis
If you beat someone with much more Elo than you, the formula says “One of these players is at the wrong level,” so more points are exchanged. If you beat someone with less Elo than you, the system says “That’s what I expected, their scores look correct” and makes a smaller adjustment. If you beat someone with a much lower score (I think it’s a 400 point difference), the adjustment becomes so small that 0 points are exchanged.
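For reference, the textbook form of the Elo update being described is shown below. The symbol K (the K-factor) and the rounding rule used on this site are not stated in this thread, so treat them as unknowns here.

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad
R_A' = R_A + K\,(S_A - E_A)
```

Here R_A and R_B are the two ratings, E_A is the expected score of player A, and S_A is 1 for a win and 0 for a loss. When A is rated 400 points above B, E_A = 10/11 ≈ 0.909, so a win earns A only K/11 points. Whether such a small adjustment ends up recorded as 0 depends on K and on how the result is rounded.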

Thanks! Got it. Maybe one way to reduce “unintended matches” is to remove all algorithms whose ELO is below 0 from the match queue.
We all start at 1500, so if I get it right it takes at least 10 losses in a row to get below 0. By that time, the person should probably start over with another strategy, seeing that the current one is not working well.

Maybe these kinds of matches are not that many, at most 5% of all, but if it is not too difficult it is an idea.
(I got 5% by roughly estimating my matches with negative ELO algorithms).

These changes look great, but I think there should be a different approach as to how each bot gets selected for a match.

Currently, I’m noticing that sometimes newly uploaded bots will play a quick session of 5 games before going into “one or two an hour, if lucky” mode. It doesn’t happen all the time, and I’m not sure if it’s intended, but I think this “placement match” idea should be expanded on.

For instance, a newly uploaded bot would differ from older bots in a couple of key ways.

Firstly, their elo changes are more volatile. Winning the first several matches should give it a significant boost, compared to an older bot of the same elo that has an established record of “I belong here.” This decays over the course of several matches, making elo changes a little less susceptible to a win or loss, apart from the difference in elo in the match.

Secondly, new bots, with volatile elos, would be the ones initiating the matches. As the challenger, a new bot is primarily looking to challenge a bot with higher elo (bad bots would start to challenge down as it becomes apparent they’re over-elo). The number of challenges, like the volatility of elo changes, decays over time, and fewer games are played, particularly as the challenger.

By comparison, older bots play very few matches as the challenger and don’t move in elo as much. They still play games regularly as the challenged, but overall fewer games than a new bot would play.
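One way to picture the decaying volatility is a K-factor that starts high for a new bot and settles toward a lower value as it plays more games. The sketch below is only an illustration: the exponential decay shape and the values 64, 16 and 10 are assumptions, not something taken from Halite or from this site.

```python
def k_factor(games_played, k_new=64.0, k_settled=16.0, half_life=10.0):
    """Volatility (K-factor) that decays from `k_new` toward `k_settled`.

    Every `half_life` games, the extra volatility above `k_settled` halves.
    All three parameters are hypothetical tuning values.
    """
    decay = 0.5 ** (games_played / half_life)
    return k_settled + (k_new - k_settled) * decay


def elo_change(rating, opponent_rating, score, games_played):
    """Elo change for one bot, scaled by that bot's own volatility.

    `score` is 1.0 for a win and 0.0 for a loss.
    """
    expected = 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))
    return k_factor(games_played) * (score - expected)
```

Because each bot uses its own K-factor, a new challenger can gain a lot from an upset while the established bot it beat loses comparatively little.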

In a broader sense, this would fix some of the design issues laid out:

  • Lower level players reach their elo quicker, as they had more to lose in the beginning. It doesn’t take too many matches for them to clearly be at a lower level, and priority for their matches can be tuned down.
  • Server time playing games with little consequence is avoided, as most matches have a challenger with something to gain (more matches to the newer bots), even if the challenged bot has little to lose (fairly stable elo)
  • High levels gain mobility, particularly for newer bots that show dominance over older ones
  • Fairer for top players. Although older bots don’t have much to gain in a given match, this is actually desired. Instead of the top bot continuously gaining elo from old matches until its elo is too high for new bots to reach, top players gain the advantage of putting down a newer bot that has more susceptible elo, helping to reaffirm their top status. On the other hand, newer bots upsetting older ones would be able to surpass the older bot’s elo, without the older bot losing too much elo. This keeps the “top bar” from creeping too high, while still enabling newer bots to reach that top bar.

Downsides:

  • Top players still have little to “gain” in a match, but as detailed above, I don’t think that is desirable.
  • Bots with “unlucky” early matches or even “very lucky” early matches can get placed out of their actual elo, and this can take a long time to adjust, particularly since the bots play fewer matches over time.

Other things to note are that new bots uploaded by high ranked players don’t have to start at 1500 elo. Or if they do, they don’t need to start with the same elo gains per match. Note that reducing elo gains from a player who historically has uploaded low elo bots would give them an undesirable disadvantage, so this would be done specifically to help the servers place bots that are probably high elo into high elo in fewer games.

Edit: Regarding rematches, this helps reduce their necessity as well. Because the bots in a match don’t have the same amount of elo to gain/lose, it makes the deterministic match significantly less dependent on what elo the challenger was when they played their one match against a given bot. Although you would still want high gains for higher differences in elo, as well as higher losses for significant differences the other way, it is more a factor of how volatile the challenger’s elo is, as most matches would challenge up a fairly consistent amount.

I also seemed to have neglected the factor by which how “volatile” an elo is changes. I wouldn’t know for sure how to implement this, as making it based solely on the number of games makes the higher elos less reachable (assuming each bot starts from 1500). Another idea, besides not starting each bot the same, would be to stabilize the elo as win ratio approaches 50/50. As long as a bot’s on a consistent win streak (or lose streak, for bad bots), the bot’s elo remains volatile.


This is great input. Firstly, you’re right, when you first upload an algo it gets prioritized for 5 matches so you can get immediate feedback. However, it’s not too useful right now because you have to start back at 1500 ELO and may get matched with algos that have extremely low or high ELOs.

We were thinking of something similar to your idea, having new algos you upload start with the ELO of the highest ELO algo you’ve had.

There are some issues with this, most obvious is simply ELO inflation. There are also probably a few ways to game the system, keep reuploading the same bot until you get matches with algos you happen to counter for example. But in the long run ELO should even out and it would be easier to assess modifications to your algo because it would play against more competitive algos close to your previous algos ELO.

Having your first few matches have exponentially higher ELO changes could also help. Additionally, I know some games have it so that win or lose streaks exponentially increase how much your ELO changes. For example, if a user “smurfs” (makes a new account even though they are an experienced player), they will win many games in a row since they will be matched with other new players at first. With exponential ELO changes for win streaks, it would only take a few matches for the system to realize the player is a “smurf” and set the account to have a high ELO even though it’s a new account.

In general, though, I believe the fundamental problem with our current match making system is that it prioritizes a fair distribution of the quantity of matches over a high-quality match between algos of close ELO. Essentially we prioritize minimizing the time between matches a bit too much. When your algo is looking for a match, the system will choose an algo that hasn’t played a match in a while over an algo close to your ELO. Though we also have some unintended behavior in our code that, once fixed, might solve some of these issues too.

We are still discussing what the best approaches are and what to prioritize so your feedback is welcome!


Thanks, I guess I should clarify that’s not my idea so much as my observations from the Halite competition, which had a very successful elo system (although, the game map was randomized so rematches were necessary and not avoided, which made the above system easier to implement).

Gaming the system certainly should be avoided, and I can attest the above system has minor gimmicks. One of those pitfalls, “unlucky” starts, could be avoided by just taking down the unlucky bot and reuploading it until it got a “good” start. However, this just helped expedite getting the bot to its appropriate elo and getting feedback on how it’s doing, not necessarily changing its final outcome.

Elo inflation can be avoided by starting at a “fraction” of where the previous left off, or by multiplying the initial elo gains by a factor of the distance from 1500 and starting each bot at 1500. This makes each bot have to “prove” it belongs where its predecessor is, or higher, while still having the means of getting there with larger initial gains (or losses).
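Both options can be sketched in a few lines. This is a rough illustration only: the 0.5 fraction, the 400-point scale, and the function names are assumptions, while the 1500 baseline is the starting elo mentioned earlier in the thread.

```python
BASE_ELO = 1500.0  # starting elo for every new bot, per the thread


def starting_elo(previous_best, fraction=0.5):
    """Option 1: start part of the way toward the predecessor's elo.

    `fraction` is a hypothetical tuning value between 0 and 1.
    """
    return BASE_ELO + fraction * (previous_best - BASE_ELO)


def initial_gain_multiplier(previous_best, scale=400.0):
    """Option 2: start at 1500 but scale early gains and losses.

    The multiplier grows with the predecessor's distance from 1500;
    `scale` is a hypothetical tuning value.
    """
    return 1.0 + abs(previous_best - BASE_ELO) / scale
```

For example, with these assumed values a player whose previous bot reached 1900 would either start the new bot at 1700, or start it at 1500 with doubled early elo changes.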

Increasing the volatility by win streaks/lose streaks is not something I’d recommend. Volatile elo is something that should be maintained by streaks, not gained. Otherwise inflation is inevitable by random circumstance if the gains don’t significantly taper off at the top (which reintroduces mobility problems at the top).

I agree that the primary concern should be the “fair distribution of matches” priority that’s introducing a lot of issues, but when that’s fixed there will still be room for improvement on the turnaround time of getting an algorithm to an appropriate elo.

Any updates on the first improvement?

I mentioned it in another thread, but we will likely not be deploying these changes until next week. We are currently prioritizing features and fixes relating to our UMich event this weekend for business reasons.
