So it turns out that finally updating my season 5 algo's sim to use season 6 Destructor stats gave me an algo that has beaten the 3 Demorfs it's played so far. Right now just a single one is running around in the wild.
Are you wondering what happens if you rack up several losses in a row?
(Are you asking for someone to spam you with losses to see how many it takes to drop you below 2500?)
Agree that facing distant players is not the root issue. With each match played, Glicko moves you closer to some hypothetical 'True' rating, and it gets closer the more data it has access to. With more matches played you gain a few more points because you earn and deserve them: proving yourself against weaker algos still shows the system that your 'True' rating is higher than your current one.
The definition of the ‘True’ rating is the rating a player would converge on if all players played all other players an infinite number of times.
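To make that definition concrete, here is a minimal round-robin simulation sketch (plain Elo rather than Glicko, with a made-up K-factor and made-up hidden strengths): given fixed underlying win probabilities, every rating settles around a stable value, and that stable value is the 'True' rating described above.

```python
import random

K = 16                                               # made-up K-factor, for illustration only
true_strength = {"A": 2400, "B": 2100, "C": 1800}    # hidden 'True' ratings
rating = {p: 1500.0 for p in true_strength}          # everyone starts at 1500

def expected(ra, rb):
    """Standard Elo expected score of a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

random.seed(0)
for _ in range(20000):                               # many round-robin rounds
    players = list(true_strength)
    for i, a in enumerate(players):
        for b in players[i + 1:]:
            # The actual win probability comes from the hidden strengths.
            score_a = 1.0 if random.random() < expected(true_strength[a], true_strength[b]) else 0.0
            e_a = expected(rating[a], rating[b])
            rating[a] += K * (score_a - e_a)
            rating[b] += K * ((1.0 - score_a) - (1.0 - e_a))

# The gaps between the converged ratings match the gaps between the hidden
# strengths (~300 points apart); the absolute level depends on the 1500 start.
print({p: round(r) for p, r in rating.items()})
```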
This is definitely a problem that still exists and is the main reason we give the leaderboard a few days to stabilize before running a season finale. Hopefully having it be 'off' during the season is tolerable. I'll add a low-priority ticket to investigate improvements to this system, but it probably won't be addressed for a while.
As always, everyone should feel free to voice their thoughts on the matter and we can adjust priorities if needed.
Pretty late today, but I'm going to investigate this later this week.
https://bcverdict.github.io/?id=121114
I think this bug might also be in the tool from @bcverdict, since all the rating points in the chart are calculated by him (this information is not available in the match history) and the tool hasn't been updated in quite some time.
You’re right, the large jump is now shown on the most recent match again (which is a different match than last time). It could still possibly be an indication that the rating is higher than it should be though.
@Demorf so far I've won 5 times against your top algo and it's now about 100 points lower than before. I don't think more uploads are necessary to prove the point.
Good Job. I removed my older algos.
Now let's see if YOU reach escape velocity and separate from the rest …
I noticed something else … just before you pinned me down. Some of the other top algos also had wins against me, but their rating was 300 less than mine … so the penalty (and probably their reward) seems to be greatly reduced, less than 30 points. So this effect helps accelerate the gap.
All else being equal, a larger rating difference actually leads to a bigger change in rating when the lower-rated algo wins, not a smaller one. A more likely explanation is that as your algo plays more matches, the algorithm becomes more and more certain that your rating is 'correct', or close to correct, and reduces the amount that it fluctuates.
This should be clear through an example. If an algo has played 1 game where it beat a 2100 algo, it's probably pretty good, so it gets a lot of points. If an algo has played 10,000 games and is still rated 1500, a single win against a 2100 algo is probably an outlier, and the algorithm is more stingy about giving out points.
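To illustrate, here is a minimal sketch of a one-game Glicko-1 update (it's an assumption on my part that the backend does something Glicko-1-like; the actual constants and any caps may differ). The rating deviation RD is the system's uncertainty about a rating: a fresh algo with a large RD swings hard, while an algo with a tiny RD barely moves after the same upset win.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant

def g(rd):
    """Dampens the impact of an opponent whose own rating is uncertain."""
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * rd / math.pi) ** 2)

def glicko1_update(r, rd, r_opp, rd_opp, score):
    """One-game Glicko-1 update; score is 1.0 for a win, 0.0 for a loss."""
    e = 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))
    d2 = 1.0 / (Q ** 2 * g(rd_opp) ** 2 * e * (1.0 - e))
    denom = 1.0 / rd ** 2 + 1.0 / d2
    new_r = r + (Q / denom) * g(rd_opp) * (score - e)
    new_rd = math.sqrt(1.0 / denom)
    return new_r, new_rd

# A brand-new 1500 algo (RD = 350) upsets an established 2100 algo: huge jump.
print(glicko1_update(1500, 350, 2100, 50, 1.0))  # roughly +600 points with these illustrative parameters

# The same upset by a 1500 algo with thousands of games behind it (RD = 50): tiny jump.
print(glicko1_update(1500, 50, 2100, 50, 1.0))   # roughly +14 points
```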
I ran some queries and verified that ~220,000 games were run in the past 45 days.
There were 20 games in the past month where over 200 points changed hands.
18 of them involved at least one algo playing its first match of Terminal.
There were 679 games where over 100 points changed hands.
533 involved at least one algo playing its first match.
All of these games had at least one algo playing one of its first 5 matches.
The reason some games give out many more points is that there is a high amount of uncertainty about both algos playing, so the system is more generous with points. This helps algos move to where they are supposed to be much faster early on.
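As a quick illustration, reusing the hypothetical `glicko1_update` helper from the sketch earlier in the thread: when both algos are brand new (here assumed to start at rating 1500 with RD 350), a single game already moves the winner by well over 100 points, which fits the pattern in the stats above.

```python
# Both algos brand new: maximum uncertainty on both sides (assumed starting RD of 350).
new_r, new_rd = glicko1_update(1500, 350, 1500, 350, 1.0)
print(round(new_r - 1500), round(new_rd))  # roughly +160 points, and RD shrinks from 350 to ~290
```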
I haven't looked at Felix's tool, but it is possible that a brand-new algo that beats a 2057 algo could make such a jump. (Edit: Just checked, and no algo with over 1800 rating ever loses over 100 points at once, so it looks like a bug in Felix's tool.)
Looking at how player ratings have evolved since this change, it seems that the ratings of the top algorithms have become quite inflated since before the change. For example, my algo, which hasn’t really changed from season to season, has the following ratings:
season | algo | elo
6 | spinach_54 | 2788
5 | spinach_53 | 2273
4 | spinach_48 | 2245
I would guess that this +500 elo jump is due to the wider matchmaking as well as the cap on rating loss from losing to an algorithm >400 points away. Maybe it is worth testing whether removing the rating-loss cap reduces this inflation, so that ratings won't keep climbing as time goes on.
The cap would not automatically add system-wide inflation; it is a symmetrical effect: you will lose slightly less than expected, and the winner will also gain slightly less than expected.
If I had to guess, I would say that new algos entering the system could be 'feeding' more points to top algos before being deleted, since they can now match with them much sooner. This would cause system-wide inflation, but the system can be pretty complex and it's hard to intuit what's going on sometimes.
If we want to ensure everyone enters the final week with a fair shot, I have a suggestion. I’ll make a poll if it seems reasonable to the top players in this thread.
There are a ton of factors that impact algo ratings in unintuitive ways: the meta shifting around over an algo's lifetime, an algo that it happens to perform well against being uploaded many times, or an algo getting stranded with a very high rating.
The only way to remove these anomalies completely and guarantee a perfectly fair outcome for all players is to reset ratings (and matchmaking history, to allow rematches) for all algos at the end of the season, when we prevent further algo submissions during the 'settling' period. This ensures everyone starts on even footing, regardless of how long their algo has been uploaded and how many people uploaded algos that it performs well against.
I have always been in favor of this solution, but it was vetoed by the team due to concerns that players would feel that they ‘lost’ the rating they had earned throughout the season, or that they would feel cheated if their new rating was significantly lower than their pre-reset rating.
Also, maybe I’m misunderstanding how the cap works, but doesn’t it mean that both parties would have their score change by more than expected? So that high-rated algos will be able to still gain points from opponents who they should beat pretty much every time?
I’m not sure I like the elo reset idea, since as a result the complete season would in fact basically have no use at all (only the end would matter). Further, I believe @Felix did prove do be able to catch up on elo rating very quickly (before he removed his algo).
Of course I might be biased, since I benefit from not resetting. But if the majority of players are in favor of resetting elo, I wouldn't really have a problem with it.
I am definitely for the reset. It solves all fairness issues with repeated uploading of similar algos in a very short time. Without the reset, it's theoretically possible to target a top algo and reduce its rating drastically so that it's no longer in the top 10, and if that algo has already played most other algos, then there is no chance to climb back up.
One downside could be that the period when nobody can upload algos needs to be longer, since more matches have to be played to make it fair.
@IWannaWin Yes, a couple days ago I uploaded my algo and it reached first place within 4 hours. Getting up is not the issue.
Good discussion! I can see valid points on both sides.
I guess a lot of it comes down to how the global competition fits in these days. At one point, C1 wanted to incentivize participation throughout the season. In some ways the auto-competitions do this, but there is still no actual benefit to doing well in them right now.
If the season finals are going to be limited to the top 10 from the global leaderboard and there's a full reset at the end, I agree with @IWannaWin that there is no practical point left in participating in the full season. This rewards behavior like @Felix's (I'm not trying to be accusatory with this observation, by the way!) - only letting an algo compete on the global stage for very short periods of time before removing it again.
I’d like to continue this conversation with the benefit of a few more weeks of evidence. Since my last post one month ago, my algo has gone from 2788 elo up to 3010 elo. I think it is pretty clear that in practice the elo ratings have become unstable.
And this is what we should expect in theory from a ratings cap like the one implemented. First, let's look at the numbers without a cap applied, and then with a cap applied.
If you consider a game between algos a1 and a2 with ratings of 2800 and 2000 respectively, then according to the elo model, a1 should win ~99% of the time. The number of points gained or lost by a1 is proportional to the difference between the actual outcome of the game and the expected outcome, so on average points_gained = 0.99*(1 - 0.99) + 0.01*(0 - 0.99) = 0. In other words, on average the rating will not change when playing lower-rated opponents.

Now when the cap is applied, the model instead says that a1 should win 91% of the time (but in practice a1 still wins 99% of the time), so points_gained_with_cap = 0.99*(1 - 0.91) + 0.01*(0 - 0.91) = 0.08. In other words, each time a match is played between a high-ranked algo and a low-ranked algo with an elo difference greater than the cap, the rating of the high-ranked algo inflates and the low-ranked algo's deflates.
So even though the number of points exchanged in any single game is symmetric, the split between wins and losses is not calibrated correctly: on average the high-ranked algos will be over-rewarded for wins and under-punished for losses.
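Here is the same arithmetic as a small sketch (the 400-point cap is taken from the earlier posts; the K-factor of 1 is just a placeholder, since the real constant isn't documented in this thread):

```python
def elo_expected(diff):
    """Elo expected score for the higher-rated player, given the rating gap."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

true_gap = 2800 - 2000
p_true = elo_expected(true_gap)                    # how often the 2800 algo really wins: ~0.99
p_model_uncapped = elo_expected(true_gap)          # uncapped model agrees with reality
p_model_capped = elo_expected(min(true_gap, 400))  # the cap pretends the gap is only 400: ~0.91

K = 1  # placeholder 'points per game' scale; the real K-factor is unknown here

# Average rating drift per game = K * (actual score - expected score), averaged over outcomes.
drift_uncapped = K * (p_true * (1 - p_model_uncapped) + (1 - p_true) * (0 - p_model_uncapped))
drift_capped = K * (p_true * (1 - p_model_capped) + (1 - p_true) * (0 - p_model_capped))

print(round(drift_uncapped, 3))  # 0.0   -> no systematic inflation without the cap
print(round(drift_capped, 3))    # ~0.08 -> the favourite drifts upward a little every such game
```

The drift per game simplifies to K*(p_true - p_model_capped), so the inflation rate scales with how far the cap understates the favourite's true win probability.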
I agree with your assessment, max. I am considering this proposal:
S6 Finale
Reset all ratings and match history for all algos when the transition-period begins. Significantly increase the duration of the transition period. Remove the rating gain/loss cap in a small update sometime this week.
This experiment was polled with 17% against and 50% in favor, so we were going to run this experiment next season anyway. Bumping it up to now solves the current rating problem. The main downside of the reset is losing the historical leaderboard data, but this data is a bit awkward anyway due to this mistake.
Going forward
Permanently remove rating gain/loss caps, but maintain the ability for stranded algos to play against significantly lower-rated algos