Player ratings, near Season End

I don’t see any issue if there is a large gap in ratings so long as a high-rated algo will continue to find opponents. To my understanding, the only reason Demorf’s algo isn’t dropping is that it isn’t losing. If it starts losing it should drop, whereas in the old system it would stop getting matches and be unable to drop.

2 Likes

It is not so black and white. Demorf_v6-1-2 is losing … just 5% of the time.
But most of the “extra matches” are vs really random players … some of them really low rated, so even when losing to them the penalty is probably reduced.

The end result is that my algo keeps climbing in rating, and this is even worse than before, when you just kept your high rating.

I would love for someone’s rating on the leaderboard to be reproducible by uploading the same algo.

I know this is probably not a simple thing to balance, but we have a good setup now for testing it.

That 234 rating jump from beating a 2057 rated algo at the end seems odd.

The change that was made doesn’t address the problem of rapidly iterated algos pushing up the algos that beat them and pushing down the ones that lose to them. This will still occur, and perhaps more so now that the algos on either end are no longer “out of reach” of this phenomenon. The change addressed the lack of available matches, not large gaps in the ratings. Yes, you will continue to rise until a challenger appears. I think if Felix does upload an algo that beats you, your rating will fall quickly. That last match where you jumped 234 rating does seem like a bug though.

1 Like

So it turns out that finally updating my season 5 algo’s sim to use season 6 Destructor stats gave me an algo that’s beaten the 3 Demorfs it’s played so far. Right now just a single one is running around in the wild.

Are you wondering what happens if you rack up several losses in a row?
(Are you asking for someone to spam you with losses to see how many it takes to drop you below 2500?)

Agree that facing distant players is not the root issue. With each match played, Glicko moves you closer to some hypothetical ‘True’ rating, and it gets closer the more data it has access to. With more matches played, you get a few more points because you earn and deserve them: proving yourself against weaker algos still shows the system that your ‘True’ rating is higher than your current one.

The definition of the ‘True’ rating is the rating a player would converge on if all players played all other players an infinite number of times.
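
To make the convergence idea concrete, here is a minimal Elo-style sketch in Python (the live matchmaker uses a Glicko-like system, so the real update also weights by rating uncertainty; the function names and the K factor of 32 here are purely illustrative):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """One-match update: points flow toward whoever outperformed expectations."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - exp_a)  # tiny for an expected result, large for an upset
    return rating_a + delta, rating_b - delta

# A 2800 beating a 2100 it was "supposed" to beat still inches upward (~ +0.6 points),
# which is the "few more points because you earn them" effect described above.
print(update(2800, 2100, a_won=True))
```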

This is definitely a problem that still exists and is the main reason we give the leaderboard a few days to stabilize before running a season finale. Hopefully having it be ‘off’ during the season is tolerable. I’ll add a low-priority ticket to investigate improvements to this system, but it probably won’t be addressed for a while.

As always, everyone should feel free to voice their thoughts on the matter and we can adjust priorities if needed.

It’s pretty late today, but I’m going to investigate this later this week.

https://bcverdict.github.io/?id=121114
I think this bug might also be in the tool from @bcverdict, since all the rating points in the chart are calculated by the tool itself (this information is not available in the match history), and the tool has not been updated in quite some time.

3 Likes

You’re right, the large jump is now shown on the most recent match again (which is a different match than last time). It could still possibly be an indication that the rating is higher than it should be though.


@Demorf So far I have won 5 times against your top algo, and it’s now about 100 points lower than before. I don’t think more uploads are necessary to prove the point.

1 Like

Good job. I removed my older algos.
Now let’s see if YOU reach escape velocity and separate from the rest …

I noticed something else … just before you pinned me down. Some of the other top algos also had wins against me, but their rating was 300 less than mine … so the penalty (and probably their reward) seems to be greatly reduced, less than 30 points. So this effect helps accelerate the gap.

A larger rating difference actually leads to a larger change in rating after a match is resolved. A more likely explanation is that as your algo plays more matches, the algorithm becomes more and more certain that your rating is ‘correct’ or close to correct, and it reduces the amount that the rating fluctuates.

This should be clear through an example. If an algo has played 1 game where it beat a 2100 algo, it’s probably pretty good, so it gets a lot of points. If an algo has played 10,000 games and is still at a 1500 rating, a single win against a 2100 algo is probably an outlier, and the algorithm is stingier about giving out points.
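
For anyone curious, here is a rough single-game Glicko-1 sketch of that certainty effect (I don’t know the exact variant or constants the matchmaker uses, so treat the numbers as illustrative; RD is the system’s uncertainty about a rating, which starts high and shrinks as games are played):

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant

def g(rd: float) -> float:
    """Discount factor: an opponent with an uncertain rating counts for less."""
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * rd / math.pi) ** 2)

def glicko1_update(r: float, rd: float, r_opp: float, rd_opp: float, score: float):
    """Single-game Glicko-1 update; score is 1.0 for a win, 0.0 for a loss."""
    e = 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))
    d_sq = 1.0 / (Q ** 2 * g(rd_opp) ** 2 * e * (1.0 - e))
    denom = 1.0 / rd ** 2 + 1.0 / d_sq
    new_r = r + (Q / denom) * g(rd_opp) * (score - e)
    new_rd = math.sqrt(1.0 / denom)
    return new_r, new_rd

# Brand-new 1500 algo (RD 350) beating a settled 2100: jumps by several hundred points.
print(glicko1_update(1500, 350, 2100, 50, 1.0))
# Algo stuck at 1500 after thousands of games (RD ~30) beating the same 2100:
# it moves only about five points, because the system is already confident about it.
print(glicko1_update(1500, 30, 2100, 50, 1.0))
```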

4 Likes

I ran some queries and verified that ~220,000 games were run in the past 45 days:

  • There were 20 games in the past month where over 200 points changed hands.
  • 18 of them involved at least one algo playing its first match of Terminal.
  • There were 679 games where over 100 points changed hands.
  • 533 of those involved at least one algo playing its first match.
  • All of these games involved at least one algo within its first 5 matches.

The reason some games move many more points is that there is a high amount of uncertainty about both algos playing, so the system is more generous with points. This helps algos move to where they are supposed to be much faster early on.

I haven’t looked at Felix’s tool, but it is possible that a brand-new algo that beats a 2057 algo could make such a jump. (Edit: Just checked, and no algo with over 1800 rating has ever lost over 100 points at once, so it looks like a bug in Felix’s tool.)

TLDR: everything seems good

3 Likes

Looking at how player ratings have evolved since this change, it seems that the ratings of the top algorithms have become quite inflated compared to before the change. For example, my algo, which hasn’t really changed from season to season, has the following ratings:

season  |  algo        |  elo
6       |  spinach_54  |  2788
5       |  spinach_53  |  2273
4       |  spinach_48  |  2245

I would guess that this +500 Elo jump is probably due to the wider matchmaking as well as the cap on rating loss from losing to an algorithm >400 points away. Maybe it is worth testing whether removing the rating-loss cap would reduce this inflation, so that ratings won’t keep climbing as time goes on.

The cap would not automatically add system-wide inflation; it is a symmetrical effect: you will lose slightly less than expected, and the winner will also gain slightly less than expected.

If I had to guess, I would say that new algos entering the system could be ‘feeding’ more points to top algos before being deleted, since they can now match with them much sooner. This would cause system-wide inflation, but the system can be pretty complex and it’s hard to intuit what’s going on sometimes.
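
A small sketch of why a symmetric cap is zero-sum on its own, using plain Elo as a stand-in for the real system (the 400-point threshold comes from the post above; the K factor, function names, and the exact way the cap is applied are assumptions on my part):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def resolve_upset_with_cap(loser_r: float, winner_r: float,
                           k: float = 32.0, gap_cap: float = 400.0):
    """Resolve a match where the higher-rated algo lost, treating any rating gap
    beyond `gap_cap` as exactly `gap_cap` (hypothetical rule based on this thread).

    The winner gains exactly what the loser drops, so the cap by itself is
    zero-sum: it shrinks the exchange for both sides, but it cannot mint points.
    """
    gap = min(loser_r - winner_r, gap_cap)                    # how far "below" the winner is treated as
    loser_expected = expected_score(loser_r, loser_r - gap)   # loser vs a virtual winner `gap` below
    delta = k * loser_expected
    return loser_r - delta, winner_r + delta

# A 2800 losing to a 2100 is scored as if it lost to a 2400: it sheds ~29 points
# instead of ~31, and the 2100 also gains ~29 instead of ~31 -- no net points created.
print(resolve_upset_with_cap(loser_r=2800, winner_r=2100))
```

If inflation is happening, it would therefore have to come from somewhere other than the cap itself, such as the new-algo effect described above.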


If we want to ensure everyone enters the final week with a fair shot, I have a suggestion. I’ll make a poll if it seems reasonable to the top players in this thread.

There are a ton of factors that impact algo ratings in unintuitive ways: the meta shifting around over an algo’s lifetime, an algo that it performs well against being uploaded many times, and the algo getting stranded at a very high rating.

The only way to remove these anomalies completely and guarantee a perfectly fair outcome for all players is to reset ratings (and matchmaking history, to allow rematches) for all algos at the end of the season, when we prevent further algo submissions during the ‘settling’ period. This ensures everyone starts on even footing, regardless of how long their algo has been uploaded and how many people uploaded algos that it performs well against.

I have always been in favor of this solution, but it was vetoed by the team due to concerns that players would feel that they ‘lost’ the rating they had earned throughout the season, or that they would feel cheated if their new rating was significantly lower than their pre-reset rating.

3 Likes

I would definitely be for this Elo reset. It seems fairer if everyone has to attain their rating against the same pool of competitors.

1 Like

Also, maybe I’m misunderstanding how the cap works, but doesn’t it mean that both parties would have their scores change by more than expected? So high-rated algos would still be able to gain points from opponents they should beat pretty much every time?

I’m not sure I like the Elo reset idea, since as a result the complete season would basically have no use at all (only the end would matter). Further, I believe @Felix did prove that it’s possible to catch up on Elo rating very quickly (before he removed his algo).
Of course I might be biased, since I benefit from not resetting; so if the majority of players are in favor of resetting Elo, I wouldn’t really have a problem with it.

I am definitely for the reset. It solves all the fairness issues caused by repeated uploading of similar algos in a very short time. Without the reset, it’s theoretically possible to target a top algo and reduce its rating drastically, so that it’s no longer in the top 10; and if that algo has already played against most algos, then there is no chance to get back up again.
One downside could be that the period when nobody can upload algos would need to be longer, since more matches have to be played to make it fair.

@IWannaWin Yes, a couple of days ago I uploaded my algo and it reached first place within 4 hours. Getting back up is not the issue.

1 Like

Good discussion! I can see valid points on both sides.

I guess a lot of it comes down to how the global competition fits in these days. At one point, C1 wanted to incentivize participation throughout the season. In some ways the auto-competitions do this, but there is still no actual benefit to doing well in them right now.

If the season finals are going to be limited to the top 10 from the global leaderboard and there’s a full reset at the end, I agree with @IWannaWin that there is no practical point left to participating in the complete season. This rewards behavior like @Felix’s (I’m not trying to be accusatory with this observation, by the way!) - only letting their algo compete on the global stage for very short periods of time before removing it again.

But I don’t have any real answers to offer …

I’m going to bring this conversation to its own thread and include a poll.