Player ratings, near Season End

Demorf · May 14, 2020, 9:46am

I know this was discussed before and probably does not have an easy solution, but I wanted to mention a
few specific problems, and hopefully we can find a workaround for them:

Bigger gaps when players rapidly submit new algos
At some point we had gaps of about 150 points between the top 3 players (2700 | 2550 | 2400).
This in turn has several side effects:

it gets hard to get a match vs the top players even if you have a better algo
and if you are between 2 gaps … you don’t get ANY matches
I personally had a situation where my newly submitted algo was stuck at 15:0 matches for several hours.

Then when things start to normalize and you get more mixed matches, you may keep an algo with a ridiculously high rating that rarely gets a new match, as it is 100-200 points above the rest,
but if you resubmit it … it ends up in the middle tier.

Again, this is not a big problem on normal days, as things normalize after a few days,
but near the Season end … it can be abused, and tricking the rating system is not the goal of this game IMO, but it is currently a thing.

n-sanders · May 14, 2020, 1:03pm

Another factor is algos being added and deleted during rapid prototyping. I had 13 algos submitted on 5/10 alone (and only 4 stuck around past that day). They would rise quickly to the 2.1k - 2.2k range and then get pulled after I saw their behavior in a few key match-ups.

That means some algos had the chance to get up to nine wins against ~2.1k rated algos that others would never play. This may help explain how a resubmitted algo could get stuck in the middle tier when there are “less” ~2.1k wins available for it.

Demorf · May 14, 2020, 2:43pm

Yep, good example. And the reverse is also true: if an algo reached 2400 3 days ago …
it will never have to face the bloodbath at 2100, with the new challengers there.

I tend to keep my highest rated algo as a benchmark even if it is a few days old.
But it is highly misleading for me and especially for the Leaderboard.
Often if I resubmit it, it gets smashed.

I think something like a decay mechanic can solve some of this issue:
If the algo did not find a match for the past hour, remove 10 points or 1% from its score.
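
To make the proposal concrete, here is a rough Python sketch of that decay rule. It assumes the penalty is applied once per full idle hour, which the suggestion above does not spell out; the function name and parameters are made up for illustration and are not part of any C1 API.

```python
def apply_decay(rating, hours_since_last_match, flat=10.0, pct=0.01, use_percent=False):
    """Hypothetical decay: one penalty per full hour without a match.

    flat        -- points removed per idle hour (the "10 points" variant)
    pct         -- fraction removed per idle hour (the "1%" variant)
    use_percent -- pick the percentage variant instead of the flat one
    """
    idle_hours = int(hours_since_last_match)  # partial hours are not penalized
    for _ in range(idle_hours):
        rating -= rating * pct if use_percent else flat
    return rating


# A 2700 algo stranded for 3 hours loses 30 points under the flat variant.
print(apply_decay(2700.0, 3))
```

With the flat variant a stranded algo closes a 150-point gap in about 15 idle hours; the percentage variant shrinks high ratings faster than low ones.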

n-sanders · May 14, 2020, 2:56pm

That’s a very clever idea. I’m not sure how I feel about it though. I know C1 has said in the past they want to incentivize algos being around for a while, rather than creating a system that encourages procrastination until near the end of the season (part of why they have the auto-competitions now). And being penalized because their matchmaking system couldn’t find a match for my algo to play sounds a bit discouraging.

I felt like this season had pretty good fluidity regarding the top algo spots. And I think @acshikh’s au2 spent the whole season out in the wild and maintained a solid rating (but they’d have to verify if it was ever taken down and re-uploaded).

There are definitely times when algos can rise quite high if their strat works exceptionally well versus the current meta at the time. @amit21min’s ‘Good Night Reliable’ is an example of this. They held the top spot for a while, and ended up at #12 at season’s end.

max1e6 · May 14, 2020, 3:04pm

Another option would be to broaden the range of Elo that matchmaking considers. If an algo is at the top it can persist up there quite a while because it is not getting matches with the sub 2k algos that happen to counter it.

However, I’m not sure that any of these tweaks will necessarily do all that much better than the current system, because there is the underlying problem that Terminal has a sort of rock-paper-scissors quality to it. This means that an algo’s strength can’t really be described by just one number such as its Elo rating.
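
A toy illustration of the rock-paper-scissors point, as a minimal sketch only: it uses the plain Elo update with K = 32 as a stand-in (the live site uses Glicko, per later posts), and the three "algos" and their fixed results are invented for the example.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def play_cycles(cycles, k=32.0):
    """Three hypothetical algos that beat each other in a fixed cycle."""
    ratings = {"rock": 1500.0, "paper": 1500.0, "scissors": 1500.0}
    # (winner, loser) pairs; every algo wins one matchup and loses one.
    results = [("paper", "rock"), ("scissors", "paper"), ("rock", "scissors")]
    for _ in range(cycles):
        for winner, loser in results:
            gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
            ratings[winner] += gain
            ratings[loser] -= gain
    return ratings


print(play_cycles(100))
```

All three ratings keep hovering near 1500, so the single number says the algos are equal even though each matchup is completely one-sided.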

RegularRyan · May 14, 2020, 10:51pm

Overall I’m pretty happy with how rating has been working since we switched over to Glicko in season two, but I agree that algos being stranded at the top of the leaderboard could cause issues.

What are the thoughts on some of these ideas:

Limit algo uploads to one or two per day, stacking up to around 8 if you have not used your uploads. This will reduce shocks to the system when top players upload a lot all at once. It also forces a more gradual iteration process throughout a season, rather than trying to rapidly iterate on the last day before anyone has a chance to see your work. The obvious downside of this idea is how cumbersome the constraint might feel.
Increase the maximum matchmaking rating difference between opponents. The concern here is that a high-ranking player loses a lot of rating if they lose a match against someone very far away from them in rating, and gains very little if they win. So it can feel bad if players are matched like this often. Potentially, I could set a cap on rating losses such that you can’t lose more rating than you would playing against someone 400 points away from you…
Allow algos to rematch if they haven’t played in over a certain time period, maybe 2 days. This will at least keep rating more liquid at high ratings, but I’m not a fan of this idea since repeated matches aren’t very useful for iteration, and will almost always have the same result.
Perform a blanket ‘deflation’ pulling everyone back towards 1500 once per week. Say, reducing everyone’s rating by 0.2 * (your_rating - 1500). This will get stranded algos more matches and reduce the relative badness of playing against much lower-rated algos. It also pulls outliers back towards where they are maybe actually supposed to be. But I’m not 100% sure how nicely this math plays with the base Glicko formula. It’s also pretty confusing for anyone just looking at the leaderboard every day.
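
Two of the ideas above are easy to put numbers on. The sketch below is illustrative only: `deflate` is the weekly formula exactly as written, while `capped_loss` approximates the 400-point loss cap using the plain Elo update with K = 32 as a stand-in, since the real Glicko update also depends on rating deviations that are not modelled here. Both function names are hypothetical.

```python
def deflate(rating, factor=0.2, center=1500.0):
    """Weekly deflation: pull the rating `factor` of the way back to `center`."""
    return rating - factor * (rating - center)


def capped_loss(rating, opponent, k=32.0, max_diff=400.0):
    """Points lost by `rating` after losing to `opponent` (Elo stand-in).

    The rating difference is clamped to +/- max_diff before the update,
    so an upset against a very distant opponent costs no more than one
    against an opponent exactly max_diff points away.
    """
    diff = max(-max_diff, min(max_diff, rating - opponent))
    expected = 1.0 / (1.0 + 10 ** (-diff / 400.0))  # loser's expected score
    return k * expected


# A 300-point island (2700 vs 2400) shrinks to a 240-point gap after one week.
print(deflate(2700.0), deflate(2400.0))
# Losing to a 1950 costs a 2600 algo the same as losing to a 2200.
print(capped_loss(2600.0, 1950.0), capped_loss(2600.0, 2200.0))
```

Under plain Elo the cap changes very little, because the loss is already close to K at a 400-point difference; how much it would matter under Glicko is not something this sketch can show.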

acshikh · May 15, 2020, 12:05am

Personally, I have loved for some period having an algo at the top reaches of rating! It is fun and rewarding to achieve that, and it would feel less rewarding to have that rating drained away by some sort of deflation.

Also, while it is certainly sad to lose a lot of rating to a low ranked player, it also feels just to me! Hey, if a 1950 ranked player beats a 2600 ranked one, I think that is real cool for the lower ranked player and well, clearly it shows that my fancy code has a hole in its armor!

So my vote here is for increasing the maximum matchmaking rating difference. Maybe we just have that maximum difference increase arbitrarily whenever an algo can no longer find anyone to play within the current difference, guaranteeing, say, at least one more match every hour or so, regardless of how far away that other algo is in ranking.

RegularRyan · May 15, 2020, 12:52am

The root of the issue is that if you run out of players near the top, you will constantly play these games where you have nothing to gain.

I think I am going to have the maximum Elo range increase the longer an algo goes with no matches, so that game playing will still slow significantly but will not stop completely if you end up on a ‘rating island’. These players won’t be spammed with games where they have nothing to gain, but will still have to fend off challengers occasionally to hold onto their position.
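
One way this widening could look, as a rough sketch rather than the actual matchmaker: the base range of 100, the growth of 50 points per idle hour, and the function names are all invented for illustration, and the real values were not stated in the thread.

```python
def max_rating_gap(hours_idle, base_gap=100.0, growth_per_hour=50.0):
    """Allowed rating difference grows linearly with time spent without a match."""
    return base_gap + growth_per_hour * max(0.0, hours_idle)


def eligible_opponents(rating, hours_idle, other_ratings):
    """Ratings of the opponents currently within reach of this algo."""
    gap = max_rating_gap(hours_idle)
    return [r for r in other_ratings if abs(r - rating) <= gap]


others = [2550.0, 2400.0, 2100.0]
print(eligible_opponents(2700.0, 0, others))  # nobody within 100 points
print(eligible_opponents(2700.0, 1, others))  # range is now 150
print(eligible_opponents(2700.0, 4, others))  # range is now 300
```

Because the range only grows while the algo is idle, a stranded algo gets an occasional match instead of a constant stream of far-away opponents.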

Demorf · May 15, 2020, 5:20am

That sounds about right.
Maybe also reduce the rating won or lost proportionally when making these special matches.
The goal is to get SOME movement, but not to crash them into the ground.
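
A possible reading of "proportionally", sketched in Python: scale the rating change by how far outside the normal matchmaking range the match was. The normal range of 100 points and the function name are assumptions for the example, not anything C1 has confirmed.

```python
def scaled_change(raw_change, rating_diff, normal_gap=100.0):
    """Shrink a rating change for matches made outside the normal range.

    raw_change  -- what the rating system would normally award or deduct
    rating_diff -- rating difference between the two algos
    normal_gap  -- assumed normal matchmaking range (hypothetical value)
    """
    diff = abs(rating_diff)
    if diff <= normal_gap:
        return raw_change  # ordinary match, no scaling
    return raw_change * normal_gap / diff


print(scaled_change(-30.0, 50.0))   # ordinary match: full -30
print(scaled_change(-30.0, 600.0))  # special match: only -5
```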

RegularRyan · May 15, 2020, 2:55pm

Also putting this in; these changes will be out sometime next week.

owenvt · May 15, 2020, 5:15pm

I would be for allowing rematches if an algo has exhausted all its possible opponents. The way it works now when uploading a new algo, you want to play the high ranking algos asap to climb faster (which also takes them lower), or if you lose it won’t hurt as much as if you play them later when ranked higher. Also some algos are random and have different results, although I agree it is less useful than playing new algos which is why I would suggest it only after exhausting new opponents.

n-sanders · May 15, 2020, 6:43pm

This was a big problem before @RegularRyan introduced the Glicko rating system. He would need to confirm, but I believe the Glicko system makes the timing of the match matter much less due to a built-in “uncertainty” about a new algo’s true rating.

Even as someone that ran an algo with some amount of random behavior, I’m not seeing the clear benefit to rematches.

Demorf · June 6, 2020, 7:26pm

OK, let’s reflect on this change: has anyone noticed a positive change?

I seem to be a bit lucky that my Strategy works well on Season 6 and we can observe a drastic change on the Leaderboard.

I have 6 algos uploaded that are technically identical.
The first 2, Demorf_v6-0-1 and Demorf_v6-0-2, have a disproportionate +200 rating above my other algos, and are around +300 above the other players at this moment.

I can see there are enough low-level matches … but they are often easy wins … and I’m not sure if they give any positive rating at all, but surely they don’t bring any balance.

So … to reiterate the problem:
By luck and timing, I have an algo that is 200+ rating above the others.
And the current matchmaking mechanics push it even further apart.
I cannot achieve this rating with a new upload of the same algo.
But if I just keep it … I get an unfair positioning advantage.

Felix · June 8, 2020, 2:12pm

I think it’s great that there are low-level matches when no higher matches are possible anymore. And I don’t see any issue with the fact that not all of your algos have the same rating. It is certainly related to the fact that your older algos have played more matches and therefore have a higher rating.
For comparing my own algos I prefer your Cross Check tool (Community Projects, Terminal Forum) over the rating anyway.
But if you want your Elo to drop again, I could upload an Anti-Demorf-Algo.

Demorf · June 8, 2020, 2:27pm

Looking at this after a few days, these extra matches may be making the situation worse:
They give some minimal rating, maybe less than five points … but it is above zero, and it adds up over time.
So in fact the “fix” may be making the situation worse.

@Felix, let’s run a real test for this:
If you manage to drop my rating to 2500 with fewer than 20 uploads,
I will happily remove my algos and re-upload them, clearing the board a bit.

If you don’t manage to do it … this will confirm the problem.

Felix · June 8, 2020, 2:34pm

Challenge accepted!
But I might not have time for this before Thursday this week.

owenvt · June 8, 2020, 4:59pm

I don’t see any issue if there is a large gap in ratings so long as a high rated algo will continue to find opponents. To my understanding, the only reason Demorf’s isn’t dropping is because it isn’t losing. If it starts losing it should drop, whereas in the old system it would stop getting matches and be unable to drop.

Demorf · June 8, 2020, 6:15pm

It is not so black and white. Demorf_v6-1-2 is losing … just 5% of the time.
But most of the “extra matches” are vs really random players … some of them really low, so even when losing to them the penalty is probably reduced.

The end result is that my algo keeps climbing in rating, and this is even worse than before, where you just kept your high rating.

I would love for someone’s rating on the leaderboard to be reproducible by uploading the same algo.

I know this is probably not a simple thing to balance, but we have a good setup now for testing it.

owenvt · June 8, 2020, 7:39pm

That 234 rating jump from beating a 2057 rated algo at the end seems odd.

owenvt · June 8, 2020, 7:48pm

The change that was made doesn’t address the problem of rapidly iterated algos pushing the algos that beat them up and the ones that lose down. This will still occur, and perhaps more so as the algos on either end are no longer “out of reach” of this phenomenon. The change addressed lack of availability of matches, not large gaps in the ratings. Yes, you will continue to rise until a challenger appears. I think if Felix does upload an algo to beat you it will fall quickly. That last match where you jumped 234 rating does seem like a bug though.