[URGENT] Flaw in matchmaking system!

kkroep · December 29, 2018, 12:23pm

The current top 2 algos are stuck there because they don’t play any matches, as seen here.

This is a flaw in the matchmaking system, which prioritizes elos that are close together. However, an exception should be made for algorithms that cannot find any matches because their elo is too high. For example, my current debug_1.9 algo (no. 1 at the time of writing) would never be able to get close to that elo in the current setting if I re-uploaded it. The effect is further amplified because the same match-up never happens twice.
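To illustrate the starvation described above, here is a minimal sketch of a matchmaker that only pairs algos within a fixed elo range and never repeats a match-up. The function name `eligible_opponents`, the `max_range` parameter, and all ratings are assumptions made up for illustration; this is not C1's actual matchmaking code.

```python
def eligible_opponents(algo, ratings, already_played, max_range):
    """Return opponents within max_range elo of `algo` that it has not faced yet.

    Hypothetical rule set: close-elo matching plus a no-rematch constraint.
    """
    return sorted(
        other
        for other, rating in ratings.items()
        if other != algo
        and abs(rating - ratings[algo]) <= max_range
        and frozenset((algo, other)) not in already_played
    )


# Made-up ladder: two algos sit far above everyone else.
ratings = {"top": 2300, "second": 2250, "mid": 2000, "new": 1500}
# The two leaders have already played each other once.
already_played = {frozenset(("top", "second"))}

print(eligible_opponents("top", ratings, already_played, max_range=100))  # []
print(eligible_opponents("top", ratings, already_played, max_range=400))  # ['mid']
```

With a narrow range the top algo has no eligible opponents left, so its elo can no longer change. Widening the range is one way to give it matches again.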

The problem is twofold. First, the top algo might not be the best player, simply because it is stuck without matches at an elo that is too high. Second, this algo might not be the best-performing algorithm of that player, causing a worse-performing algorithm to enter the grand finals.

Could this please be addressed before the grand finals?
Thanks in advance

More discussion is found here:

kkroep · December 29, 2018, 1:44pm

The tool of @bcverdict actually shows a more realistic representation of ELO:

https://bcverdict.github.io/?id=47218
https://bcverdict.github.io/?id=46336

Because the tool calculates elo using the opponents’ current elos, one can see what elo one would get if the same sequence of matches were repeated today. This analysis holds true until the last few data points, where it tries to correct to match the current elo. Both algos would lie 80 points lower. One can see especially with GA how skewed this looks. It loses the last 3 matches, yet appears to jump 80 points. This is because when GA played against those algos previously, their elo was much higher (on average 80 points).
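A rough sketch of this kind of replay, assuming the standard Elo update rule. The starting rating of 1500, the K-factor of 32, and the function names are assumptions for illustration; this is not the actual code of @bcverdict's tool or of the ladder.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def replay_with_current_elos(matches, current_elos, start=1500.0, k=32.0):
    """Replay a match history, rating every opponent at its *current* elo.

    matches: list of (opponent_name, won) tuples in chronological order.
    current_elos: dict mapping opponent_name to its elo today.
    Returns the list of ratings after each match, beginning with `start`.
    """
    rating = start
    history = [rating]
    for opponent, won in matches:
        expected = expected_score(rating, current_elos[opponent])
        rating += k * ((1.0 if won else 0.0) - expected)
        history.append(rating)
    return history


# Made-up example: the same win is worth more if the opponent is rated higher.
matches = [("rival", True)]
print(replay_with_current_elos(matches, {"rival": 1500.0})[-1])  # 1516.0
print(replay_with_current_elos(matches, {"rival": 1580.0})[-1])
```

The second result is larger than the first, which is the effect described above: wins scored against opponents whose elo was higher at the time count for more than the same wins replayed at today's lower elos.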

This problem is again amplified because there are no rematches. This means that if one already played a match against a strong algo at low elo, it doesn’t need to face it again once it has reached high elo.

This is not even considering the fact that the top two elos hardly play any more matches. One shouldn’t pay attention to the last few matches from my algorithms against GA, as this is a targeted effort by myself to get GA down to a more reasonable elo. I’m planning on deleting demux_1.9 in a bit to take care of my end too. However, this shouldn’t be necessary imo. It feels really stupid…

Ryan_Draves · December 29, 2018, 2:10pm

I talked about this several days ago with 8. He can share his graphs if he wants to, but basically there’s a “bridge” of elo as it is inflated and deflated. I thought my algo, being at the top of the bridge, would be invincible since it was from a time when it climbed to 2370 elo. For the most part it was, and it was still above everyone else when I took it down, but it lost its spot to other algos uploaded later in the bridge (sequentially Debug, then GA).

GA is the last “giant” sitting on “old elo,” and your targeted attacks seemed to have finished it off. Mind you, every “effort” to bring it down only deflates other elos and inflates any of mine that might be winning.

You are right about the matches “drying up.” It takes about 12-16 hours for my uploads to run out of matches. However, please let C1 handle this, as I don’t see how this could be improving the situation.

kkroep · December 29, 2018, 3:05pm

Let me rephrase a bit. I am actually iterating on my algorithms. It is just that I wait to replace them until they have climbed high enough to be matched against GA. I am not sitting here with the sole purpose of messing up the ladder, although I completely realize I am creating a weird imbalance in the ladder by doing this.

I have the impression you are also iterating to find answers to my newer algos. Good luck in the last few days/hours of the grand challenge

kkroep · December 29, 2018, 3:20pm

That might also be because the computation time is extremely low? Some of the high-elo algos use a lot of computation compared to most other algos. Not sure whether this affects matchmaking though.

What can also be a factor is that there is such a large rate of new algos that the server is occupied mostly with playing matches of these new algos.

Otherwise this behavior is especially unfortunate for all the players that are madly iterating to get the best shot before new year.

kkroep · December 29, 2018, 3:51pm

Okay yeah that seems to be a problem. I would think that if you lose more matches, you already have lots of information to work with to improve your algorithms, while this is much harder for algos that are currently winning. I would hope this behavior would be the exact opposite of what you are reporting. I wonder what C1’s thought process was behind this.

RegularRyan · December 31, 2018, 5:30pm

My current understanding, feel free to correct if something seems wrong:
I believe the deflation in Elo was caused by the fact that Elo is based on win rates; that is, you could expect an algo with X more points than another algo to win Y% of the time. Because a disproportionately large share of the new algos entering the system came from top users iterating rapidly, those algos would very consistently beat algos between 1500 and the top, reducing their scores, and would then be deleted. This process deflated scores across the system. However, because #1 and #2 were the main two players uploading rapidly, I believe they were less affected by the deflation, as a player’s own algos do not play against each other.

The top algos of these top 2 players are now ‘starved’ for matches, having played everyone around them, and it is nearly impossible for new algos to reach the scores needed to face them. To correct this, I’m going to increase the maximum range of a match back to a higher number. The algos will have much higher losses than gains in their matches, so their Elo scores should correct, as long as they actually start playing matches.
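For reference, the relationship between a rating gap and the expected win rate, and the resulting asymmetry between gains and losses, can be sketched with the standard Elo formulas. The K-factor of 32 and the ratings below are made-up assumptions; the values the ladder actually uses are not stated in this thread.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_change(rating_a, rating_b, a_won, k=32.0):
    """Elo points A gains (positive) or loses (negative) from one match."""
    actual = 1.0 if a_won else 0.0
    return k * (actual - expected_score(rating_a, rating_b))


# Hypothetical match-up that only becomes possible with a wider match range.
top, challenger = 2300.0, 2000.0

win_probability = expected_score(top, challenger)
gain_if_win = rating_change(top, challenger, a_won=True)
loss_if_lose = rating_change(top, challenger, a_won=False)

print(f"expected win rate: {win_probability:.2f}")
print(f"change on win:  {gain_if_win:+.1f}")
print(f"change on loss: {loss_if_lose:+.1f}")
```

Under these assumptions the higher-rated algo is expected to win roughly 85% of the time, gains only a few points for a win, and loses far more for a loss. An inflated elo should therefore drift down once such matches are actually played.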

Note: It seems one player just deleted their algo, settling back to a more reasonable #4 position, so there is only one with a disproportionately high score.

Another concern related to this is that an older, weaker algo will be selected because it has an incorrectly inflated Elo. To counteract this, top players can email or message me with their best algo if they do make it into the final top 10. We definitely want to see everyone’s best showing in the final competition.

What is the specific behavior you are referring to that is a concern?

Let me know if this sounds correct. Ultimately, this seems solved to me with the solutions I proposed. I’ll see if I can hotfix the matchmaking-range change in later today.

Ryan_Draves · December 31, 2018, 5:36pm

RIP GA. I wanted the 6th slot anyways but it looks like I took it down 5 minutes too early ;(

kkroep · December 31, 2018, 5:37pm

I will delete my top algos too now. I don’t want debug_1.9 to be my entry, as it is substantially worse than my newer iterations. I am contemplating deleting my demux_1.21 algo too as it might also be out of reach, even though I am quite happy with that entry.

The other issue we were discussing is based on the analysis of @876584635678890, where worse-performing algorithms play substantially more matches. An example is an algorithm that literally does nothing.

The other problem is that two algorithms are only matched against each other once. This is an issue if two players completely dominate the field, like what happened a week ago. In the future you might consider looking into that.

Just to check that I have the time difference correct: I have about 10 hours to make that decision right?
It is currently 18:36 in the Netherlands.

RegularRyan · December 31, 2018, 6:34pm

This is definitely the main deviation between Terminal and the other matchmaking systems we model after. In Terminal, because algos have their own rating, one ‘player’ is one algo. This means that players do not change in skill, and two arbitrary algos often have very consistent performance when matched against each other. It’s a consideration I always keep in mind when working on matchmaking-related things, and it is the root cause of the match starvation we see at very high ranks and the issues associated with that. This one is a more fundamental problem in the game, and a satisfying solution would have a pretty large scope. We’ve tossed ideas around like adding variation to some aspect of the game state at the start of the game or more overarching changes to the player-elo-algo relationship. Ultimately, we think the problems caused by this issue aren’t large enough to be worth implementing these large-scale changes for now.

I’ll investigate this. There are a few things that could cause this: Currently, I assume it is caused by a majority of matches being played around 1500 due to burst matches. If this appears to be the case, I will not consider it an issue.

It is currently 1:30PM ET. Algo submissions will close at 11:59PM ET, in about 10 hours. I recommend not cutting it close; you don’t want to miss out because of a connection issue or something.