Why the Leaderboard Will Not Change Any Time Soon

Below, I have prepared a list of the matches played by the top five algos on the current leaderboard, along with the elo score of each opponent.

There are quite a few observations we can make here:

  1. All of the top algos are matched up against significantly weaker opponents (I will get to the reasons later).

  2. The higher their elo, the lower their opponents' average elo (with one exception: Felix_0.9.1's opponents average slightly less than Cubed-9's).

  3. Besides the lack of matches against equally strong algos, they sometimes face very weak algos with negative scores, which probably just crash.

Explanation

The strongest opponent any of them faced had 1702 elo, which is still well below their own ratings, even though there might be close to a hundred algos above 1702 (many of the top users have multiple algos in that range). This is explained by two major properties of the matchmaking:

First, rematches are not possible. This explains why Truth_of_Cthaeh faces the weakest opponents: it is the oldest algo and has probably already played most of the high-elo competitors. Second, there is the distribution of elo across algos. I have graphed the distribution of each user's best algo on the leaderboard:

https://plot.ly/create/?fid=aFGHtqwfeghja7d5269378:1 (all data is from about 17:30 UTC 18 October 2018)

The plot may make it look as if there are algos above 2000 and below -500; this is an artifact of the plot type I chose, not actual ratings.
The graph allows us to make out two important features of the elo distribution:

  1. The median lies at 1147.

  2. There are two big bumps: one a bit below the starting elo (1500) and one at 186.75.

The lower bump probably consists of crashing algos, so it might move even further down over time if those algos keep receiving this many matches.
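To make the median and the two bumps more concrete, here is a minimal Python sketch that computes the median and a coarse text histogram from a list of per-user best elos. The `elos` values are made up for illustration (they are not the 18 October leaderboard data), and the 250-point bin width is an arbitrary choice.

```python
from collections import Counter
from statistics import median

def elo_histogram(elos, bin_width=250):
    """Count elos per bin; each bin is labelled by its lower edge.

    Floor division keeps negative elos in the right bin, e.g. -83 -> -250.
    """
    counts = Counter((elo // bin_width) * bin_width for elo in elos)
    return dict(sorted(counts.items()))

# Made-up per-user best elos (NOT the real leaderboard data): a cluster
# just below the 1500 start, a cluster near 200 and a few strong algos.
elos = [1380, 1420, 1450, 1470, 1490, 1330, 1290,
        180, 190, 200, 150, 230, 1900, 1820]

BIN_WIDTH = 250
print("median:", median(elos))  # 1355.0
for lower, count in elo_histogram(elos, BIN_WIDTH).items():
    print(f"[{lower}, {lower + BIN_WIDTH}): {'#' * count}")
```

A histogram like this shows the two clusters directly, at the cost of depending on the chosen bin width.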

Coming back to Cthaeh's matches: because of this distribution, and because every algo appears to get the same number of matches, the top algos are very likely to be matched against lower-tier algos.

Because sawtooth and the algos ranked third to fifth are "younger" than our top algo, they can still be matched against higher-tier algos that KauffK's algo might have already beaten.

Since the majority of algos are rated far below 1702 and the top algos have already faced some of the other top algos, it is very unlikely that they will play equally strong opponents as long as every algo has an equal probability of being matched (which is why I proposed this idea).
Given this, it is easy to see why the top algos stay where they are: they will not lose against far weaker (often crashing) algos.
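The argument above can be checked with a tiny model. Assuming (as the match lists suggest, but not confirmed by the Terminal team) that each new opponent is drawn uniformly at random from all algos not yet played, the chance of drawing a comparable opponent is simply the share of such algos among the unplayed ones. The population below is hypothetical; only the 1702 cutoff comes from the data above.

```python
def comparable_opponent_chance(population, played, threshold):
    """Chance that a uniformly random, not-yet-played opponent has elo >= threshold.

    population: dict mapping algo name -> elo (other algos only)
    played: set of algo names already faced (no rematches)
    Assumes every eligible algo is equally likely to be picked.
    """
    eligible = [elo for name, elo in population.items() if name not in played]
    if not eligible:
        return 0.0
    strong = sum(1 for elo in eligible if elo >= threshold)
    return strong / len(eligible)

# Hypothetical population: 5 algos at 1800 and 95 at 900 (not real data).
population = {f"strong_{i}": 1800 for i in range(5)}
population.update({f"weak_{i}": 900 for i in range(95)})

# A "young" top algo that has not met any strong algo yet.
print(comparable_opponent_chance(population, set(), 1702))  # 0.05
# An "old" top algo that has already played three of the five strong algos.
print(comparable_opponent_chance(population, {"strong_0", "strong_1", "strong_2"}, 1702))
```

The older an algo is, the larger its `played` set and the smaller its chance of meeting a comparable opponent, which matches what we see for Truth_of_Cthaeh.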


Top Algos

Truth_of_Cthaeh by KauffK = 1965

~683.61

Matches

https://terminal.c1games.com/watch/427260 1491
https://terminal.c1games.com/watch/429846 150
https://terminal.c1games.com/watch/432735 1596
https://terminal.c1games.com/watch/428554 1184
https://terminal.c1games.com/watch/440895 1193
https://terminal.c1games.com/watch/438062 1487
https://terminal.c1games.com/watch/428487 -49
https://terminal.c1games.com/watch/431788 1491
https://terminal.c1games.com/watch/428792 192
https://terminal.c1games.com/watch/433984 900
https://terminal.c1games.com/watch/435429 -83
https://terminal.c1games.com/watch/440129 1050
https://terminal.c1games.com/watch/439428 -5
https://terminal.c1games.com/watch/436755 993
https://terminal.c1games.com/watch/435250 -72
https://terminal.c1games.com/watch/431061 1055
https://terminal.c1games.com/watch/433165 -175
https://terminal.c1games.com/watch/427251 -93

sawtooth by AdrianMargel = 1920

~918.00

Matches

https://terminal.c1games.com/watch/428045 OLDISNEW1.0=1563
https://terminal.c1games.com/watch/433038 1071
https://terminal.c1games.com/watch/429282 1691
https://terminal.c1games.com/watch/426783 1159
https://terminal.c1games.com/watch/437017 1508
https://terminal.c1games.com/watch/439690 1422
https://terminal.c1games.com/watch/438323 1565
https://terminal.c1games.com/watch/434396 103
https://terminal.c1games.com/watch/431766 1584
https://terminal.c1games.com/watch/430526 396
https://terminal.c1games.com/watch/429859 134
https://terminal.c1games.com/watch/429833 991
https://terminal.c1games.com/watch/440532 145
https://terminal.c1games.com/watch/433315 151
https://terminal.c1games.com/watch/435677 1326
https://terminal.c1games.com/watch/439766 -102
https://terminal.c1games.com/watch/437042 899

gamma_13 by kkroep = 1861

~1002.41

Matches

https://terminal.c1games.com/watch/436529 1230
https://terminal.c1games.com/watch/434548 1547
https://terminal.c1games.com/watch/439315 1586
https://terminal.c1games.com/watch/437023 1576
https://terminal.c1games.com/watch/432296 1696
https://terminal.c1games.com/watch/440529 1234
https://terminal.c1games.com/watch/437512 chinese-algo2=1635
https://terminal.c1games.com/watch/434820 1287
https://terminal.c1games.com/watch/436695 934
https://terminal.c1games.com/watch/436525 1228
https://terminal.c1games.com/watch/434578 1128
https://terminal.c1games.com/watch/439905 185
https://terminal.c1games.com/watch/438839 -95
https://terminal.c1games.com/watch/435198 183
https://terminal.c1games.com/watch/433271 110
https://terminal.c1games.com/watch/431991 1169
https://terminal.c1games.com/watch/431656 408

Cubed-9 by RuberCuber = 1831

~1064.37

Matches

https://terminal.c1games.com/watch/439470 NEUNEU4=1569
https://terminal.c1games.com/watch/436718 1597
https://terminal.c1games.com/watch/433077 1595
https://terminal.c1games.com/watch/440843 1458
https://terminal.c1games.com/watch/430177 1189
https://terminal.c1games.com/watch/430246 1702
https://terminal.c1games.com/watch/432545 1612
https://terminal.c1games.com/watch/431459 1564
https://terminal.c1games.com/watch/433406 1591
https://terminal.c1games.com/watch/440837 189
https://terminal.c1games.com/watch/436193 209
https://terminal.c1games.com/watch/435063 1438
https://terminal.c1games.com/watch/436794 294
https://terminal.c1games.com/watch/435496 160
https://terminal.c1games.com/watch/432814 1252
https://terminal.c1games.com/watch/433806 150
https://terminal.c1games.com/watch/440840 1058
https://terminal.c1games.com/watch/438096 59
https://terminal.c1games.com/watch/428922 1537

Felix_0.9.1 by FelixRichter = 1817

~1044.37

Matches

https://terminal.c1games.com/watch/429993 1123
https://terminal.c1games.com/watch/425233 1603
https://terminal.c1games.com/watch/438515 1283
https://terminal.c1games.com/watch/428091 1543
https://terminal.c1games.com/watch/434705 1589
https://terminal.c1games.com/watch/425559 1353
https://terminal.c1games.com/watch/439394 172
https://terminal.c1games.com/watch/433444 771
https://terminal.c1games.com/watch/440811 650
https://terminal.c1games.com/watch/439753 202
https://terminal.c1games.com/watch/437646 1074
https://terminal.c1games.com/watch/436322 230
https://terminal.c1games.com/watch/435886 1434
https://terminal.c1games.com/watch/432176 1457
https://terminal.c1games.com/watch/431206 1657
https://terminal.c1games.com/watch/428720 1501
https://terminal.c1games.com/watch/440535 282
https://terminal.c1games.com/watch/440409 1158
https://terminal.c1games.com/watch/426896 761


Extras

The number after each of the top five's names is its elo; the number below each title (~{number}) is the average elo of its opponents; and the elo of each opponent is shown right next to the corresponding watch link. All losses of the respective algos feature the name of the algo that beat them (e.g. OLDISNEW1.0=1563).
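As a quick check of the ~{number} averages (and of the claim that no opponent was rated above 1702), this Python snippet recomputes them from the opponent elos listed next to the watch links above.

```python
from statistics import mean

# Opponent elos copied from the match lists above, in order of appearance.
opponents = {
    "Truth_of_Cthaeh": [1491, 150, 1596, 1184, 1193, 1487, -49, 1491, 192,
                        900, -83, 1050, -5, 993, -72, 1055, -175, -93],
    "sawtooth": [1563, 1071, 1691, 1159, 1508, 1422, 1565, 103, 1584,
                 396, 134, 991, 145, 151, 1326, -102, 899],
    "gamma_13": [1230, 1547, 1586, 1576, 1696, 1234, 1635, 1287, 934,
                 1228, 1128, 185, -95, 183, 110, 1169, 408],
    "Cubed-9": [1569, 1597, 1595, 1458, 1189, 1702, 1612, 1564, 1591, 189,
                209, 1438, 294, 160, 1252, 150, 1058, 59, 1537],
    "Felix_0.9.1": [1123, 1603, 1283, 1543, 1589, 1353, 172, 771, 650, 202,
                    1074, 230, 1434, 1457, 1657, 1501, 282, 1158, 761],
}

for algo, elos in opponents.items():
    print(f"{algo}: {len(elos)} matches, average {mean(elos):.2f}, strongest {max(elos)}")
```

The printed averages match the ~683.61, ~918.00, ~1002.41, ~1064.37 and ~1044.37 shown under each title.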

The title refers to the current matchmaking situation, and "Any Time Soon" means the short term relative to Terminal's lifetime, i.e. days or weeks.

I agree something needs to be changed. For example, I don't understand why the top algo is matched with low-elo algos when there are higher-placed algos that haven't played against it yet.

For example, I uploaded my algo two days ago; it's doing well and is currently above 1700 elo, yet it hasn't had a chance to play against the top algos, while they are playing against algos below 1500 elo.

Great analysis. I wanted to ask about the graph and the values above 2000, but you already covered that.
I support this idea.
This will not affect the tournament results; however, it would be nice to see matchmaking based on the strength of each algo rather than giving a false impression to the ones at the top. As @876584635678890 has shown, the top algos are unfortunately matched against weaker algos, and their owners could be surprised by the tournament results.

@ziomkus As further anecdotal evidence, I have an algo on the rise that was added a couple of days ago (currently sitting at 1751 elo).

In the last 18 matches I can currently see (the list shown in MyAlgos), I've played two algos of note: one at 1842 and one at 1833. But I also see five matches against algos under 300.

I saw one reference to algos needing to be marked as "available", so maybe the mass of low-tier algos is locking up competent opponents?

Maybe this all gets solved once the "fix" to only match algos within 400 elo of each other goes in?

Edit: given @Janis's point, with the CodeBullet competition in one week, I'm eager for real feedback on my algo. I really hope the change to matchmaking can be made in time to be meaningful for all of us.
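For reference, a rule like "only match algos within 400 elo of each other" could look roughly like the Python sketch below. The 400-point window comes from the post above; everything else (the `pick_opponent` helper, the candidate format, combining it with the no-rematch rule, choosing at random among the rest) is a hypothetical illustration, not the actual Terminal implementation.

```python
import random

def pick_opponent(my_elo, candidates, played, window=400, rng=random):
    """Hypothetical: pick a random unplayed opponent within `window` elo of my_elo.

    candidates: dict mapping algo name -> elo
    played: set of names already faced (no rematches)
    Returns None when nobody qualifies.
    """
    eligible = [name for name, elo in candidates.items()
                if name not in played and abs(elo - my_elo) <= window]
    return rng.choice(eligible) if eligible else None

candidates = {"A": 1842, "B": 1833, "C": 1300, "D": 250}
# At 1751 elo with B already played, only A is within 400 elo.
print(pick_opponent(1751, candidates, played={"B"}))  # A
```

The `None` case matters: with a strict window, an algo far above everyone else may have no eligible opponents at all, so a real rule would need some fallback.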

@Janis It will indeed only affect the results of the global competition, but those results could be inaccurate due to unlucky matchmaking, and it makes it harder to improve our algos.
I changed the graph link so it now shows a histogram by default. This gives a different view of the data that is less misleading, but it does not show the shape of the distribution as smoothly.

@n-sanders My personal experience has also shown that feedback is slow.
This is especially unfortunate because manual algos (unlike, e.g., machine-learning algos) do not take long to adjust, so decisions could be based on a false perception of how your algos performed.

@876584635678890 Did you pull the elo numbers by crawling api/game/leaderboard?page=X?

I trust the accuracy of the trend, but it's worth being a little cautious with some specifics. The leaderboard (as of a couple of weeks ago) only includes the highest elo for each user. I'm guessing KauffK has six Truth_of_Cthaeh algos uploaded (apparently all with the same name!), probably all with 1800+ elo. That may even affect the median calculation (assuming the users who uploaded crash bots haven't uploaded any new algos, given that their best-performing algo is still a crash bot).

I'm not disagreeing with your findings at all; matchmaking needs to be fixed. I just wanted to give that word of caution as people look over this data.
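To illustrate this caution: if the leaderboard only lists each user's best algo, statistics such as the median describe users rather than algos. A minimal Python sketch with made-up entries (all user and algo names are hypothetical):

```python
from statistics import median

# Hypothetical (user, algo, elo) entries; names and numbers are invented.
entries = [
    ("top_user", "algo_v1", 1950), ("top_user", "algo_v2", 1880),
    ("top_user", "algo_v3", 1850), ("mid_user", "algo", 1200),
    ("crash_user", "crash_bot", 180), ("new_user", "first_try", 1400),
]

def best_per_user(entries):
    """Keep only the highest-elo algo of each user, as described above."""
    best = {}
    for user, algo, elo in entries:
        if user not in best or elo > best[user][1]:
            best[user] = (algo, elo)
    return best

all_elos = [elo for _, _, elo in entries]
user_elos = [elo for _, elo in best_per_user(entries).values()]
print("median over all algos:", median(all_elos))        # 1625.0
print("median over users' best algos:", median(user_elos))  # 1300.0
```

Which median is "right" depends on the question: matchmaking draws from all algos, while the leaderboard view counts each user once.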

Oops, guess I got excited and should have read a little slower :slight_smile:

Thanks for your thoroughness!!

The algos should be closer to a normal distribution on the leaderboard, but the large lump at the bottom is partly due to an issue that existed last week and was fixed on Monday, where losers were losing more points than intended.

We have three 'low-hanging fruit' updates for matchmaking in the works, which we hope to deploy by the end of next week and which should help a lot with the existing matchmaking issues. You can check those out in the other post discussing matchmaking.
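Why would losers losing more points than intended pile algos up at the bottom? In a standard Elo system the winner gains exactly what the loser loses, so the total number of points stays constant. The Python sketch below assumes the textbook Elo formula with K = 32 and adds a hypothetical `loss_multiplier` to mimic the bug; Terminal's actual formula and constants are not stated in this thread.

```python
def expected_score(rating, opponent):
    """Textbook Elo expected score of `rating` against `opponent`."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))

def play(winner, loser, k=32, loss_multiplier=1.0):
    """Return updated (winner, loser) ratings after one game.

    Assumes textbook Elo with K = 32. loss_multiplier > 1 is a hypothetical
    stand-in for the bug where losers lost more points than intended.
    """
    delta = k * (1 - expected_score(winner, loser))
    return winner + delta, loser - delta * loss_multiplier

w, l = play(1500, 1500)
print(w + l)  # 3000.0: points are conserved
w, l = play(1500, 1500, loss_multiplier=1.5)
print(w + l)  # 2992.0: 8 points vanish from the pool
```

Over many games the missing points come mostly out of the algos that lose most often, pushing them further down.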

Probably a long shot, but would reducing the scope of the (initial) matchmaking fix give you a chance to deploy it before the CodeBullet challenge? Even if only #1 were done (avoid matches where 0 Elo can be gained), it would do wonders for getting real, meaningful feedback in time for that major event.
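Under the textbook Elo formula, a heavy favourite's gain shrinks toward zero as the rating gap grows. The Python sketch below assumes K = 32 and rounding to whole points (neither is confirmed for Terminal) and uses the 1965 vs -175 match of Truth_of_Cthaeh listed earlier as an example.

```python
def expected_score(rating, opponent):
    """Textbook Elo expected score of `rating` against `opponent`."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))

def rounded_gain(winner, loser, k=32):
    """Winner's gain in whole points (assumes K = 32 and integer rounding)."""
    return round(k * (1 - expected_score(winner, loser)))

def min_zero_gain_gap(k=32):
    """Smallest whole-point rating gap at which the favourite gains 0 points."""
    gap = 0
    while rounded_gain(gap, 0, k) > 0:
        gap += 1
    return gap

print(rounded_gain(1965, -175))  # 0
print(rounded_gain(1965, 1702))  # 6
print(min_zero_gain_gap())       # 720
```

Under these assumptions, any match with a gap of about 720 elo or more is worthless for the favourite, which is what rule #1 would filter out.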

Out of curiosity (not really related to this thread, but reading about how the leaderboard should ideally give useful feedback for competitions sparked the question):
How are competition winners determined? Presumably not by ELO, because that doesn't seem quite right. I would expect the competition match process to account for the non-transitive win relationships between many algos. Round-robin scoring or something like that?

Looks like the competitions so far have all been single-elimination brackets.

Really? I would be surprised and concerned if that were true. The leaderboard has shown many cases of algos that beat most opponents but lose to a few that, for whatever reason, have a design that counters them specifically while losing to most others. (Take, for example, this now-famous occurrence: Match of the Week.)

If I just happened to be paired with one such algo in the first round of a single-elimination tournament, I would be pretty upset, and I imagine others would be as well.
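This concern is easy to reproduce with a toy example. In the Python sketch below, four hypothetical algos have a non-transitive relationship: `C` is a "counter" that beats `A` but loses to everyone else. A single-elimination bracket crowns a different winner depending on who meets whom in round one, while round-robin scoring (as suggested earlier in the thread) produces the same standings regardless of order.

```python
from itertools import combinations

# Hypothetical results: BEATS[x] is the set of algos that x defeats.
# C is a "counter" algo: it beats A but loses to everyone else.
BEATS = {
    "A": {"B", "D"},
    "B": {"C", "D"},
    "C": {"A"},
    "D": {"C"},
}

def winner(x, y):
    """Return whichever of x and y wins their head-to-head match."""
    return x if y in BEATS[x] else y

def single_elimination(bracket):
    """Play rounds in bracket order (length must be a power of two)."""
    while len(bracket) > 1:
        bracket = [winner(bracket[i], bracket[i + 1])
                   for i in range(0, len(bracket), 2)]
    return bracket[0]

def round_robin(algos):
    """Count wins when every algo plays every other algo once."""
    wins = {algo: 0 for algo in algos}
    for x, y in combinations(algos, 2):
        wins[winner(x, y)] += 1
    return wins

print(single_elimination(["A", "C", "B", "D"]))  # B (A is knocked out by C)
print(single_elimination(["A", "B", "C", "D"]))  # A
print(round_robin(["A", "B", "C", "D"]))         # {'A': 2, 'B': 2, 'C': 1, 'D': 1}
```

The counter algo `C` can eliminate `A` in one bracket yet finishes near the bottom in the round robin.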

You can look at the prior competitions and see the games played throughout the various rounds. All of them appear to be single-elim format.

Link for the most recent competition:
https://terminal.c1games.com/competitions/28 (then select “matches”)

@KauffK I covered the order of events of that competition in more detail here.
From that you can see that the highest-elo algo did win the tournament, but I feel these concerns are valid, and they raise a number of problems in my mind that I will not spell out here.

However, remember that this is kind of the point of a tournament, and I am totally fine with it.
If your algo is not perfect, it might be "unlucky" and lose to another algo that counters it, but that is a strength of that algo, and it was inarguably stronger at least in that respect.

You're right, single elimination has a lot of issues and is a little luck-based (though, as 8 said, if your algo is perfect you don't need luck :stuck_out_tongue:).

We are already implementing some solutions. One is seeding based on ELO, so that high-ELO algos are more likely to be matched with low-ELO algos in the early rounds; this also rewards posting your algo early instead of keeping it a secret. So if you worked hard on your algo, you should have a good chance of at least making it past the initial rounds.

Another is that we plan to hold frequent tournaments, so that even if you get unlucky in one tournament, you can enter another in a week or two.

In general, the tournaments are more for "hype", fun, and introducing people to the game, and we wanted to keep them simple. The ELO system, on the other hand, is meant for figuring out who is truly the best.

Lastly, the big prize money for the global competition is based on ELO, so you don't have to worry about unlucky matches there.

We are still discussing all of this internally; big changes to the tournaments are possible, but we may have other, cooler features to prioritize.
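One common way to seed a bracket by rating is to sort algos by elo and pair the strongest with the weakest in round one (seed 1 vs the last seed, seed 2 vs the second-to-last, and so on). The Python sketch below only illustrates that general idea; the post does not say which seeding scheme Terminal uses, and the algo names and elos are hypothetical.

```python
def first_round_pairs(elos):
    """Pair the highest-rated algo with the lowest, second-highest with second-lowest, etc.

    elos: dict mapping algo name -> elo (an even number of entries).
    """
    seeded = sorted(elos, key=elos.get, reverse=True)
    n = len(seeded)
    return [(seeded[i], seeded[n - 1 - i]) for i in range(n // 2)]

# Hypothetical entrants.
elos = {"alpha": 1950, "beta": 1820, "gamma": 1500,
        "delta": 1210, "eps": 900, "zeta": 150}
print(first_round_pairs(elos))
# [('alpha', 'zeta'), ('beta', 'eps'), ('gamma', 'delta')]
```

A full seeded bracket would also place the pairs so that top seeds can only meet in later rounds; this snippet covers round one only.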

@C1Junaid I just wanted to mention that I cosign everything you said, and seeding is a really nice idea that embraces the spirit of tournaments even more :+1::heavy_plus_sign:

Yeah this seems reasonable, especially in a world of frequent tournaments that cost nothing to enter.

Now excuse me while I go chasing that perfect algo :wink:

@KauffK @8
To add to these points, we are still very early in Terminal's lifecycle. So far we have almost exclusively discussed plans we intend to implement within the next three weeks at most, but once all of our core features are fully fleshed out, we will definitely experiment with different tournament formats.

@n-sanders
I can confirm that there is an extremely low chance that matchmaking changes will be made before the CodeBullet challenge. We have a live event at UMich next weekend, and for business reasons, features related to that event have been prioritized. This includes a learning center for better onboarding and things relating exclusively to our live events.

The 'cop-out' answer I can give you is that everyone is equally disadvantaged by the current matchmaking issues, though it's obviously not ideal for anyone.

@RegularRyan
Thanks for your transparency. The root cause here is how much I’m enjoying this whole thing. :slight_smile:
I had originally planned to just participate in the CodeBullet competition and wind down my involvement after that, but now that I'm so close to cracking the top 10 of the global leaderboard, there's no way I'd be content to stop in a week!

I'm excited to see what else is to come as things continue to evolve here.