Why is Elo Inflating so much?

So much for NMFF’s elo stabilizing … It grew to the point of overtaking MazeRunner after MR had a few losses in a row today (NMFF has an elo of 2222 at the time of posting).

I’m not sure what this tells us anymore …

I’m guessing you win virtually every match against mazes and have a pretty consistent response to other algos, so that gives you the top spot. The question is, would NMFF have won against the algos that MazeRunner lost to? If not, wouldn’t it simply be lucky/unlucky matchmaking, and so not something that should stay stable?

NMFF (most likely) would not have won against the algos that beat MR. There’s one match that’s in question, but it wouldn’t account for much margin of error here. So at some point NMFF will face (and lose to) those same algos.

Perhaps it’s incorrect to think there will be elo stability, because of the continuous influx of new algos. There are around 400-500 uploaded each day (based on 8’s charts), so depending on matchups against those new algos, things will always be changing. NMFF and MR can beat most people’s first maze implementations, but they lose to a certain approach in maze behavior (which accounts for all three of the recent MR-vs-maze losses), and I’m seeing that approach more often in the global competition. So it definitely seems people are getting better at optimizing the maze-maze matchup.

This is similar to what I pointed out about sawtooth vs sawtoothV2.

Basically, both algos experienced a similar trend in gaining elo, which for now indicates elo inflation.


I bet top elo will be >2706 at the end of the month (without matchmaking changes).

:face_with_hand_over_mouth:


Was just trolling old posts and found this guy. Unfortunately I didn’t see this while you guys were discussing it before, but I just wanted to add my 2 cents to the conversation because I probably have the most theoretical and empirical exposure to our elo system. Want to dispel any unrest you guys are experiencing.

One thing I want to throw out there immediately is that neither the mean nor median elo is inflating; in fact it’s deflating.

I’ve been monitoring it for the past few months, and the system is behaving as mathematically intended. Since the last elo bug was patched some time in October, the total elo across deleted and active algos has stayed constant, which is expected from a zero-sum system. An observed elo pool can appear to shift when people delete algos. It can shift up when an influx of bad algos is uploaded and then deleted: they give elo to other algos and then go away. The opposite can also happen (as is the case in our system): algos that have gained elo get deleted and take that elo out of the active pool. I hope this makes sense to you.
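To make the zero-sum point concrete, here is a minimal sketch of how deleting algos shifts the observed pool. It assumes the textbook logistic Elo update with a K-factor of 32, a 400-point scale, and a starting rating of 1500; those numbers and the algo names are placeholders for illustration, not Terminal's actual parameters.

```python
def expected_score(r_a, r_b):
    """Textbook Elo expected score of A against B (logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def play(ratings, winner, loser, k=32.0):
    """One zero-sum update in place: the winner gains exactly what the loser drops."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


# Hypothetical pool: every algo starts at the same (assumed) default rating.
ratings = {"strong_a": 1500.0, "strong_b": 1500.0, "weak": 1500.0}
start_total = sum(ratings.values())

# The weak algo loses to both strong algos, handing them rating points.
play(ratings, "strong_a", "weak")
play(ratings, "strong_b", "weak")
total_all = sum(ratings.values())  # unchanged: the updates are zero sum

# The owner deletes the weak algo; its lost points stay with the others.
deleted = {"weak": ratings.pop("weak")}
active_mean = sum(ratings.values()) / len(ratings)
total_with_deleted = sum(ratings.values()) + sum(deleted.values())

print(f"total before deletion:      {total_all:.2f}")
print(f"active mean after deletion: {active_mean:.2f}")
print(f"active + deleted total:     {total_with_deleted:.2f}")
```

The active mean ends up above the starting rating even though no elo was created. Deleting an algo that had gained points would move the active mean the other way.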

When you consider only the 6 active algos for every user, mean elo has trended down from 1488 to 1129 since October. Do not be alarmed. As @kkroep said, this is ok for elo systems, and attempts to correct it by shifting the default elo would be fruitless. Some chess leagues like the USCF also have mean elo drops into the 1100s.

So as was mentioned above, the reason the diagrams show an upward trend is that on average people’s top algos are getting better. With access to all the data I can assure you that overall elo is not inflating.

Mathematically speaking, elo can go arbitrarily high, but it will be balanced (zero sum) on the other end. As Ryan said, elo is an attempt to fit winrates to a normal distribution. Let’s say an influx of average players joins the game right now. This doesn’t affect the skill of the top 10, but it effectively lowers the standard deviation of the distribution. With a lower standard deviation, the top 10 must sit more standard deviations above the mean to represent the same absolute skill (this is not precisely right, but it’s close enough for this post). The easy answer is that as long as the top players keep getting better relative to the average player, the elo of the top players will continue to go up. This is happening really fast right now because Terminal is pretty young, while a game like chess is well over a thousand years old.
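As a rough illustration of why a growing skill gap pushes top ratings up, this sketch maps a rating gap to an expected winrate. It assumes the common logistic form of the Elo expectation on a 400-point scale; the thread does not say which exact curve Terminal uses, so treat the numbers as indicative only.

```python
def expected_score(rating_gap: float) -> float:
    """Expected winrate for the higher-rated side, given its rating lead.

    Assumes the logistic Elo curve on a 400-point scale (an assumption,
    not a confirmed detail of Terminal's system).
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


# The more reliably a top algo beats an average one, the larger the
# rating gap has to be for the expectation to match the results.
for gap in (0, 200, 400, 800):
    print(f"gap {gap:>3}: expected winrate {expected_score(gap):.3f}")
```

Under this curve an algo only stops gaining elo from average opponents once its lead matches its real winrate against them, so a top algo that keeps improving keeps climbing.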

Hope this helps.
