What frustration looks like

To my companions in the eternal struggle to edge each other out of the very narrowly separated ranks near the top of the leaderboard, I share with you a frustration that I’m sure we have all experienced to some degree:

RIP my 2400+ algo, taking the most spectacular drop I’ve ever seen in a matter of hours. I’ve been reflecting on whether this changes any of my opinions about the way the leaderboard and/or the global competition functions, but I can’t come up with anything, and I believe this is just a natural result of each player having up to 6 algos.

All I can say is that I pray to all the gods of programming, known and unknown, that this does not happen on the evening of December 31. :scream:

Edit: To any devs (or I guess just people with opinions) who read this, just for the sake of discussion, do you have any vision for the scenario where this happens on the final day? Is it something worth addressing, or just an acceptable happenstance in a system that is otherwise working okay and is difficult to balance? Does it have any implications for the integrity of the prize structure, or does it just crank up the attention and excitement in the final days?

9 Likes

Mwahahaha, my algo has been doing this to RuberCuber too, ever since his 3.4 series. He’s on 3.11 now.

Except the series of algos you lost to itself loses to 3.9+, but it took you out anyway.

My algos are like blue shells at this point. They keep taking out the #1 guy then lose to the next 7 players.

Edit: See response below. This is in good humor at the circumstance, not intentional mischief.

3 Likes

I thought about this. If you were to upload 6 identical algos that were all pretty good, anyone you could consistently beat would drop a lot, and anyone who could consistently beat you would get a boost. Doing this would widen the gap between players, and it might be a cause of the Elo inflation. I just never thought it would happen in practice. Still, so bad and so funny that this actually happened XD.
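
To put rough numbers on the mechanism, here’s a toy simulation using the standard Elo update (K = 30 and all ratings are made-up; I don’t know the site’s actual constants):

```python
# Toy Elo simulation: six identical copies each beating the same victim once.
# Standard Elo formulas; K = 30 and the ratings are invented, not the site's
# actual configuration.

K = 30

def expected(r_a, r_b):
    """Expected score of A against B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def play(r_winner, r_loser):
    """Return updated (winner, loser) ratings after a single game."""
    delta = K * (1 - expected(r_winner, r_loser))
    return r_winner + delta, r_loser - delta

victim = 2400.0
copies = [2300.0] * 6  # six identical uploads that all beat the victim

for i in range(len(copies)):
    copies[i], victim = play(copies[i], victim)
    print(f"after loss {i + 1}: victim at {victim:.0f}")

# The victim bleeds rating once per copy (roughly 100 points in total here),
# while every copy pockets a win against a higher-rated opponent.
```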

If the leaderboard really gets reset, I would suggest allowing only a single algo per player for the last few weeks.

1 Like

It would slow down the overall improvement during the last days, but I agree with you.

I’ve been on the other side of this issue, with my okayish algo suddenly rising in the rankings. It has been sitting around #8 on the leaderboard for about a week now, but apparently it was up at #1 earlier today! Looking at the games it has played, a lot of the sudden rise can be attributed to winning against all the MichiganDifference algos.

It does worry me that the leaderboard can be so volatile just because an algo beats one specific algo, even if it loses to many of the other top algos. In the last week, everyone in the top 6 has held the #1 spot on the leaderboard at some point, so it is looking like the final result could very well come down to chance, which would be unfortunate.

I agree with the sentiment here. The leaderboard is volatile and there is no clear winner or best algorithm. This is as it should be; a healthy competition is generally a great thing. But we need to consider how the global challenge will be affected by last-day matchmaking luck. It also becomes appealing to build hard counters that target specific competitors in the final days, which is not a strategy I would like to see.

1 Like

If we determine that anyone is intentionally abusing this to disrupt leaderboard positions, we will make them ineligible for rewards at our discretion. We are looking into low-cost ways to mitigate potential problems caused by many copies of the same algo being uploaded around the same time.

@876584635678890
We can confirm that we will not be performing a leaderboard reset. We have some larger-scale changes that, for a handful of reasons, we will wait until after the global competition to release.

Note that in my posts I usually click a random ‘reply’ button; this response is not specifically for AlexM.

1 Like

Woah, I’m just uploading 5 copies at a time to get more games/averaged data on how it’s performing. I’m not trying to snipe people out of the leaderboard or cause harm. If it’s a clear issue, I’m more than willing to tone it down to 1 or 2. For the time being, I’ll take down all but 2 copies of my new version.

Regardless, I’m at the point where it’ll likely be toned down anyways. Up until this point it’s been a clear path forward, but now I think I have about 2 combinations of changes to either include or not include, so my uploads should start looking and behaving differently. The last major change has just about inverted who I win/lose against in the top 10, much to @KauffK’s dismay and @RuberCuber’s joy.

Also, I think the issue lies in having the same algo uploaded at all, not in the timing. If I had spaced out each copy’s upload by an hour or two, the same thing would’ve happened as my algo ran out of people to play against at the top; it just would’ve been spread out in his match history with nobody the wiser. I can assure you anyone in the top 10 or 20 is unintentionally doing the same thing to the leaderboards. Once my algos climb to ~2200, they play all the uploaded versions of the top algos until they run out of opponents; after that, all they match are new versions of the top algos and 1900-2000 Elo opponents. Those last couple of matches before the pool dries up always seem to come down to the same algos, making the histories look quite similar.

Again, I’ll tone it down for the time being. I need to diversify my uploads anyways. Not trying to be malicious with the system.

In regards to the final leaderboard prize, I conducted a small experiment recently. Some of you may have seen an algo called ‘Track-3.4-Experiment’ reach the top of the leaderboard. This algo was an exact copy of my ‘Track-3.4’ algo; I wanted to find out whether uploading the same algo now would result in the same Elo as an algo uploaded a long time ago. Turns out, within 2 days the newer algo surpassed the original by about 100 Elo. Over the next few days, its Elo slowly dropped until it was about the same as the original’s. It seems the algo peaked early, within 2 days or so, before declining and settling at a slightly lower Elo.

Which raises the following question: what’s stopping people from uploading or re-uploading their own algos 1 or 2 days before the deadline to try to snipe a higher leaderboard position? I feel this will probably happen if the current system stays in place. Maybe a better system would score the amount of time spent at the top of the leaderboard, weighted by the leaderboard position held, so that algos are ranked for consistent performance over a period of time rather than just their final position.
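
Something along these lines, purely illustrative (the snapshot format and the 1/rank weighting are my own strawman, not based on any real C1 data):

```python
# Strawman scoring rule: credit each algo 1/rank for every hour it is observed
# on the leaderboard, so sustained time near the top beats a late spike.
# Snapshot data below is invented for illustration.

snapshots = [  # (hour, {algo_name: rank}) sampled hourly
    (0, {"algo-steady": 1, "algo-spiky": 2}),
    (1, {"algo-steady": 2, "algo-spiky": 1}),
    (2, {"algo-steady": 1, "algo-spiky": 2}),
]

def time_weighted_scores(snapshots):
    """Sum 1/rank per snapshot for each algo; higher total wins."""
    scores = {}
    for _, ranks in snapshots:
        for algo, rank in ranks.items():
            scores[algo] = scores.get(algo, 0.0) + 1.0 / rank
    return scores

print(time_weighted_scores(snapshots))
# {'algo-steady': 2.5, 'algo-spiky': 2.0} -- two hours at #1 outweighs one
```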

4 Likes

I have consistently noticed this exact behavior with all the algos I upload: a peak, then a settling back down.

I think a possible solution would be to freeze uploads at the start of the competition and let the system run for, say, 2 days or so. This would give it time to let each algo settle into its correct spot regardless of when the algo was uploaded. When I say freeze, there are other possible implementations (like running on a different server), but the point is to keep playing games without new algos entering.

I have done the same thing in the past (in the days before the Code bullet challenge), but I’m pretty certain having more than a single copy of any one algo is frowned upon now (part of why there has been discussion around team rules), the motivation being that matchmaking is better now, so we should get the “right” games played without needing duplicate algos.

@RegularRyan - I’d love to get C1’s official position regarding this.

Obviously it can be hard to define “duplicate” when we’re talking about iterative development. If I tweak the end-game behavior of an algo in a way that doesn’t execute until round 75, it may make placements identical to my untweaked version’s 99% of the time, but it would still not count as a duplicate. Either way, I think it’s best to define what the spirit of the rules is for our algo slots.

@Ryan_Draves
Sorry about the miscommunication. I meant that now that the problem is known we will be keeping an eye out for people trying to abuse it, not that we thought you were being malicious. You’re good.

@n-sanders
This is something we have discussed internally, and we came to the same conclusion as you: it is really difficult to determine what qualifies as a ‘different’ algo as long as the source code isn’t exactly identical. If we banned exactly identical algos, users hoping to take advantage of the system could still easily make arbitrarily small changes to get all the same benefits, while many other users would be inconvenienced. For this reason, we do allow duplicate algos on one account.

I like the idea of giving the leaderboard a number of days with no submissions to settle, and some other C1 members do as well. We are planning to finalize the details for the global competition on or before December 1st, and I’ll include any plans regarding this.

7 Likes

After reflecting on the original problem, I think part of what makes it frustrating is that whenever one of the top players releases a wave of improved versions of their algo, the algos that are already at the top are matched against them as soon as they come within range. I’ve had a number of algos stifled around 2300 because the next batch of Oracles or Aeloos or whatever get to consecutively rip my algo apart while they’re only around 2100, even though they’re all destined to end up near 2300+. And if I manage to beat these top-notch algos, which does happen perhaps 50% of the time, the Elo gained is only a small fraction of what is lost. The opponents, meanwhile, receive a large boost from beating an algo that in reality was probably a pretty close match, despite the Elo difference.

I believe this is probably part of what causes the “spike and then settle back down” behavior observed by RuberCuber: top-notch algos on the rise get paired with top-notch algos already at the top. Naturally, this happens because the algos at the top have already faced their peers and are “starved for matches”, so they get matched as soon as new algos have climbed high enough.
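
For concreteness, here is that asymmetry in standard Elo terms (K = 30 and the ratings are made-up; I don’t know the actual constants used here):

```python
# Why a 2300 algo has little to gain and a lot to lose against a rising 2100.
# Standard Elo math; K = 30 and both ratings are invented for illustration.

K = 30

def expected(r_a, r_b):
    """Expected score of A against B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

top, riser = 2300.0, 2100.0
e = expected(top, riser)  # ~0.76: the model expects the top algo to win

print(f"win : top's rating change {K * (1 - e):+.1f}")  # about +7.2
print(f"loss: top's rating change {K * (0 - e):+.1f}")  # about -22.8

# If the riser is really a peer (a true ~50% win rate), the top algo gains ~7
# half the time and loses ~23 the other half: a steady bleed of ~8 per game.
```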

3 Likes

This is something I was concerned could become problematic, and which I believe will no longer be an issue after the global competition due to larger changes. For now, I think reducing the maximum matchmaking range from 400 to 100, and possibly adjusting the Elo exchanged per game, is a quick config change that should mitigate this issue; I’ll propose this change to the team.
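
To be clear about which knob I mean, something like this (illustrative only, not our actual matchmaking code):

```python
# Illustrative sketch of the config knob -- names and structure are invented,
# not our actual matchmaking implementation.
from dataclasses import dataclass

@dataclass
class Algo:
    name: str
    elo: float

MAX_ELO_RANGE = 100  # the proposed change, down from 400

def eligible_opponents(me, pool, already_played):
    """Opponents inside the Elo window that `me` has not faced yet."""
    return [a for a in pool
            if a.name != me.name
            and a.name not in already_played
            and abs(a.elo - me.elo) <= MAX_ELO_RANGE]

pool = [Algo("algo-a", 2350.0), Algo("algo-b", 2280.0), Algo("algo-c", 1900.0)]
me = Algo("mine", 2300.0)
print([a.name for a in eligible_opponents(me, pool, {"algo-a"})])
# ['algo-b'] -- algo-c is 400 Elo away, and algo-a has already been played
```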

1 Like

I don’t like your idea of reducing the maximum Elo range, for 2 reasons:

The first reason is that we have seen, several times, a top algo sit more than 100 Elo points above the others.

The second reason is that we could end up with a “stratified meta”, where each type of algo is stuck in its own “stratum”: for instance, there could be a stratum of maze algos rated ~1900, a stratum of corner ping cannons at ~2000 (I assume here they beat mazes), and so on. Then any algo losing to mazes would be stuck in another stratum around 1800 (or below), even if it could beat the ping cannons.

2 Likes

@Aeldrexan
It doesn’t seem likely or obvious to me that this stratification would happen. Looking at the database, I can tell you that nearly all matches played with an Elo difference over 150 include one algo with an Elo over 2100 or below 900. These matches are pretty rare, accounting for fewer than 2% of games played. I believe the only time such a match should occur is at the top and bottom of the bracket, when algos are starved for matches after playing all their peers, as @KauffK described. Because of this, I don’t think it will have a large impact on the rest of the system.
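
For the curious, the check was roughly of this shape (a pandas sketch against a hypothetical CSV export; the file and column names are not our real schema):

```python
# Roughly the quick-and-loose check described above, run against a
# hypothetical export with columns elo_a and elo_b -- not our real schema.
import pandas as pd

matches = pd.read_csv("matches.csv")

diff = (matches["elo_a"] - matches["elo_b"]).abs()
wide = matches[diff > 150]
print(f"Elo diff > 150: {len(wide) / len(matches):.1%} of games played")

extreme = ((wide[["elo_a", "elo_b"]].max(axis=1) > 2100) |
           (wide[["elo_a", "elo_b"]].min(axis=1) < 900))
print(f"of those, {extreme.mean():.1%} include an algo above 2100 or below 900")
```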

I do see how “strata” could form as more similar strategies are naturally sorted together, but I don’t think this change would increase the probability of that happening.

The reason we allow for wider Elo ranges is to ensure that an algo matches more often. For top players, I think match quality is more important than frequency.

The fact that players could be ‘isolated’ in the #1 spot is a concern, though. I’ll keep that in mind.

The stats here are based on a few quick-and-loose queries and a brief examination of their results.

2 Likes

@RegularRyan
After reading your answer I now agree that my second point is not too relevant. The only place where your idea could possibly induce a stratification is at the top of the bracket, but that is very uncertain while the problem described by @KauffK is pretty certain.

This is now the case once again, congrats. I don’t understand how, considering the problems I’ve described, but you’ve done it again. The only explanation is that your algo just doesn’t lose any new matches. *ominous music plays*

3 Likes

The problem isn’t quite as bad as that. In the Elo system, it is always advantageous to face algos rated higher than you, and the issue is that there are no algos with an Elo above or around the #1 algo’s, leaving them at a disadvantage. However, most of the top 10 algos are in a similar situation, with no unplayed algos above them, and carry a similar disadvantage, making it about as hard for them to overtake #1.

I don’t think this problem is as large as it initially seemed to be, though it definitely does exist, and it has a higher impact the larger the Elo gap is between the top player and the highest-Elo player they have yet to play. We have some plans to slightly mitigate the problem, and a plan to give the leaderboard time to stabilize before declaring our top 10, which should also help ensure things are fair. More details to come on all this by December 1.