Match of the Week - Revisited

For those who have been here for a while, you might remember 8's Match of the Week posts (8 is a user who has since left us), where he would find and analyze what he considered the top matches of that week: a large elo difference (I still call the score that out of habit, even though the system has changed), a very close game, or just an interesting strategy.

I really enjoyed these little summaries and miss being able to see interesting games. So, in an effort to (sort of) recreate this, I have included a link on the terminal tools page that will take you to a highlighted match that I will update ~daily.

I wish I had the time to do this properly and write up nice summaries every single week, but I just can't watch and filter through so many matches every day. Instead, to keep this feasible (and to make it more of a programming project for myself :slight_smile:), I have created a script that looks at the most recent matches played on the server and assigns each replay a score. The match with the highest score is then chosen as the highlight match.

I have open sourced this project, which can be found here. However, let me be clear about the nature of the open sourced project: you will not be able to just clone and run this script. I have intentionally left out (see .gitignore) a few scripts (3 for automatically updating the webpage) and one script, p_lib.py, which is necessary for getting the raw data of a match from the server and which I think should only be accessible to those who have looked for and found it. The primary purpose of open sourcing this is to get feedback and thoughts/suggestions about the scoring method, which lives in the Replay class's load_data function.

Currently (this will be changed shortly), a replay is scored by the maximum number of units created and the difference in health at the end of the game. This will almost always pick games that go to 100 rounds, and I never intended this to be the final method; I just wanted to get the system up and working. I have some thoughts on programmatically identifying matches that would be interesting to watch, but I'm curious to see what thoughts you might have.

Let me know what you guys think.

6 Likes

A quick way to score matches (having not looked at what numbers are available in each replay) would be to add to the score for being in an optimal length range (25-40 turns), for being a high-elo game (perhaps the sum of the two players' elo), and maybe some additional score for an upset (helping to balance the elo part of the scoring with interesting matches that feature upsets).
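A minimal sketch of that scoring, assuming the replay gives us a turn count plus the two players' ratings (the field names and constants here are made up, not taken from the actual script):

```python
def quick_score(turns, winner_elo, loser_elo):
    """Toy heuristic: reward a watchable game length, strong players
    overall, and upsets. All constants are arbitrary placeholders."""
    score = 0.0

    # Bonus for landing in the suggested 25-40 turn range.
    if 25 <= turns <= 40:
        score += 100

    # Reward the overall strength of the pairing (sum of elos).
    score += (winner_elo + loser_elo) / 100.0

    # Upset bonus: positive only when the lower-rated player won.
    if loser_elo > winner_elo:
        score += (loser_elo - winner_elo) / 10.0

    return score
```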

I find trying to describe what should be seen as an “interesting” match really… interesting!

Here are my first ideas:

  • Higher rated algos are better to highlight, but I would go for the maximum of the two elos rather than their sum.
  • Upsets (a lowly rated algo beating a highly rated algo) are often interesting, so I would make (loserElo - winnerElo) an important factor in the score.

Harder to implement: a game that goes back and forth, with several reversals and turnarounds, is “interesting” in my opinion. Here is my proposal for detecting this:
Step 1: Using a lot of replays, train a neural network to predict the winner of a game based on a game-state.
Now for each replay:
Step 2: Plot the prediction and confidence of that neural network over the turns.
Step 3: Take the maximum confidence of each consecutive run of turns where the NN predicts the same outcome.
Step 4: Sum those confidence maxima (or a well chosen function of them).

If the result of that computation is high, it should mean that, according to the neural network, each algo looked clearly winning, then clearly losing, several times across the game.
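A rough sketch of steps 2-4, assuming we already have a per-turn list of (predicted winner, confidence) pairs from whichever model is used; the model itself is left out:

```python
def interestingness(predictions):
    """predictions: list of (predicted_winner, confidence) per turn,
    where predicted_winner is 1 or 2 and confidence is in [0, 1].

    Split the game into consecutive runs where the model predicts the
    same winner, take the peak confidence of each run, and sum them.
    Several confident-but-reversed runs => a high score."""
    if not predictions:
        return 0.0

    total = 0.0
    current_winner, run_peak = predictions[0]
    for winner, confidence in predictions[1:]:
        if winner == current_winner:
            run_peak = max(run_peak, confidence)
        else:
            total += run_peak          # close out the previous run
            current_winner, run_peak = winner, confidence
    return total + run_peak            # last run
```

Variations (like summing squared distances of the prediction extremums, as tried further down the thread) just swap in a different function of each run's extremum.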

2 Likes

Alternatively, instead of using a neural network to predict winners, just use cores on board and lives remaining. Maybe not as accurate, but certainly easier and faster.

Also, if this is all being automated anyway, you could have a list of the best matches instead of just one.

4 Likes

Great ideas! I personally like the definition that an interesting game is one that goes back and forth with several reversals.

But I don't want to really work on a ML approach for this (for now). I like the idea of using cores on board and health remaining instead.

However, I think I can be a bit more accurate than just that, since I can look at units per turn, whether they are the same, whether a lot of destruction happened, who is ahead in health, how many times this flips, etc. So I'll give that a go.

I also really like the suggestion of using a list of matches since yes, it is automated.

Personally, I wanted to stay away from filtering by elo for two reasons. First, there are plenty of interesting games at lower ratings, and part of the point is to show everyone's achievements, not just the top few (which most people have already seen plenty of anyway). Second, I would rather evaluate all the algos equally and rely on a scoring method under which weaker algos simply won't be chosen unless they actually do something interesting.

Lastly, I forgot to mention in my previous post that the primary reason I am doing this automatically is that there isn't time to look through all of the matches properly. However, the page itself is still updated manually (even though I have automated the selection process).

If you personally see a really cool/interesting game just message me and I’ll add it.

1 Like

Just throwing some ideas in quickly, since I have other things to do.

The two main advantages in the game are health advantage and core advantage. What we want to find is games where the advantaged player changed often, or where there was a large swing. My first thought for a relatively simple implementation would be…

  • Take some stat, like cores, and make 2 lists, one for each player.
  • Subtract them to make one list, where positive numbers indicate P1 has the advantage and negative numbers indicate P2 has the advantage.
  • Create some system to score this list. Most simply, you could count how many times the advantage changed sides. This doesn't capture how big the swings were, so I would also measure the difference between peaks and valleys.
  • Normalize this scoring so that health and core advantages are both important.
  • Give more points to the replay if the player with the core advantage lost.

Ex: [0, 3, 3, -4, -2, 2, 5, 7, -10] with P1 winning is a much more interesting game than [0, 3, 5, 6, 10, 15, 7, 13] with P1 winning.
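A sketch of the above, assuming `p1_stat` and `p2_stat` are per-turn lists (e.g. cores on board) pulled out of a replay; names and constants are placeholders:

```python
def swing_score(p1_stat, p2_stat, p1_won):
    """Score how much the advantage (P1 minus P2) swung over the game."""
    advantage = [a - b for a, b in zip(p1_stat, p2_stat)]

    # How many times the leading player changed (sign changes).
    flips = sum(1 for prev, cur in zip(advantage, advantage[1:])
                if prev * cur < 0)

    # Total movement of the advantage line (a rough peak-to-valley measure),
    # normalized by the largest lead seen so health and cores are comparable.
    movement = sum(abs(cur - prev) for prev, cur in zip(advantage, advantage[1:]))
    biggest_lead = max(abs(x) for x in advantage) or 1
    swing = movement / biggest_lead

    score = 10 * flips + swing

    # Bonus if the player who ended with the advantage still lost.
    if (advantage[-1] > 0) != p1_won:
        score += 50

    return score
```

You would run this once on the core lists and once on the health lists and add the two results, which covers the normalization bullet above.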

One issue with any naive system is that it will disproportionately choose certain types of games: this system would choose the 'swingiest' game, and your current one chooses the longest game. One solution would be to have a couple of different rating systems and pick a random one when scoring each replay, or something like that which ensures different 'types' of games get chosen.
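The 'several rating systems' part could be as simple as the sketch below, where each scorer is any function from a replay to a number:

```python
import random

def pick_highlight(replays, scorers):
    """Pick one rating system at random for the day, then take the replay
    it rates highest. Over time this surfaces different 'types' of games."""
    scorer = random.choice(scorers)
    return max(replays, key=scorer)
```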

Spent longer on this than I meant to… anyway, good luck, seems like a really interesting challenge!

I have tried to train a perceptron (i.e. a single neuron) taking, for each player, the following values as inputs:

  • health
  • saved cores
  • saved bits
  • number of filters
  • number of encryptors
  • number of destructors

I trained this perceptron on 800+ replays of games between top algos (originally I had downloaded them to build a better prediction phase). Training it only on top algos was a mistake, as I will explain later.
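A minimal sketch of that kind of single-neuron model, assuming the input for a turn is the per-stat difference between the two players (which would match the six weights listed further down); this is not the exact code used:

```python
import numpy as np

STATS = ["health", "cores_saved", "bits_saved",
         "filters", "encryptors", "destructors"]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_perceptron(X, y, lr=0.01, epochs=200):
    """X: (n_samples, 6) array of per-stat differences (player 1 - player 2)
    for individual turns; y: 1 if player 1 went on to win, else 0.
    Plain logistic-unit training by gradient descent, no hidden layer."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                # predicted P1 win probability
        w -= lr * (X.T @ (p - y)) / len(y)    # log-loss gradient step
        b -= lr * np.mean(p - y)
    return w, b
```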

I then used this perceptron in an algorithm similar to the one I described in my previous post (I summed the squared distances of the prediction extremums) to rate the "interestingness" of replays #2209300 to #2209400.

Here is the top 5:
#1: 2209381
#2: 2209355
#3: 2209399
#4: 2209321
#5: 2209380

While the 3rd one could indeed match my definition of "interesting" (the early game going back and forth, then player 2 having a huge core advantage around turn 20 but finally losing), the others mainly involve very low level algos and don't seem that interesting.

To understand that, let’s look at the weights I obtained after the training:

  • health: 0.212429
  • cores saved: 0.124875
  • bits saved: 0.450969
  • filters: 0.12725
  • encryptors: 0.794231
  • destructors: 0.539732

The weights for the numbers of encryptors and destructors are high, but when you divide them by their cost they come out at only about 1.5 times as important as filters.
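For concreteness, the per-core comparison looks like this (the costs below are my assumption of the firewall prices at the time; substitute the real values from the game config):

```python
# Cost values are an assumption, not taken from the game config.
COSTS = {"filters": 1, "encryptors": 4, "destructors": 3}
WEIGHTS = {"filters": 0.12725, "encryptors": 0.794231, "destructors": 0.539732}

per_core = {unit: WEIGHTS[unit] / COSTS[unit] for unit in COSTS}
# filters ~0.127, encryptors ~0.199, destructors ~0.180 per core:
# roughly 1.4-1.6x the filter weight, i.e. the ~1.5x mentioned above.
```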

The most striking weight, in my opinion, is the one for saved bits.
And I don't think it is a mistake by my perceptron when evaluating win probability in high level games:
at this level, a large number of saved bits means that there will soon be a devastating attack that could literally end the game, or at least destroy a lot of firewalls.

But at low level it is not the case: starter-algo saves up bits to launch very ineffective attacks, and some even worse algos simply fail to attack at all.
Even cores on board are not very relevant at low level; only health really is.

So the predictions resulting from my perceptron are useless when evaluating the outcome probability of low level matches.

And so I think that whatever we use as an advantage measurement should be made relative to the level range of the match.

1 Like

Thanks for the ideas :).

@RegularRyan I especially like the idea of implementing multiple rating systems and choosing between them; or, as @Thorn0906 pointed out, since this is all automatic, I could upload matches for each system independently, etc.

@Aeldrexan, cool stuff! I wanted to clarify: when you say the advantage measurement should be made relative to the level range of the match, do you mean the elo of each algo, or normalizing the score's influence based on the turn number?

I don't think I'll need to take the elo of any algos into account, simply because a "good" rating system shouldn't allow a poor algo to be shown regardless. I think @RegularRyan's implementation here is good, since not very good algos won't have many swings during their matches, etc. I do like the idea of normalizing based on turn number, since otherwise there could again be a bias towards matches that go on a long time. Otherwise it could end up exclusively showing algos that just save up a bunch and then win in one rush; that is certainly a strategy worth showing, but not exclusively.
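As a minimal example of that normalization, whatever raw swing score comes out could simply be divided by the game length (or some function of it):

```python
def length_normalized(raw_score, num_turns):
    """Average the score per turn so long games don't win by default;
    a square root or a cap on num_turns would also work."""
    return raw_score / max(num_turns, 1)
```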

I guess that is the real challenge of this: it is easy to find a scoring system, but hard to make a properly generalized one, which is again why I like @RegularRyan's idea of using multiple.

I’ve gotten quite busy recently but I should have some time after Wednesday to work on it and hopefully get a better system out :).

1 Like

What I meant was to use a different model to describe advantages, depending on the average elo of the algos.

As for not very good algos not having many swings in their matches: this is probably not true.

And making a rating system under which poor algos simply won't be shown will be really hard to do (if you don't want to directly punish low elo in your rating system).
For instance let’s analyze this match of starter-algo vs starter-algo:

  • According to my perceptron, it did go back and forth a lot.
  • If you look at the core advantage, it did go back and forth.
  • But if you look only at the players' health, this game was just red winning from beginning to end.

But using a model based mostly on player health when rating matches at higher levels, you would end up highlighting matches that were also completely straightforward, just core-oriented instead.

So the idea I was trying to convey in my previous post was to make a different model for each elo range (maybe by training a perceptron for each elo range).

But now I think it would be better to just use all the advantage models at the same time, and consider the game interesting only if it is interesting from the point of view of every model (or many of them).
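One simple way to do that combination, assuming each model's score has already been normalized onto a comparable scale:

```python
def combined_interest(model_scores):
    """model_scores: the interestingness of one replay under several
    advantage models (health-based, core-based, perceptron-based, ...).
    Taking the minimum rates a game highly only if every model finds it
    interesting; a low percentile or geometric mean relaxes that to
    'most models'."""
    return min(model_scores)
```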

Regarding high-level matches also being straightforward: I would hypothesize that the opposite is true. High level algos are very consistent and won't give an advantage away easily. Once a swing happens even once in such a game, it is probably really interesting; quite possibly a different algorithm found a small gap in the strategy that it could exploit.

This type of behavior is not at all what I observe in lower elo ranges.