The Ultimate Copy-Cat - ML Implementations

Isaac · November 19, 2018, 5:35am

So I have personally begun the dive into Machine Learning (ML) with Terminal. It is my opinion that this strategy has taken a back seat since these discussions:
Machine Learning
How does the machine learning portion come into play?
Machine learning libraries

It is my goal with this topic to revamp some of the interest and discussion around ML. Personally, I hope to make ML a valid and viable option that can be successful in the competition.

I’ll begin by discussing a little bit about what I’ve done so far and what I plan to do, and hope other people will share their attempts as well.

My Current Setup:

Currently, I am only using ML for building defenses. I believe I can train a separate network to process attack strategies, but I want to work out what are viable network settings first.

I have started by running a very basic Keras neural network. It accepts the map positions in vector format as well as information about each players health, bits, and cores as inputs. It has several layers scaling up to my desired output vector shape. Essentially, I am running a multi-class classification network where I determine whether each map position should be filled, and with what (static) unit. The difficulty of using Keras is that we cannot use it in the Terminal space. So I save all of the weights from the trained model in JSON format. Then, I created my own .pyd or .so binary file from C code to handle matrix and vector operations. I then only need to load the weights at the beginning setup time and then just run forward propagation every turn.

For those interested, if you manually do forward propagation I believe writing your own binary is well worth it since my algo runs with approximately 60ms per turn with a 5 layer network. I did some tests before I started and these are some results for matrix multiplication (keep in mind this is still highly inefficient C code and my computer is also quite slow since I was using a laptop , also that for the network it is matrix-vector multiplication, which is also much faster):

The default time is using raw python with list comprehension, binary time is using my .pyd (or .so) file.

In this current state, I don’t expect this algo (regardless of how I train the network) to do very well because it is only capable of copying or imitating already existing strategies. Its possible advantage is that when trained on lots of different matches/algos it will apply these different strategies based on the situation. This copying effect becomes apparent when training on a single algo.

As a test case (to make sure it was doing what I expected), I trained a network on 99 matches (or 2258 turns) from the algo KeKroepes_3.27 made by @kkroep. If the network works as expected, after training, every match it played its defense would build to match KeKroepes_3.27. Sure enough, after uploading this algo I found it more or less was getting close. Here is an example match where you can see it sort of matches the actual design. Ignore the attack, since that is just a placeholder for now.

Hence the click-bait name of this topic , I believe if you exclusively trained a network on a single algo’s defense and attack, you could very effectively copy that algo without any hard code, which is an interesting thought. I wouldn’t be too worried about this strategy, however, since I doubt many people will be making ML models to do this and it also will doubtless not work as reliably as a hard-coded algo - but this is still interesting.

After my little test (thank you @kkroep ) I then trained a network on ~11,000 matches (or 319,600 turns) for an extremely short period (only like 10 epochs) just to see what it would spit out. As expected (from the short training time), it was never very confident (confident predictions were typically below .3) but I uploaded it just to see what would happen. Most of the time it loses because it almost always blocks everything, something I believe is caused by its low confidence (eg training time) and thus inability to choose 1 strategy. Thus it tries to follow several and ends up just filling basically everything. However, this match gives me hope that it could learn to actually learn and follow different strategies.

Keep in mind that for all of the matches I have posted from my own algos, I have zero hard-coded information on how to build defenses (as I said, attack is hard-coded and a placeholder). Everything is 100% determined by a ML network.

(In all the matches, my algos are the ones that start with dumb-dumb-ML)

My Plans:

I want to continue to see for a little while if the above approach (copy-cat) can be viable (I am currently training a second version of the 11,000 matches where I fully train it). Viable meaning it can actually commit to and win some matches using a recognizable strategy.

However, I believe this type of learning will always be a little bit behind and not as interesting because it does not actually develop anything new. It will always be limited to understanding matches that have already been played.

Thus, I am very interested in implementing a NEAT approach to this design. In other words, train like I would above, but then run a NEAT algorithm to have it continue to learn dynamically. Or maybe NEAT right away, but this has already been discussed in the topics linked above about the issues regarding training time.

What are other people’s thoughts? Has anyone successfully trained and implemented an algo using ML? What types of networks have people tried? What suggestions do people have for things I could try?

Throw everything out there :).

arnby · November 19, 2018, 7:26am

Great Post!

I worked a bit on a ML approach few weeks ago. I stopped to concentrate on the simulator implementation but here is what I did:

I wanted to try a Q-leraning algorithm. But as we said in previous posts, it is very slow to train live with the game engine. So I pre-trained the state estimator on +30k matches. The job of this NN ( I tried CNN too because the board is very much like an image) is to assess possible next states of the board and say if these states are in favor or not of the algo.

Unfortunately, it did not work very well. The decisions it made were obviously wrong. So I never uploaded it ( and also because I used directly keras so it can’t work on the server without a bit of work).

Few comments on your copy-cat:

If you train on only one algo, I don’t believe this approach will be able to make a first rank algo, because it only learn to do what an other algo do. (Well if you learn from the first you could get first but it is not very interesting)
if you train on several algos, your ML algorithm won’t have a consistent strategy during the match, so it will end up doing nonsense. Maybe you could try to include several turns but then the input dimension explodes. (RNN?)
I think this idea can be of real use to exploit the matches of one algo and polish the defense strategy of this algo. You could train it to copy its strategy but also to take into account the result of this strategy in real matches. This would be for me the best way to use a copy cat.
NEAT seems a good idea there because it can end up better than your training material but the issue is time.

If I had loads of time (which is a fantasy), I have other ideas I would like to try to improve my approach (and maybe try yours also ) but I am not sure I will be able to do that before the end of december. (I just finished a very fast and almost perfect simulator so I have to make some use of it )

kkroep · November 19, 2018, 12:15pm

You are welcome for leaching my strategy . In all seriousness though, awesome post, and good job! I will have to say though, that you picked a very unfortunate algorithm to learn from if you only use firewall placement. The design philosophy behind my algorithms focusses on effective attacking, rather than effective defending. The firewall structure has this specific shape to allow for small changes to result in very different attacks. If you look at the replay you provided of my algo, you’ll see that almost every attack is very effective.

If you strip out the attacking power, you are left with a very mediocre algorithm. In that sense I wouldn’t use my algo as an inspiration. There are some algos that have really strong defensive capabilities. Examples include Aeldrexans AGLOO series, that show some serious adaptability, and I believe the backtostart series also has some interesting ideas for this. I would actaully be very interested what a machine learning algo would pick up from ALGOO, as it is much more difficult to copy then an almost completely static structure.

Good luck man! If you are willing to share I’d be very happy to see some updates on this project.

arnby · November 19, 2018, 6:59pm

Indeed your algorithm has a beautiful attack oriented design! And it seems to be quite effective

Actually don’t pay too much attention to the backToStart series, I kinda knew it but I found out 2 days agos for sure that their algorithm is completly broken (like they think they can jump walls and tons of other funny things). For some reasons, they still manage to do quite well
But a fixed gen is coming and a stronger one after that

C1Junaid · November 20, 2018, 10:21pm

Woah thank you so much for this insightful post! We are analyzing the merits of ML for the game so this post is extremely helpful.

The copy strategy I think will ultimately never work if you use multiple algos because its effectively just an “average” of what algos tend to do given certain situtations. Seeing how it plays is really interesting though, the replay you link shows that the maze-like structure is very popular, as even the average summation of all algos you trained on manage to leave a line open for your units to go through.

What is your ML background btw? Have you used it professionally or just played with it as a hobby? Have you tried stuff like OpenAI Gym?

Anyway I’m not a super expert but here are some approaches I would personally try:

Completely ML Approach: Pick an algo to copy as a base, and then use something like Q learning (or NEAT) or even genetic training to improve upon it. The hard part would be to optimize the random “off policy” moves it can do, essentially the way any of these ML approaches work is you try a random move it wouldn’t normally do and if it does well based on your reward function you increase the likelihood of doing it in a similar situation. Because getting a large number of matches is hard doing just anything random to “explore the action space” is probably not effective so I would try to make it so the “off policy” thing the algo tries is at least a little smart. For example, make sure it doesn’t try illegal moves, or doesn’t try obviously bad moves, but this is tricky. Of course this off policy thing could practically be an algo itself so finding a not too complicated way to try “possibly good” new moves would be nice. Deciding between NEAT, Q Learning, Genetic algos etc, is also tricky. When i was more into ML Q learning was the cutting edge, but I always thought NEAT and genetic algos were underrated as well, not sure what the consensus is now adays I haven’t been keeping up.
Multi-Algo Decider: The opposite end of the spectrum is instead of having the ML part of the algo decide specific firewall placements you could have it decide between a selection of hardcoded algos after a few turns. For example, start with a basic defense, then use ML to decide between which of the X preset algo designs should be chosen based on what configuration the enemy has. This would by far be the easiest to use ML, the output would just be between X possibilities, though it would also require a bunch of hardcodedish algos to use. Example preset algo strats could be: left side ping canon, right side ping canon, EMP line, center of board ping rush etc. Figuring out which one of the subset algos to use is pretty tricky for a regular algo to decide, an ML approach could definitely help in deciding. You could also have it wait to “decide” until the enemy algo commits most of its cores as well. Of course this wouldn’t work well against super dynamic algos that completely change their strategy mid game like Aelgoo but could still do really well probably reach top 10. This algo wouldn’t need to be pretrained on copying a previous algo.
- Heuristics approach: A subset of this approach is instead of having the ML part decide which algo to use, it just decides or predicts features of the game. For example, it could decide is the enemy using a maze or not (and you could train it on algos you know are mazes and some you know that aren’t). Is the enemy attacking the left or right side. Is the enemy a ping canon? Is it better to attack right or left side? Basically you use the ML part just to predict features about the game and then a regular algo uses those features to make decisions.
Hybrid Approach : Inbetween the last two approaches is a hybrid approach. Instead of choosing the exact firewall location, or the opposite of just picking between preselected strategies, the ML part of the algo uses a combination of predetermined structures to build itself to avoid clobbering itself. For example, there are a few “standard” defense shapes, like a destructor with some filters around it to soak up damage, or an EMP line etc. These can be also divided into a few preset locations that are optimal to place them in. For example you may have a optimized defense that is good against EMPs, we can call that substructure A, and another that defends well from PINGs substructure B. Instead of having the ML strategy decide “Put a destructor here” it could decide “put substructure A in position 3” substructure A being forified destructor for example position 3 being the middle left in the front. I imagine it would feel something like how Aelgoo feels like to play against. The downside is it wouldn’t come up with new substructures, but you could train a separate network that just determines substructures or something, one that is even a subcomponent of the main big net even (there are a lot of crazy neural net graphs out there in the research). To be even more ML based you could have it choose more specifically what to do and just have some baseline rules to help guide it like never make a move that completely blocks attack paths or something. For attacking it gets a bit tricky because attack and defense are so linked, but there are all sorts of things you could do it make it work well like only do attack structure A if defense structure B is not used or something (though hopefully the ML also learns to do this on its own).
- Pretraining: This algo also wouldn’t need to be pretrained on a copy but there are a lot of weird pretraining techniques that could be useful, for example the one where you remove the last few layers and replace them with a few layers that output something different like the substructure choices above and retrain thereby keeping some of the basic pattern recognition that the bot you are copying from has (but using those heuristics to output higher level substructure suggestions instead of specific locations). For example you could copy aelgoo, then remove the last few layers and have it output substructures instead, whatever aelgoo is using to detect where to defend will be kept in the first few layers logic, but the output would be retrained to use your substructures. Of course, the pretraining that I’m describing is kind of an old technique and your mileage may vary, sometimes it doesn’t really help at all but sometimes its very useful like with autoencoders.

Isaac · November 20, 2018, 11:42pm

Thanks for the comments!

@arnby regarding following a consistent strategy, I think an excellent solution (on paper) is your idea of an RNN. Certainly, this makes sense given how each state of the game board is dependent on the turns before it. This would help a lot with making it stick to one strategy and also recognizing that strategy as independent from others. I thought I would try and tackle NEAT next since I’m just interested in the algorithm itself and haven’t played with it, but with some preliminary work already done towards NEAT, I can say I think it’s possible but may be stripped to a point where it is not functional and is going to take quite a bit of work. I’ll talk more about NEAT below. But given how NEAT provides some quite difficult challenges and an RNN will be (fairly) simple for me to convert my current setup I’ll definitely give that a try.

@kkroep, I definitely realize in hindsight what you are saying about your algo being a poor candidate since it focuses on the attack. However, I simply wanted to see if a recreation would work and I randomly picked a top algo to mine data, but the idea of going after @Aeldrexan’s AGLOO is an excellent one and it would be very interesting to see the results, I’ll definitely give it a test run in a little while out of curiosity. (Assuming @Aeldrexan doesn’t have any complaints - speak now or forever hold your peace ).

@C1Junaid, wow, some awesome ideas presented in your reply. My goal is to make whatever ML implementation as pure as possible. What I mean by this is that I agree, a Multi-Algo Decider is probably a much easier implementation, but I hope to be able to train some sort of network to think without any static elements (I am not including things like pre-training and substructures in this). My general process (I believe) will be to start by giving as little guidance as possible, and slowly work backward in terms of the amount of static instruction regarding the strategy.

I noticed two common themes regarding your list, and I feel ML can be used to benefit an algo in two main ways. It can make either make the decisions, or provide information (like what strategy to use) that help an already built algo make decisions. I think the latter here is a “safer” and probably more reliable decision, which is naturally why I want to try the former first :).

That being said, I will be trying many different approaches and will draw from both sides of this equation. I have never fully done machine learning before so I am looking to learn lots of different implementations. I have followed ML for quite a while and did things like build my own (very basic) neural network that did forward and backward propagation from scratch in C++. I have looked at OpenAI gym before, but haven’t done anything yet. Every time I started working on something other things have taken precedence and time has eluded me, so Terminal provides an excellent opportunity for me to learn about this :).

Some thoughts about NEAT and genetic algorithms:

I began some preliminary work setting up a NEAT algo and it became very clear that the size of each genome using the raw board as inputs and outputs is simply too large (this is exaggerated by the inefficiency of python - which could be offset in C but I’m not ready to spend that much time working on that yet). There are also some issues regarding training in parallel since I obviously don’t want to create a new folder for every single genome. Instead, I have a single folder that references a pickled Genome object that it will use for its logic. Then it gets which pickled object it should use from the (slightly modified) run_arena.py script. This leads to race conditions when creating the algos, but this is pretty easily solved (thought I’d mention it anyway though).

The big issue is simply the number of inputs and outputs. If you have each board position and health, cores, and bits for each player as inputs, there are 427 (one for bias) inputs. Then, you can expect a similar number of outputs, or possibly more depending on your implementation. This is nothing, small in fact, for a standard NN model, but using an inefficient language (like I said, not doing this in C) where you need to create a new population every epoch with new mutations, this is extremely inefficient and slow.

The obvious solution (which is very common) is to simply reduce the number. This is fairly easy for the inputs since you can simply package the data differently (an analogy would be making an image more pixelated) without really changing the information the network receives. Where I am struggling to find a solution is reducing the number of outputs. This is, of course, assuming an implementation that makes a base. It is much more difficult to take a pixelated image and make it high resolution. Similarly, it is challenging to make the number of outputs give more data without actually creating more nodes. The current idea I am toying with is some sort of hashing function, but if anyone has any thoughts I’d love to hear them. The other issue with the number of inputs/outputs, of course, is just training time - it will take much much longer to learn with more initial nodes.

All of the above problems (which really all come down to training time) make me hesitant to try a NEAT implementation without at least trying the RNN idea first (and the little test with ALGOO).

Little update:

Regarding the training, I mentioned in my first comment (the 11,000 matches): it experiences the same issue observed by everyone thus far regarding committing to a strategy. It is now very confident in its decisions it makes (>.98), but it still chooses everything. This more than anything makes me want to try an RNN, but if anyone has other suggestions or reasons I shouldn’t just let me know.

Thanks!

RedRyan2.0 · November 25, 2018, 11:36pm

I’m currently using pytorch and haven’t had a bot do anything of merit yet, but I’ve somewhat dealt with the issue of the large input (not entirely though). Neural networks work best with one-hot encoded input data. This increases the input data if you want to use machine learning, but there are a few strategies that i’ve used.

Architecture is VERY important with ML algos.
Before I was thinking about architecture weights and just purely thinking about what would be nice if trained properly, I ended with a network that had a little over a million weights. (YIKES!) , so think carefully about the structure of your NN.
Use a CNN to reduce number of weights in a meaningful-ish way.
Using a CNN of height and length 2 with a stride of 2 should reduce input space by 75 percent. The amount of weights used in a CNN is worth it to reduce the input space.
Use NN’s to reduce input space. Implementing an autoencoder will help reduce input space if trained correctly. But with NN’s their are a lot of weights used to reduce the input space. So this is a secondary option that may have to be used either way.
Use the least-bulky RNN possible. In this scenario, I’d recommend either vanilla RNN’s or a GRU. The bulkyness of LSTM’s just aren’t worth it. RNN’s tend to use a lot of weights. If there are other RNN types that have about or less weights needed per input than a GRU cell, then use it.

This is everything I can think of right now. Good luck to the rest of the ML algo coders!

Isaac · November 28, 2018, 9:14am

I haven’t had a lot of time lately to continue work on this, but I have done a little bit:

I have begun some preliminary work (honestly, mostly done - it went way faster than I expected) regarding classification of algos.

What I mean by this is taking a current snapshot of the board (looking at one side, or algo) and attempting to identify what sort of strategy it is using. The obvious hiccup is that for a ML approach you’d typically have labels to tell it what is right/wrong. But, if you had this you wouldn’t need the ML… So I chose to use unsupervised learning to identify strategies. Essentially, I made it a k-means clustering problem.

Although I have just started this, it is pretty clear that it works pretty well. Lets just say that I have found a lot of maze algos…

My thought is this information can be used in a couple of different ways. First, I think it would be interesting to train different neural networks to counter only one specific strategy, and then choose the NN based on the cluster classification. My belief is this would help with choosing and sticking to a strategy problem I have been facing thus far, since the NN wouldn’t have to make that decision. A simpler use case would be to identify the strategies by watching games (which I plan on doing regardless) and hard code a counter-strategy.

The main challenge with clustering is determining its accuracy based on the k-value (number of clusters). I am not sure what an optimal k-value would be - and it is an extremely important number. A basic approach would be to simply try a bunch, which is fine because training is actually quite quick, but the problem is then determining a k-values accuracy. If anyone has any thoughts on determining an optimal k-value or how to judge a clusters accuracy I would love to hear them. For the moment though, I’ll just upload an algo that builds a message (abbreviated) similar to @RegularRyan’s fee-fi-fo-fum idea to tell me what it thinks the opponents strategy is, so if you face a messaging algo, thats me :).

Thanks for all the comments thus far, they’ve all been awesome!

Ryan_Draves · February 1, 2019, 3:05am

Alright, I got my first ML algo (and ML project in general) up and running locally, which is pretty exciting. I trained it on ~2000 replays, although it’s clear that it’s confined to copying Demux-like algos due to the hard coded round 0/1 I provide it.

The algo is designed to just provide to me the defensive placements that it thinks would be best. I work the rest of the magic to place the best attack, so it’s certainly not operating “all on its own.”

Here’s the first game, which I think is pretty successful and shows it able to express ideas related to how to add on to Demux: second_game.replay (5.8 MB)

Here’s another, in which I don’t give it the first two rounds to give it structure. It’s a hot mess and seems to want to copy some dynamic algos briefly before it spirals out of control: third_game_organic_placements.replay (2.1 MB)

Overall I think the algo works pretty well. It’s certain a bit “unfair” to pure ML algos to hand mine optimized attacks and a well defined structure, but maybe later on I’ll get it to operate a bit more autonomously. I’ll see what I can do about getting this running live, but I suspect the dependency workaround will not install in time to be functional. I might need to wait it out for Python dependencies to be supported, unless I figure out how to make my own forward propagation. Update: “Unable to find ‘keras’” when I use the workaround…

From here I might just commit it to learning from Demux-algos. Ideally I’d like to set up some kind of self-reinforcement environment, but with the struggle of setting this up that might take a while…

Ryan_Draves · February 2, 2019, 9:52pm

Excitingly, I got “ML_Bootstrap” up and running on the live servers using some workarounds. It’s no champ, but it’s competing alright. The algo ID is 56279 for those that want to see it derp around and place weird things.

Here’s an alternative method to getting a Keras model running on the live servers without needing Keras, Tensorflow, or Numpy as dependencies:

Using this repo, get your Keras model running locally on C++
Using your choice of Python-C++ interfacing, (I used this method), get your C++ model back into your Python algo
Spend many hours debugging

A couple notes on debugging this: std::cout will mess up the game engine, but you can replace those with std::cerr. Also, the JSON and H5 to plain text conversion is outdated. Line 30 of dump_to_simple_cpp.py should read arch["config"]["layers"]. Everything else should be regular debugging. There may have been some other fixes I had to make, so if anyone’s trying to attempt this and is having immense difficulty, I can fork the repo and upload a fixed version.

This upside to this method over the other discussed ones is that you can get your Keras model up and running without figuring out how to load a model and its weights while also avoiding implementing a forward propagation method. The downside, of course, is you’re debugging a slightly broken library, and that’s no fun.

kkroep · February 3, 2019, 9:53am

Thanks for your open post. I was not really developing for the past month so I am checking back now to find that a lot of interesting stuff is happening. There seem to be quite a few algos around that use a base structure that is inspired by Demux, but I can tell that under the hood a much more interesting algos are hiding than an attempt at a replica. This is great, because now I can also learn a lot from you guys .

So now the interesting question is, is the ML algorithm able to come up with decisions that are surprisingly intelligent, or is that still hard to achieve? I would be extremely impressed if this ML type approach would be able to formulate attack strategies that beat (my) hand crafted ones. I would expect that constructing multi-turn strategies is extremely difficult to achieve.

Ryan_Draves · February 3, 2019, 4:55pm

The model is designed to give me a set of “optimal” defensive places, given an inputted game state. What I do with those placements and what kind of game state I pass into it (relative to that turn’s game state) are the result of my handcrafted framework, so there’s not a lot of room for “expressing” ideas both ways (the handcrafted logic is muddled by the less-than-perfect defensive placements, and the defensive placements are muddled by an imperfect model of the game state being passed into it).

That being said, you can see the algo expressing some ideas. A couple from the replay I provided:

Stuff destructors in the corner to stop the corner cannon
Replace firewalls that have been destroyed from the main structure
Attempt to express an attack strategy by building out the middle rows of filters

The third bullet point is obviously the weakest, although that’s likely the fault of my approach. I don’t feed it how many cores it has to spend, nor does it give me the entire structure it wants to place (just that turn’s placements). That’s more of an issue with my model than something ML can/can’t do (I think a better model could express more complete attack strategies, or perhaps better logic that buffers placements when there aren’t enough cores for all of them).

A note on the second one: This one was the most surprising idea it expressed. The removal logic is not part of the neural net, that’s being done by hand. Deciding to replace things that have been removed/destroyed is quite interesting to see, particularly since it’s giving me that turn’s placements, not a complete structure. I suppose that replacing firewalls in the core part of the structure is just standard behavior it was able to mimic.

C1Junaid · February 4, 2019, 5:28pm

Firstly super awesome to see ML algos, thanks for experimenting with them!

So, what exactly did you train your algo on? Any replay or ones with demux like strats only? How did you judge what the desired output should be (assuming this is supervised learning). Was it something like take a gamestate, look at what firewalls were placed, and if the turn was “good” (enemy deployed units but they were properly defended against, player health didn’t go down) then you train on that turn?

Ryan_Draves · February 4, 2019, 6:18pm

Thanks, I had a lot of fun getting it working. It was definitely a huge learning curve (not that I made it very far into the curve).

So, what exactly did you train your algo on?

This was a detail I initially left out on purpose, but I’ll handle the followups with care.

Any replay or ones with demux like strats only?

I downloaded 100 replays from the top 20 algos as of 1-26. The Demux-specific behavior that can be seen is because the algo has a hardcoded turn 0/1, but it “technically” could mimic any other basic strategy it trained in if I supplied a different initial turn. If I make another version with mimicking behavior, I’ll probably just download Demux-esq replays.

How did you judge what the desired output should be (assuming this is supervised learning). Was it something like take a gamestate, look at what firewalls were placed, and if the turn was “good” (enemy deployed units but they were properly defended against, player health didn’t go down) then you train on that turn?

I’m not sure how to write my own loss function, so instead I calculated this beforehand. I iterated over each turn of each replay to evaluate whether or not a turn was “good,” then added this to the dataset if it was “good.” The neural net was then simply trained to obtain outputs that mirror those “good” placements for a given game state, which makes it more of a mimicking algo than something that has the capacity to “invent” new ways to play (something I want to explore later, but would take a lot more work).

The metric for “goodness” is closely tied to my regular algo, so I’ll leave that out.

Edit: Important note on iterating over the replays: Technically one could compare between two turns generated from the replay file to define “goodness” without the need for a simulator at all, much less a game engine.

arnby · November 29, 2019, 9:08am

I recently did some work on copy-cat ML algos.

Model

I trained a model to place both attacks and defense according to the 35 last versions of the eagle algo.
It accounts for about 800k placements to learn.

I choose to take several versions of an algo, in hope that it will add some noise to the learning material and refrain the model from overfitting (all my ML tries on a single algo overfitted too fast). But I am aware that it will be harder for it to learn a consistent strategy. The assumption is that the versions do not differ that much.

I uploaded 4 versions of this model, at different learning stage (the number at the end is supposed to be the accuracy test on 1000):

Results

It did learn few things:

the general structure of an eagle algo
the idea of opening and closing paths of attacks to defend and create opportunities
several attacks patterns from eagle:
* turn 10 it closes every path to make a corner attack
* turn 7 it saw a pretty good opportunity for an EMP attack

what can be improved:

I currently give it only the board and the players stats. It has absolutely no clue about the attacks the opponent did or where it scored / got scored.
Use a CNN instead of a flat NN
Overall accuracy: it still does some silly moves , like attacking while closing every paths, or not spending cores after receiving a huge blow. (the algos over around a rating of 1400 )
Training material: I am afraid that the test and validation sets are leaching because of repeating patterns among the opponents.

Anyway, thanks to @Felix for this opportunity!

MrTwiggy · November 29, 2019, 6:20pm

Very cool stuff, arnby! Interesting to see your progress with it. Out of curiosity, did you ever perform an experiment to see how the algorithms performance scales as you increase the amount of data? (And side note, are you re-training the final model on train+valid+test sets?)

For example, take 100k turn frames, train a model on this, take 200k turns and train a model on this, etc. Then, compare the win rates amongst the different models (relative to each other or some baseline like the eagle model), so that you can try to estimate how the performance is scaling in response to the data available, and might tell you whether getting more data is likely to significant boost the algo’s capabilities.

Either way, awesome stuff and thanks for sharing!

Felix · November 29, 2019, 8:15pm

Well done @arnby!

I’m really impressed by your ml-clones from my algos .

Some other really cool matches I noticed from your ML-algos:

And I’m mostly impressed, that your algos wait sometimes many turns before they do their next attack because I’ve seent that your first ML-algos launched attacks every turn.

In the second match, in rounds 6 to 8 your algo entered a situation, where your algo would not be able to score anymore, if the opponent didn’t attacked themselve.
This problem is fixed in the original EAGLE by also using information from previous turns and action-phases.
I think, that this information could also benefit your algos.

One of the hardest tasks for ML when copying my algo is probably to learn to open a path and then attack there. Right now your algos seem to replicate my actions without really understanding the that the attack does nothing, when there is no opening. That’s why I think that it can perform well against strategies it learned from but will be bad against new ones it had never seen before.

I look foreward to seeing more improvements of your ML-algos.

arnby · November 30, 2019, 7:16am

Aside from learning from one versus from 35 algos, I did not. I initially trained on 40 eagle algos, and then dumped the first five because the model learned less successfully on them

No, I don’t, would you? How do you decide then when to stop the learning process? (aka overfitting).

Thanks @Felix, These two are pretty cool!

And what a surprise: I just discovered that ML_Eagle_765 has a rating of 2075
I admit that after watching the first few matches, I thought they would never climb the ladder.
I think ML_Eagle_765 got lucky and finally managed to be match up against opponent it had learned from, hence the climb.

And it is also quite unfamiliar with getting destroyed: I noticed that it didn’t how to react when a large chunk of the base was gone, even so it had the cores to rebuild it.

I will try some variations and if there is some interesting stuff, I will post updates here.

IWannaWin · November 30, 2019, 10:38am

RIP, it would have beaten me if not for one big mistake:
https://terminal.c1games.com/watch/5307823

The removals of that algo really confuse my algo…
Either way, nice work @arnby

arnby · December 1, 2019, 6:49am

@Felix, are talking about the RL algos? I thought they would go unnoticed

There is still lot of rooms for improvement yes

btw is you algo dynamic? By the weirdness of the structure it seems so, or at least on a certain level?