That’s fair, it might be an interesting experiment to try: compare the winrate of a bot trained on 100% of the match data to the winrate of a bot trained on 50% of it. If the former is significantly better, that could be a good indication there is still a lot to squeeze out just by collecting more data.
Additionally, just a thought, but something I’ve used in the past on another game (generals.io) that improved on my initial behavioral cloning (copy-cat) approach was to move beyond mimicking a single particular algorithm (which was my first thought too). Instead, I took a huge amount of match data from the top X players and modified the loss function by scaling it with the match outcome (-1 for a loss, 1 for a win).
So for example, let’s say I have match data for Player A vs Player B, consisting of 250 turns worth of data (one turn for each player, with a frame of history and the actions they took). Suppose Player A wins the match. In this case, I take all of his turn data and train the network with supervised learning to predict the action he would take, multiplying the loss by 1 (because he won). This is very similar to what you are doing now, except that instead of only predicting one particular player’s turns across the whole dataset, you are always predicting the actions taken by the winner of each match. Finally, I also take the turn data for Player B (the loser) and perform the exact same training, except I scale the loss gradient by -1, which essentially trains the network to AVOID mimicking his moves.
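To make that concrete, here's a minimal sketch of the outcome-scaled loss for one turn, assuming a discrete action space where your network outputs per-action logits (the function name and shapes are just illustrative, not any particular framework's API):

```python
import math

def outcome_scaled_loss(logits, taken_action, outcome):
    """Cross-entropy on the action the player actually took, scaled by outcome.

    outcome = +1 for the winner's turns (mimic), -1 for the loser's (avoid).
    """
    m = max(logits)                                  # stable softmax
    exps = [math.exp(z - m) for z in logits]
    prob = exps[taken_action] / sum(exps)
    return outcome * -math.log(prob)

logits = [2.0, 0.5, -1.0]
win_loss = outcome_scaled_loss(logits, 0, +1)   # winner: ordinary imitation loss
lose_loss = outcome_scaled_loss(logits, 0, -1)  # loser: negated loss
```

Minimizing the negated term for the loser is the same as maximizing his cross-entropy, i.e. driving the probability of his chosen action down.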
The general idea is that you want your network to mimic the best players when they are winning matches, and avoid behaving like the players that lost. However, Terminals might have an advantage here because of the HP system and how it definitively leads to a win (I think). In that case, it might be worth training a network to predict how much HP damage it will take AND how much it will deal by playing a certain move, then just picking the action with the highest expected dealt/received ratio. Or, just use the paradigm above, but instead of scaling the loss by match outcome (-1 or 1), scale it by (damage dealt - damage received) so that it learns to mimic the moves with a high dealt/received ratio and avoid mimicking moves with a bad one. When I switched to this paradigm while competing in that other game, I saw a HUGE boost in performance because of the increase in dataset size I got from using many high-level players’ data instead of just one player’s.
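The damage-differential variant might look like this sketch. The normalizing `scale` and the per-turn damage bookkeeping are my assumptions; you'd plug in however Terminals reports HP changes each turn:

```python
import math

def damage_scaled_loss(logits, taken_action, dealt, received, scale=10.0):
    """Imitation loss weighted by the turn's damage differential.

    Moves where dealt >> received get a positive weight (mimic them);
    moves where received >> dealt get a negative weight (avoid them).
    The clamp to [-1, 1] keeps one lopsided turn from dominating the gradient.
    """
    weight = max(-1.0, min(1.0, (dealt - received) / scale))
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    prob = exps[taken_action] / sum(exps)
    return weight * -math.log(prob)

logits = [1.5, 0.0]
good = damage_scaled_loss(logits, 0, dealt=8.0, received=2.0)  # weight +0.6
bad = damage_scaled_loss(logits, 0, dealt=1.0, received=9.0)   # weight -0.8
```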
Typically, you determine the epoch count (how many epochs to train before stopping) via validation. So, for example, you start off with train/valid/test splits. You run your training process, and suppose that after N epochs your validation error starts increasing (and doesn’t stop). Given this N, you combine your train + valid sets into one bigger set, re-train your model using the best hyperparameters found during the validation process (including N), stop after N epochs, and evaluate the model on the test set to get your test error.
Finally, after having validated and tested your model, you take the ENTIRE dataset (train + valid + test) and re-run the whole training process on it with the same hyperparameters (and N) as before. The model you get is your final model, and it is likely to perform as well as (or better than) its test error indicates, since it has more data to train on than it did previously. It’s a very common error to train only on the train set and use that model as your final one, missing out on a large chunk of your dataset.
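The "find N, then retrain" procedure can be sketched like this; `train_one_epoch` and `validation_error` are placeholders for your actual training loop, and the patience-based stopping rule is one common way to decide that validation error "doesn't stop" rising:

```python
def find_stopping_epoch(train_one_epoch, validation_error, max_epochs, patience=3):
    """Train on the train split, watching the valid split to pick N."""
    best_err, best_epoch, worse_streak = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        err = validation_error()
        if err < best_err:
            best_err, best_epoch, worse_streak = err, epoch, 0
        else:
            worse_streak += 1
            if worse_streak >= patience:   # validation error keeps rising: stop
                break
    return best_epoch                      # this is your N

# Toy run with a simulated validation curve that bottoms out at epoch 4:
curve = iter([5.0, 4.0, 3.0, 2.0, 3.0, 4.0, 5.0])
n = find_stopping_epoch(lambda: None, lambda: next(curve), max_epochs=20)
# Afterwards: re-train for exactly n epochs on train+valid and measure test
# error, then re-train for n epochs on train+valid+test for the final model.
```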
EDIT: One last thing I thought of (sorry for the wall of text LOL): another improvement that significantly increased my bot’s winrate in that other game was to introduce data augmentation via rotation/mirroring to exploit the symmetry of the game. I’m not sure whether you can rotate in Terminals, but the board looks horizontally symmetrical, so you can take all of the turn data you have, duplicate it, and mirror the game state/actions, and the mirrored moves will still be completely valid. In essence, this should quickly double the size of your dataset (with a bit of correlation), which is one of the best ways to reduce overfitting.
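A sketch of the mirroring, assuming the state is a 2-D grid of board features and actions are (x, y) placements; the exact representation will depend on how you encode turns, so treat this as illustrative:

```python
def mirror_turn(state, placements):
    """Flip one turn of data across the board's vertical axis.

    state:      2-D grid (list of rows) of board features
    placements: list of (x, y) action coordinates
    """
    width = len(state[0])
    mirrored_state = [row[::-1] for row in state]              # flip each row
    mirrored_actions = [(width - 1 - x, y) for x, y in placements]
    return mirrored_state, mirrored_actions

state = [[1, 2, 3],
         [4, 5, 6]]
m_state, m_actions = mirror_turn(state, [(0, 1)])
```

Append the mirrored copy of every turn to your dataset alongside the original and you get the 2x augmentation for free.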