The Ultimate Copy-Cat - ML Implementations

Felix · December 1, 2019, 7:00am

@arnby, yes, I meant your RL-algos.

With this tool (thread) I can find everything, even if it is very far away from the top 10.
(Hint: it’s currently not possible to find your RL-algos, because I reduced the number of algos that get loaded to increase the loading speed.)

IWannaWin · December 1, 2019, 7:04am

Yeah, it’s very dynamic; I was getting bored of just making static-structure algos.

kkroep · December 1, 2019, 1:08pm

I see your point, but I laughed out loud when I realized all my algos have been simple finite state machines with a bunch of if statements…

I find debugging and refining a strategy very easy when you know exactly which piece of code builds which structure, so I usually try to get as far as I can with simple code and add more complex logic later on, when I hit a wall.

MrTwiggy · December 7, 2019, 2:32am

That’s fair. It might be an interesting experiment to compare the winrate of a bot trained on 100% of the match data with the winrate of a bot trained on only 50% of it. If the full-data bot is significantly better, that could be a good indication that there is still a lot to squeeze out just by collecting more data.
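A minimal Python sketch of how the data for that comparison could be drawn, assuming the match data is a list with one entry per match (a hypothetical layout). Sampling whole matches rather than individual turns keeps all turns of a game in the same subset:

```python
import random

def subsample_matches(matches, fraction, seed=0):
    """Return a reproducible random subset containing `fraction` of the matches.

    Sampling is per match (not per turn), so all turns of a game stay together.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(len(matches) * fraction))
    return rng.sample(matches, k)

# Hypothetical usage: train one bot on `full` and one on `half`,
# then compare their winrates against the same set of opponents.
matches = [f"match_{i}" for i in range(10)]
full = subsample_matches(matches, 1.0)
half = subsample_matches(matches, 0.5)
print(len(full), len(half))  # 10 5
```

Fixing the seed means the 50% subset stays the same if you rerun the experiment.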

Additionally, just a thought: something I’ve used in the past in another game (generals.io) that improved my initial behavioral cloning (copy-cat) approach was to move beyond mimicking a single particular algorithm (which was my first thought too). Instead, I took a large amount of match data from the top X players and modified the loss function by scaling it with the match outcome (-1 for a loss, 1 for a win).

So, for example, let’s say I have match data for Player A vs Player B, consisting of 250 turns’ worth of data (one turn for each player, with a frame of history and the actions they took). Suppose Player A wins the match. In this case, I would take all of his turn data, train the algorithm with supervised learning to predict the action he would take, and multiply the loss by 1 (because he won). This is very similar to what you are doing now, except that instead of only predicting one particular player’s turns across the whole dataset, you are always predicting the actions taken by the winner of each match. Finally, I would also take the turn data for Player B (the loser) and perform the exact same training, except that I would scale the loss gradient by -1, which essentially trains the network to AVOID mimicking his moves.
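The outcome-weighted loss described above can be sketched in plain Python as a weighted negative log-likelihood. The function name and the input layout (one probability list per turn, the index of the action taken, and a +1/-1 weight) are illustrative assumptions, not the poster’s actual code:

```python
import math

def weighted_bc_loss(probs, actions, weights):
    """Outcome-weighted behavioral-cloning loss (mean weighted negative log-likelihood).

    probs[i]   -- predicted probability distribution over actions for turn i
    actions[i] -- index of the action the player actually took on turn i
    weights[i] -- +1 if that player won the match, -1 if they lost
    """
    total = 0.0
    for p, a, w in zip(probs, actions, weights):
        # Cross-entropy of the taken action, scaled by the match outcome:
        # minimizing with w = +1 raises p[a] (imitate), with w = -1 lowers it (avoid).
        total += -w * math.log(p[a])
    return total / len(probs)

# One winning turn and one losing turn.
loss = weighted_bc_loss(
    probs=[[0.5, 0.5], [0.25, 0.75]],
    actions=[0, 1],
    weights=[1, -1],
)
```

With w = -1 the loss keeps decreasing as `p[a]` approaches 0, without any lower bound, so you may want to clip that term or give losing turns a smaller weight.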

The general idea is that you want your network to mimic the best players when they are winning matches, and to avoid behaving like the players that lost. However, Terminal might have an advantage here because of the HP system and how it definitively leads to a win (I think). In that case, it might be worth training a network to predict how much HP damage it will take AND how much it will deal by playing a certain move; then you just pick the action with the highest expected dealt/received ratio. Or, just use the paradigm above, but instead of scaling the loss by the match outcome (-1 or 1), scale it by (damage dealt - damage received), so that it learns to mimic moves that deal more damage than they take and to avoid mimicking moves that take more than they deal. When I switched to the outcome-weighted paradigm in that other game, I saw a HUGE boost in performance, because of the increase in dataset size I got from using many high-level players’ worth of data instead of just one player’s.
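Both variants can be sketched in a few lines of Python. `damage_weight` gives the per-turn loss weight for the second idea; `pick_action` implements the first, assuming some hypothetical model has already produced predicted damage dealt and received for each candidate action (the action names are made up):

```python
def damage_weight(damage_dealt, damage_received):
    """Per-turn loss weight: positive when the move dealt more HP damage than it took."""
    return damage_dealt - damage_received

def pick_action(candidates, eps=1e-6):
    """Return the action with the highest predicted dealt/received ratio.

    candidates: list of (action, predicted_dealt, predicted_received) tuples,
    assumed to come from a hypothetical model that predicts both quantities.
    """
    best_action, _, _ = max(candidates, key=lambda c: c[1] / (c[2] + eps))
    return best_action

candidates = [
    ("attack_left", 6.0, 2.0),   # ratio ~3
    ("attack_right", 4.0, 0.5),  # ratio ~8
    ("hold", 0.0, 0.0),          # ratio 0
]
print(pick_action(candidates))   # attack_right
print(damage_weight(5, 2))       # 3
```

The small `eps` only guards against division by zero; an action predicted to take no damage at all will dominate whenever it deals any.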

Typically, you determine the epoch count (how many epochs to train before stopping) via validation. For example, you start off with train/valid/test splits. You run your training process, and suppose that after N epochs you see your validation error start increasing (and not stop). Given this N, you combine your train and valid sets into one bigger set, retrain your model using the best hyperparameters found during validation (including N), stop after N epochs, and evaluate the model on the test set to get your test error.
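One way to express “validation error starts increasing (and doesn’t stop)” in code is a patience rule: stop scanning once the validation loss hasn’t improved for a few consecutive epochs, and take N as the best epoch seen. A sketch, where `patience=3` is an arbitrary assumption:

```python
def choose_epoch_count(val_losses, patience=3):
    """Pick N, the number of epochs to train, from per-epoch validation losses.

    Scanning stops once the validation loss has gone `patience` consecutive
    epochs without improving; N is the (1-based) epoch with the lowest loss.
    """
    best_epoch, best_loss, since_best = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, since_best = epoch, loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_epoch

# Validation loss bottoms out at epoch 3, then keeps rising.
print(choose_epoch_count([1.0, 0.8, 0.7, 0.72, 0.75, 0.8, 0.9]))  # 3
```

If you train with `model.fit`, tf.keras offers a comparable built-in, `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)`.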

Finally, after having validated and tested your model, you take the ENTIRE dataset (train+valid+test) and re-run the entire training process on it with the same hyperparameters (and N) as before. After you are done training, this is your final model, and it is likely to perform as well as (or better than) its test error indicates, since it has more data to train on than before. It is a very common error to train only on the train set and use that model as the final model, missing out on a large chunk of your dataset.
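The last two stages can be sketched as follows, assuming N has already been chosen on the validation set. `train_fn` and `evaluate_fn` are hypothetical stand-ins for your own training and evaluation code, and the datasets are plain lists:

```python
def final_training_protocol(train, valid, test, n_epochs, train_fn, evaluate_fn):
    """Retrain with a fixed epoch count N chosen earlier on the validation set.

    train_fn(data, epochs) -> model and evaluate_fn(model, data) -> error are
    hypothetical callables standing in for your own training code.
    """
    # Retrain on train+valid for exactly N epochs, then measure the test error.
    model = train_fn(train + valid, n_epochs)
    test_error = evaluate_fn(model, test)
    # Retrain on ALL the data with the same hyperparameters and N: the final model.
    final_model = train_fn(train + valid + test, n_epochs)
    return final_model, test_error

# Toy stand-ins: the "model" is just the mean of the data it was trained on.
def toy_train(data, epochs):
    return sum(data) / len(data)

def toy_eval(model, data):
    return sum(abs(x - model) for x in data) / len(data)

final_model, test_error = final_training_protocol(
    [1, 2, 3], [4], [5], n_epochs=3, train_fn=toy_train, evaluate_fn=toy_eval
)
print(final_model, test_error)  # 3.0 2.5
```

The test error reported is the one from the train+valid model; the final model trained on everything has no held-out data left, so that number is its best available estimate.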

EDIT: One last thing I thought of (sorry for the wall of text LOL): another improvement that significantly increased my bot’s winrate in that other game was data augmentation via rotation/mirroring to exploit the symmetry of the game. I’m not sure you can rotate in Terminal, but the board looks left-right symmetric, so you can take all of the turn data you have, duplicate it, and mirror the game state and actions, which will still be completely valid moves. In essence, this quickly doubles the size of your dataset (with some correlation between each turn and its mirror), which is one of the best ways to reduce overfitting.
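A sketch of the mirroring augmentation, assuming a 28x28 board with x coordinates 0..27 and a hypothetical encoding where both the game state and the actions are lists of `(unit_type, x, y)` tuples. Only x is reflected, so every unit stays on its own player’s half:

```python
BOARD_SIZE = 28  # assumption: 28x28 arena, x and y in 0..27

def mirror_x(x):
    """Reflect an x coordinate across the board's vertical center line."""
    return BOARD_SIZE - 1 - x

def _flip(items):
    # y is left unchanged, so each unit stays on its own player's half.
    return [(unit, mirror_x(x), y) for (unit, x, y) in items]

def mirror_turn(state, actions):
    """Left-right mirrored copy of one turn's (state, actions) pair.

    Both are lists of (unit_type, x, y) tuples, a hypothetical encoding.
    """
    return _flip(state), _flip(actions)

def augment(dataset):
    """Double the dataset by appending the mirrored copy of every turn."""
    return dataset + [mirror_turn(state, actions) for state, actions in dataset]

dataset = [([("WALL", 0, 13)], [("SCOUT", 13, 0)])]
augmented = augment(dataset)
print(augmented[1])  # ([('WALL', 27, 13)], [('SCOUT', 14, 0)])
```

If your state is a grid tensor instead, the same flip is a reversal along whichever axis holds x (for example with `numpy.flip`).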

arnby · December 8, 2019, 7:38am

Thanks for everything you wrote! Definitely insightful.

However, I noticed a few odd things that prevent me from doing such an analysis:

The train and validation losses follow each other through almost every random fluctuation, which is likely due to a high correlation between the train and validation sets. This is bad, but it is certainly because the model learns from several versions of the algo, and one version can have its matches in the train set while another version has its matches in the validation or test set.
The model never really overfits, but it clearly performs worse if I train it too much (ML_Eagle_852 performs worse than ML_Eagle_747 and ML_Eagle_765).
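A grouped split is one common way to limit this kind of leakage (it isn’t something arnby describes, so treat it as a suggestion): choose a grouping key so that all matches from near-duplicate sources land in the same set. The sketch below groups by algo family via a hypothetical naming convention `<family>_<version>`; grouping by individual version works the same way with a different `key`. The input layout and the `Other_*` names are made up:

```python
import random
import re

def algo_family(name):
    """Hypothetical grouping key: strip a trailing version number,
    e.g. 'ML_Eagle_747' -> 'ML_Eagle'."""
    return re.sub(r"_\d+$", "", name)

def group_split(matches, key, valid_frac=0.2, seed=0):
    """Split (algo_name, match_data) pairs so each group lands in one set only."""
    groups = sorted({key(name) for name, _ in matches})
    random.Random(seed).shuffle(groups)
    n_valid = max(1, round(len(groups) * valid_frac))
    valid_groups = set(groups[:n_valid])
    train = [m for m in matches if key(m[0]) not in valid_groups]
    valid = [m for m in matches if key(m[0]) in valid_groups]
    return train, valid

matches = [("ML_Eagle_747", "m1"), ("ML_Eagle_765", "m2"), ("ML_Eagle_852", "m3"),
           ("Other_1", "m4"), ("Other_2", "m5")]
train, valid = group_split(matches, key=algo_family, valid_frac=0.5)
```

If you only imitate a single algo, grouping by family would put everything in one set, so grouping by version is the only option there.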

It is also on my agenda to work on a model that learns from as many algos as I can, but that will come later.

For data augmentation, one can indeed flip along the X and/or Y axis without changing the “meaning” of the game (but only along X in my case).

pyqt · March 5, 2021, 9:42am

So I am planning on making an ML algo with a friend over the next holidays, and I started thinking about how to do it.
I ran into some issues. First, I don’t know anything about how to code in that direction and only know some theory about ML. Second, doesn’t the algo develop something completely different from what works well online if you don’t train it on the algos online? If anyone can help, please respond. Also, if anyone knows good resources for ML with Python, I would be very interested in those too.

acshikh · March 6, 2021, 9:32pm

This is an extremely nontrivial problem for Terminal. I would encourage you to learn about programming, artificial intelligence, and machine learning on an easier problem than Terminal. Basic Terminal strategies are a great way to learn Python, but more advanced AI and ML algorithms would probably be easier to learn on a simpler game.

In addition, it probably doesn’t make sense for you to respond to year-old posts in order to ask for help. If you want help, it will be more helpful to start a new thread. And for general AI and ML, this forum is probably not the right place to get help anyway. This forum is better for talking about the specifics as applied to Terminal, which are going to assume more background knowledge.

Ryan_Draves · April 26, 2021, 3:14am

I would respectfully disagree. Those comments above were the result of my first venture into ML, and I would still consider them to have been worthwhile and for Terminal to have been a good choice for it. It’s too large a problem to expect newcomers to get good results (I certainly never did), but I think the enormous state space makes it easy to be creative and explore, which is certainly worthwhile. Even with the large state space, Terminal is tractable enough to have some kind of results after following a few tutorials, and that’s all you need to whip up a simple model.

pyqt · October 2, 2021, 11:14am

So a friend and I started working on ML for Terminal again a week ago, and we have basically got a structure for a Q-learning approach (using tf.keras and a custom training loop) that runs locally, competing against some of our algos and some we made to represent common strategies. But we are too stupid to get a trained TensorFlow model running on the servers. I have looked at the approach mentioned above, but I’m too stupid for that as well. If anyone has experience with this, it would be really nice if you could explain why it’s impossible to run TensorFlow models with the TensorFlow version on the servers.