The Ultimate Copy-Cat - ML Implementations

The Ultimate Copy-Cat - ML Implementations
0

#21

@arnby, yes I ment your RL-algos.

With this tool (thread) I can find everything. Even if it is very far away from the top 10 :grinning:
(hint: it’s currently not possible to find your RL-algos, because I reduced the number of algos that are getting loaded to increase the loading speed.)


#22

Yeah, it’s very dynamic, I was getting bored of just making static structure algos.


#23

I see your point, but I laughed out loud when I realized all my algos have been simple finite state machines with a bunch of if statements… :rofl:

I find debugging and refining a strategy to be very easy when you know exactly what piece of code builds what structure, so I usually try to get as far as I can with simple code, and add in more complex logic later on when I hit a wall


#24

That’s fair, it might be an interesting experiment to try and compare the winrate of a bot trained with 100% of match data to the winrate of a bot trained with 50% matches worth of data. If it’s significantly better, could be a good indication there is still a lot to squeeze out by just collecting more data.

Additionally, just a thought, but something I’ve used in the past on another game (generals.io) that improved my initial behavioral cloning approach (copy-cat approach) was to move beyond mimicking a single particular algorithm (which was my first thought too). Instead, what I did was I took a huge number of match data from the top X players, and modified the loss function by scaling it with the match outcome (-1 for loss, 1 for win).

So for example, let’s say I have match data for player A vs player B, which consisted of 250 turns worth of data (one turn for each player, with a frame of history and the actions they took). Suppose Player A wins the match, well in this case, I would take all of his of turn data, and I would attempt to train the algorithm with supervised learning to predict the action he would take, and multiply the loss function by 1 (because he won), which is very similar to what you are doing now, except instead of only predicting a particular player’s turn data in the whole dataset, you are always predicting the actions taken by the winner of the match. Finally, I would also take the turn data for Player B (the loser), and perform the exact same training except I would scale the loss function gradient by -1, which is essentially training the network to AVOID mimicking his moves.

General idea is that you want your network to mimic the best player’s win they are winning matches, and avoid behaving like the players that lost. However, Terminals might have an advantage because of the HP system and how it definitively leads to a win (I think), so in that case, it might be worth training a network to try and predict how much HP damage it will take AND how much it will deal by playing a certain move, then you just pick the action with the highest expected dealt/received ratio. Or, just use the paradigm above, but instead of scalling loss by match outcome (-1 or 1), just scale it by (damagedealt - damagereceived) so that it learns to mimick the moves that have a high dealt/received ratio and avoid mimicking moves with a bad ratio. When I switched to this paradigm when competing in that other game, I saw a HUGE boost in performance because of the increase in dataset size that I got by looking at many more high-level players worth of data instead of just one.

Typically, you determine the epoch count (how many epochs you should train before stopping) via validation. So, for example, you start off with train/valid/test splits. You perform your training process, and suppose after N epochs you see that your validation error starts increasing (and doesn’t stop). Now, given this N, you combine your train+valid sets into one bigger set and re-train your model again using the best hyperparameters found during validation process (including the N) and you stop after training N epochs, and test the model on the test set to get your test error.

Finally, after having validated and tested your model, you take the ENTIRE dataset (train+valid+test) and you re-perform the entire training process on the whole dataset with the same hyperparameters (and N) as before. After you are done training, this is your final model that you have produced, and is likely to perform as good (or better) than its test error indicates (since it has more data to train on than it did previous). Very common error to only train on the train set and just use that model as your final model, missing out on a large chunk of your dataset.

EDIT: One last thing I thought of (sorry for the wall of text LOL), another improvement that significantly increased my bot’s winrate in that other game was to introduce data augmentation via rotation/mirroring to exploit the symmetry of the game. I’m not sure if you can rotate for Terminals, but it looks like Terminals is symmetrical horizontally, so that you can take all of the turn data you have, and then duplicate it and mirror the game state/actions, which will still be completely valid moves. In essence, it should quickly double the size of your dataset (with a bit of correlation) which is one of the best ways to reduce overfitting.


#25

Thanks for every thing you wrote! Definitely insightful :grin:

However few things I noticed that are odd and prevent from doing such analysis:

  • The train and validation loss are following each other in almost every random variations, which is likely due to a high correlation between train and validation. This is bad but it is certainly because it learns from several versions and one version can have its match in the train , while the other in the validation or test set.
  • The model never really overfit but clearly perform worse if I train it to much (ML_Ealge_852, perform worse than ML_Eagle_747 and ML_Eagle_765)

it is also on my agenda to work on a model that learn from as much algos as I can, but that will come later

For data augmentation, one can flip alongside the X or/and Y axis without changing the “meaning” of the game indeed. (but only X in my case)