Woah thank you so much for this insightful post! We are analyzing the merits of ML for the game so this post is extremely helpful.
The copy strategy, I think, will ultimately never work if you use multiple algos, because it's effectively just an "average" of what algos tend to do in certain situations. Seeing how it plays is really interesting though: the replay you link shows that the maze-like structure is very popular, since even the averaged behavior of all the algos you trained on manages to leave a line open for your units to pass through.
What is your ML background btw? Have you used it professionally or just played with it as a hobby? Have you tried stuff like OpenAI Gym?
Anyway, I'm not a super expert, but here are some approaches I would personally try:
Completely ML Approach: Pick an algo to copy as a base, then use something like Q-learning, NEAT, or even genetic training to improve on it. The hard part is optimizing the random "off-policy" moves: the way all of these approaches work is that you try a random move the policy wouldn't normally make, and if it does well according to your reward function, you increase the likelihood of making that move in similar situations. Because getting a large number of matches is hard, doing just anything random to "explore the action space" is probably not effective, so I would try to make the "off-policy" moves the algo tries at least a little smart. For example, make sure it never tries illegal moves or obviously bad moves, though this is tricky; the exploration logic could practically be an algo itself, so finding a not-too-complicated way to try "possibly good" new moves would be nice. Deciding between NEAT, Q-learning, genetic algos, etc. is also tricky. When I was more into ML, Q-learning was the cutting edge, but I always thought NEAT and genetic algos were underrated too; I'm not sure what the consensus is nowadays since I haven't been keeping up.
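To make the "constrained exploration" idea concrete, here is a minimal tabular Q-learning sketch. Everything here is hypothetical (the state/action representations, `legal_moves`, and the reward are placeholders, not the game's real API); the point is just that the epsilon-random branch only samples from a pre-filtered legal-move list instead of the whole action space.

```python
import random
from collections import defaultdict

class ConstrainedQLearner:
    """Toy tabular Q-learner whose exploration is restricted to legal moves."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state, legal_moves):
        # Off-policy exploration: sample only from pre-filtered legal moves,
        # never a fully random action over the whole action space.
        if random.random() < self.epsilon:
            return random.choice(legal_moves)
        return max(legal_moves, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_legal):
        # Standard Q-learning update, with the max restricted to legal moves.
        best_next = max((self.q[(next_state, a)] for a in next_legal), default=0.0)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

The "smarter" exploration you'd really want would replace `random.choice(legal_moves)` with something that also filters out obviously bad moves, which is exactly the tricky part.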
Multi-Algo Decider: At the opposite end of the spectrum, instead of having the ML part decide specific firewall placements, you could have it choose between a selection of hardcoded algos after a few turns. For example, start with a basic defense, then use ML to decide which of X preset algo designs to commit to, based on the enemy's configuration. This would be by far the easiest way to use ML, since the output is just a choice between X possibilities, though it would also require a bunch of mostly-hardcoded algos to choose from. Example preset strats: left-side ping cannon, right-side ping cannon, EMP line, center-of-board ping rush, etc. Figuring out which of the sub-algos to use is pretty tricky for a regular algo to decide, and an ML approach could definitely help there. You could also have it wait to "decide" until the enemy algo commits most of its cores. Of course this wouldn't work well against super-dynamic algos that completely change their strategy mid-game, like Aelgoo, but it could still do really well, probably reaching top 10. This algo wouldn't need to be pretrained on copying a previous algo.
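The decider can be as simple as a classifier over a handful of board features. A minimal sketch, with everything assumed (the preset names, the feature choices, the `(x, y, unit_type)` board encoding, and the x < 14 left/right split are all made up for illustration), using nearest-centroid classification so no ML library is needed:

```python
PRESETS = ["left_ping_cannon", "right_ping_cannon", "emp_line", "center_ping_rush"]

def extract_features(enemy_units):
    """Toy features from a hypothetical list of (x, y, unit_type) tuples:
    (left-side unit count, right-side unit count, destructor count)."""
    left = sum(1 for x, y, t in enemy_units if x < 14)
    right = sum(1 for x, y, t in enemy_units if x >= 14)
    destructors = sum(1 for x, y, t in enemy_units if t == "DESTRUCTOR")
    return (left, right, destructors)

def decide(features, centroids):
    """Pick the preset whose training centroid is closest to the observed
    features. `centroids` maps preset name -> mean feature vector, computed
    offline from replays where that preset did well."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda name: sq_dist(features, centroids[name]))
```

A real version would use a proper classifier, but the output shape is the same: one label out of X presets.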
- Heuristics approach: A subset of this approach: instead of having the ML part decide which algo to use, it just predicts features of the game. For example: is the enemy using a maze or not (you could train it on algos you know are mazes and some you know aren't)? Is the enemy attacking the left or the right side? Is the enemy a ping cannon? Is it better to attack the left or the right side? Basically, you use the ML part only to predict features of the game, and then a regular algo uses those features to make decisions.
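The "is the enemy a maze?" predictor is about the smallest possible ML problem: a binary classifier trained on labeled boards. A toy sketch using a plain perceptron (the two features, filter count and open-path count, are invented for illustration; a real feature set would come from the game state):

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Train a linear classifier on labeled feature vectors.
    labels are 1 (maze) or 0 (not a maze)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

The regular (non-ML) algo then branches on the predicted label: "if maze, do X; else do Y."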
Hybrid Approach: In between the last two approaches is a hybrid. Instead of choosing the exact firewall location, or the opposite extreme of just picking between preselected strategies, the ML part of the algo uses a combination of predetermined structures to build with, which keeps it from clobbering itself. For example, there are a few "standard" defense shapes, like a destructor with some filters around it to soak up damage, or an EMP line; these can also be assigned a few preset locations that are optimal to place them in. Say you have an optimized defense that is good against EMPs, call it substructure A, and another that defends well against PINGs, substructure B. Instead of having the ML strategy decide "put a destructor here," it could decide "put substructure A in position 3," where substructure A is, for example, a fortified destructor and position 3 is the middle-left front. I imagine it would feel something like playing against Aelgoo. The downside is it wouldn't come up with new substructures, though you could train a separate network that just generates substructures, even one that is a subcomponent of the main net (there are a lot of crazy neural net graphs out there in the research). To be even more ML-based, you could have it choose more specifically what to do and just keep some baseline rules to guide it, like never making a move that completely blocks your own attack paths. Attacking gets a bit tricky because attack and defense are so linked, but there are all sorts of constraints you could add to make it work well, like only using attack structure A if defense structure B is not in play (though hopefully the ML also learns this on its own).
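The substructure idea is basically a lookup that expands a coarse ML decision like ("A", 3) into concrete placements. A minimal sketch (the shapes, the anchor coordinates, and the unit-type strings are all assumptions, not the real game's values):

```python
# Hypothetical substructures as relative (dx, dy, unit_type) offsets.
SUBSTRUCTURES = {
    "A": [(0, 0, "DESTRUCTOR"), (-1, 1, "FILTER"), (1, 1, "FILTER")],  # fortified destructor
    "B": [(dx, 0, "FILTER") for dx in range(-2, 3)],                   # short wall segment
}

# Hypothetical preset anchor points; e.g. position 3 = "middle-left front".
ANCHORS = {3: (9, 11)}

def expand(choice, anchors=ANCHORS, subs=SUBSTRUCTURES):
    """Turn a coarse ML decision like ('A', 3) into concrete placements.
    The ML output space is tiny (len(subs) * len(anchors) options), but the
    resulting builds are always well-formed shapes."""
    name, pos = choice
    ax, ay = anchors[pos]
    return [(ax + dx, ay + dy, t) for dx, dy, t in subs[name]]
```

This is why the hybrid avoids clobbering itself: the network never emits an individual tile, only a shape at an anchor, so every output is a coherent structure.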
- Pretraining: This algo also wouldn't need to be pretrained on a copy, but there are a lot of weird pretraining techniques that could be useful. For example, the one where you remove the last few layers of a network and replace them with new layers that output something different, like the substructure choices above, and then retrain; that keeps some of the basic pattern recognition the copied bot has, but uses those learned features to output higher-level substructure suggestions instead of specific locations. For example, you could copy Aelgoo, remove the last few layers, and have it output substructures instead: whatever logic Aelgoo uses to detect where to defend would be kept in the early layers, but the output would be retrained to use your substructures. Of course, the pretraining I'm describing is kind of an old technique and your mileage may vary; sometimes it doesn't really help at all, but sometimes it's very useful, as with autoencoders.
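The "chop off the head and retrain" idea, reduced to the smallest example I can write: a frozen hidden layer standing in for the copied net's pretrained features, and a fresh output layer trained on new labels (e.g. substructure choices). All weights, shapes, and data here are invented; in practice you'd do this with a real deep-learning framework by freezing the early layers and reinitializing the output ones.

```python
import math

def hidden(x, W1):
    """Frozen 'pretrained' layer: W1 is kept as-is from the copied network."""
    return [math.tanh(sum(wij * xi for wij, xi in zip(row, x))) for row in W1]

def retrain_head(samples, labels, W1, epochs=50, lr=0.5):
    """Train a brand-new output layer on the new targets (e.g. substructure
    choices) on top of the frozen features. Only w and b are updated; the
    pretrained W1 never changes."""
    w = [0.0] * len(W1)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            h = hidden(x, W1)
            pred = 1 if sum(wi * hi for wi, hi in zip(w, h)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * hi for wi, hi in zip(w, h)]
            b += lr * err
    return w, b
```

Whether this beats training from scratch depends entirely on how transferable the frozen features are, which is exactly the "your mileage may vary" caveat above.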