I found this interesting challenge months ago and took it on with the intention of trying something new. I don’t really have a personal blog, so I’ll post my experiences here because they were very interesting. I want to mention beforehand that I have not actually succeeded in making a top algo, so if you’re looking for quick strats this will not help you.
I wanted to make this challenge a completely technical one for me, rather than a strategic/game-skill one. This meant limiting myself to machine learning as much as possible: I committed to not writing a single map or tactic by hand, and avoided thinking too much about specific metas/strategies. However, I now realise this was a mistake if the goal is to have maximal elo. I had also bitten off more than I could chew, because I was not well practised in practical machine learning. I have learnt a lot, so at least in that regard I have succeeded.
Here were a few of my failed approaches and their results:
- Genetic algorithm:
Back when I was using the very slow Java engine to simulate my games, I had evolving static maps that would mutate their firewall type and build order cell by cell. Each map also had designated fixed attack strategies which evolved with it. The complete strategies were pitted against each other, and their win rates counted as their fitness.
I was hoping that it would discover new interesting strategies like mazes and such, but after leaving them running for a few weeks they all fell into a local optimum: ping cannons. Poor ping cannons at that. Right now it sits at 1620 elo. I also only had just one population, so the ping cannons took over everything before other specimens could figure out how to counter them. Genetic algorithms might work for some parts of the game, but my approach was very wrong. By then, though, I was ready to try something else.
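The setup described above can be sketched roughly as follows. This is a minimal illustration, not my actual code: the map size, unit encoding, mutation rate, and function names are all assumptions.

```python
import random

# Illustrative sketch of the genetic-algorithm setup described above.
# Cell values: 0 = empty, 1 = filter, 2 = encryptor, 3 = destroyer
# (the encoding and constants below are assumptions, not the real code).
MAP_CELLS = 210        # cells on one player's half of the board
UNIT_TYPES = 4
MUTATION_RATE = 0.02

def random_map():
    return [random.randrange(UNIT_TYPES) for _ in range(MAP_CELLS)]

def mutate(static_map):
    """Mutate each cell's firewall type independently, cell by cell."""
    return [random.randrange(UNIT_TYPES) if random.random() < MUTATION_RATE else c
            for c in static_map]

def fitness(strategy, population, simulate):
    """Win rate against the rest of the population."""
    wins = sum(simulate(strategy, other)
               for other in population if other is not strategy)
    return wins / max(1, len(population) - 1)

def evolve(population, simulate, survivors=10):
    """One generation: rank by win rate, keep the best, refill with mutants."""
    scored = sorted(population,
                    key=lambda s: fitness(s, population, simulate),
                    reverse=True)
    parents = scored[:survivors]
    children = [mutate(random.choice(parents))
                for _ in range(len(population) - survivors)]
    return parents + children
```

With a single shared population like this, one dominant lineage (the ping cannons) can sweep the gene pool before any counter evolves; separate islands with occasional migration would likely have helped.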
- Neural Network to evaluate maps:
I downloaded a bunch of data (like… all of it). From that I sifted out the expert games (relatively recent and >1900 elo) and trained a convolutional neural net to predict the probability of p1 or p2 winning. My input was about 15 planes encoding the game state, plus a hard-coded feature: path coverage from the four sides. My idea was that this CNN would somehow help some kind of generative algorithm compose adaptive strategies on the fly.
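The input encoding was a stack of board-shaped planes. Here is a reduced sketch of what that kind of encoding looks like; the plane meanings, counts, and normalisation are assumptions for illustration (the real input had ~15 planes):

```python
import numpy as np

# Reduced sketch of a multi-plane board encoding for the CNN.
# Plane layout and normalisation here are assumptions, not the real code.
BOARD = 28  # Terminal's board fits in a 28x28 grid

def encode_state(filters, encryptors, destroyers,
                 health_p1, health_p2, coverage):
    """Build a (channels, 28, 28) array: per-unit occupancy grids,
    scalar healths broadcast as constant planes, and the hand-coded
    path-coverage feature (one plane per board edge)."""
    planes = [
        np.asarray(filters, dtype=np.float32),
        np.asarray(encryptors, dtype=np.float32),
        np.asarray(destroyers, dtype=np.float32),
        np.full((BOARD, BOARD), health_p1 / 30.0, dtype=np.float32),
        np.full((BOARD, BOARD), health_p2 / 30.0, dtype=np.float32),
    ]
    planes.extend(np.asarray(c, dtype=np.float32) for c in coverage)
    return np.stack(planes)  # (9, 28, 28) in this reduced version
```

A convolutional net over a stack like this, with a sigmoid head, then regresses the win probability from the recorded game outcomes.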
For the generator algorithm, I took the firewall configurations from a sample (~100) of winners in games where high-rated algos were defeated, and put them in a file. This meant that many people’s strategies and their intermediate building steps were in my algo! There were about 20k states stored. Of course, only the look of an algo at a given time could be extracted; its “personality” was not easy to capture. In my on_turn(), my generative strategy used a quick heuristic to sift out a few hundred candidate firewall configurations based on resources available and similarity to the current firewall layout. Out of those hundreds, my slower CNN would pick out the highest-EV configuration. My algo would then build that.
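In outline, the two-stage generator works like this. Everything here is a hypothetical sketch: the config representation (sets of cell/unit tuples), cost model, and function names are my illustration, not the actual implementation.

```python
# Sketch of the two-stage generator described above: a cheap heuristic
# prunes ~20k stored configs down to a few hundred candidates, then a
# slower evaluator (the CNN) picks the best one. All names are illustrative.

def similarity(config, current):
    """Fraction of cells a candidate shares with the current layout."""
    return len(config & current) / max(1, len(config | current))

def affordable(config, current, cores, cost_per_unit=1.0):
    """Can the missing cells be built with the cores on hand?"""
    missing = len(config - current)
    return missing * cost_per_unit <= cores

def pick_config(stored_configs, current, cores, evaluate, keep=300):
    # Stage 1: quick heuristic -- resources + likeness to current layout.
    candidates = [c for c in stored_configs if affordable(c, current, cores)]
    candidates.sort(key=lambda c: similarity(c, current), reverse=True)
    candidates = candidates[:keep]
    # Stage 2: the slow CNN scores the survivors; build the highest-EV one.
    return max(candidates, key=evaluate, default=current)
```

The point of the cheap first stage is that the CNN is far too slow to score all 20k stored states inside one turn's time budget.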
(My attacking strategy was a very nice simulation + game-theory-optimal solver but I will not share the method for that for now.)
It somewhat worked! I was able to see that the strategy adapted to the opponent. One test algo I had, called “smallball”, would create a thin funnel in the middle of the map and, if unchallenged, would force opponent information units down a stomach of destroyers. My CNN strategy actually learned to plug that hole with a line of filters, allowing my attack to send pings to destroy the funnel on the weak sides! Also, against track algos, it sells stuff in forsaken “fire zones” and refrains from building there. However, it managed an elo of o̶n̶l̶y̶ edit:1701 at its highest. It’s called “Select_CNN_Metric”.
One of the problems with it is that it’s overfitted to the strategies it has seen, and even then it hasn’t seen enough data to become an expert in those strategies. I tried a version that sold firewalls that weren’t part of its current config, but it did very poorly because it kept selling everything as it switched from one strategy to another: the NN was not very accurate, so the generator switched configurations very often.
A solution to this overfitting is to generate self-played games and retrain the neural net over and over with progressively improved and more diverse data. This is roughly what the recent AlphaGo/AlphaZero algorithms did, so I tried that:
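The retraining loop amounts to something like the sketch below, with the self-play and training steps as stand-in hooks (they are assumptions for illustration, not real APIs):

```python
# Minimal sketch of the self-play retraining loop described above.
# `play_games` and `train` are placeholder hooks, not real APIs.

def self_play_iterate(model, play_games, train,
                      iterations=5, games_per_iter=100):
    """Alternate between generating games with the current model and
    retraining it on the growing, progressively more diverse dataset."""
    dataset = []
    for _ in range(iterations):
        dataset.extend(play_games(model, games_per_iter))  # fresh self-play data
        model = train(model, dataset)                      # retrain on all data so far
    return model
```

Each pass exposes the net to positions its previous self produced, which is exactly what counters the "only strategies it has seen" problem; the catch, as the next section shows, is the compute cost of all those simulated games.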
- Alpha Zero:
I went in way over my head for this one.
At this point I already had a very, very fast C++ simulator (something like thousands of complete turns per second).
I took an existing implementation, alpha-zero-general, ported its important classes to C++, and plugged my simulator into it. I defined a “turn” as the placement of one unit. There were some challenges to making this work even in theory: AlphaZero, although general, is designed for perfect-information games, but players in Terminal can’t see the opponent’s spawning units during the build phase. I decided to overcome this by simply hiding the opponent’s build phase when passing board states to the neural net. Outside the build phase, Terminal IS a perfect-information game.
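The hiding step is just a masking transform applied before a state reaches the net. A sketch, assuming a dict-based state with a per-player queue of pending spawns (field names are my invention):

```python
# Sketch of the imperfect-information workaround described above: before a
# state is handed to the neural net, the opponent's pending (build-phase)
# units are masked out. The state layout here is an assumption.

def observed_state(state, player):
    """Return the state as `player` sees it: committed units for both
    sides are public, but only the player's own queued spawns from the
    current build phase are visible."""
    opponent = 1 - player
    visible = dict(state)  # shallow copy; committed board stays shared
    visible['pending'] = {
        player: state['pending'][player],
        opponent: [],  # hidden: the opponent's queue is blanked out
    }
    return visible
```

This keeps the search honest about what each side knows, at the cost of the net never learning to reason about what the opponent might be queuing, which a proper imperfect-information method would model explicitly.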
In the end, it did not matter. I have yet to run the training for a long time, but I heavily underestimated the resources needed to train AlphaZero. Even for relatively simple games, AlphaZero takes a great deal of compute to reach expert level. Terminal, in terms of discrete steps, is a VERY complicated game, and would need many thousands of computing hours that I don’t have to produce something even primitive. Consider that AlphaZero’s chess run used thousands of TPUs for days, and I had a dual-core laptop. Terminal is even more complex than chess.
It might be possible to use existing data to train a supervised network to kick-start AlphaZero, but iterating on that would still take eons. The game could also be abstracted or approximated into a simpler one that AlphaZero might manage on a PC, but I have yet to look into that.
I have looked at this challenge as a learning opportunity rather than a competitive one (that’s my excuse for not making a top algo XD)
Perhaps I was foolish to refrain from learning the strategic metas and from trying to build some good algos by hand. I thought doing so would add tunnel vision and bias to my methods, but instead it would have given me insight into what capabilities my model needed to succeed, and into which parts of the algo ML would be appropriate for. In the future, if I’m doing an ML problem, I’ll be sure to first learn the problem with my own neural net (brain).
I realise now that Terminal is complicated enough that it would be a mistake to try to make a single ML model that integrates high-level strategy with the placement of individual units, at least with PC resources. That is essentially what I tried to do.
This game ended up a lot deeper than I originally thought, and now I’m curious what the theoretical “optimal” strategy would look like. (Because the build phase is partial information, it would also have to be non-deterministic, mixing strategies like rock-paper-scissors!) The top algos are certainly good right now, but this is just the tip of the iceberg.