Thanks for the comments!
@arnby regarding following a consistent strategy, I think an excellent solution (on paper) is your idea of an RNN. Certainly, this makes sense given how each state of the game board depends on the turns before it. This would help a lot with sticking to one strategy, and with recognizing that strategy as distinct from others. I thought I would tackle NEAT next since I'm interested in the algorithm itself and haven't played with it, but from some preliminary work I can say that while NEAT is possible, it may have to be stripped down to the point where it is no longer functional, and it is going to take quite a bit of work. I'll talk more about NEAT below. Given the challenges NEAT presents, and since converting my current setup to an RNN will be (fairly) simple, I'll definitely give that a try.
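For anyone curious why an RNN fits here: the core addition is just a hidden state carried from turn to turn, so the network's decision on turn N can depend on everything it saw before. A toy sketch (the sizes and random features here are placeholders, nothing Terminal-specific):

```python
import numpy as np

# Minimal Elman-style recurrent cell: the hidden state h carries
# information from previous turns into the current turn's decision.
rng = np.random.default_rng(0)
n_in, n_hidden = 8, 16          # placeholder sizes, not the real board encoding

W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (the recurrence)
b_h = np.zeros(n_hidden)

def step(x, h):
    """One turn: combine current board features x with the memory h."""
    return np.tanh(W_xh @ x + W_hh @ h + b_h)

h = np.zeros(n_hidden)
for turn in range(5):               # a toy 5-turn game
    x = rng.normal(size=n_in)       # stand-in for that turn's board features
    h = step(x, h)                  # h now summarizes all turns so far

print(h.shape)  # (16,)
```

The point is that `h` acts as the "strategy memory" that a plain feed-forward setup lacks.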
@kkroep, I definitely realize in hindsight what you are saying about your algo being a poor candidate since it focuses on the attack. However, I simply wanted to see if a recreation would work, and I randomly picked a top algo to mine data from. The idea of going after @Aeldrexan's AGLOO is an excellent one, though, and it would be very interesting to see the results; I'll definitely give it a test run in a little while out of curiosity. (Assuming @Aeldrexan doesn't have any complaints - speak now or forever hold your peace.)
@C1Junaid, wow, some awesome ideas presented in your reply. My goal is to make whatever ML implementation I build as pure as possible. What I mean is that I agree a Multi-Algo Decider is probably a much easier implementation, but I hope to train some sort of network to think without any static elements (not counting things like pre-training and substructures). My general process (I believe) will be to start by giving as little guidance as possible, and then slowly work backward in terms of how much static instruction about strategy I provide.
I noticed a common theme in your list: ML can benefit an algo in two main ways. It can either make the decisions itself, or provide information (like which strategy to use) that helps an already-built algo make decisions. I think the latter is the "safer" and probably more reliable choice, which is naturally why I want to try the former first :).
That being said, I will be trying many different approaches and will draw from both sides of this equation. I have never fully done machine learning before, so I am looking to learn lots of different implementations. I have followed ML for quite a while and have done things like build my own (very basic) neural network in C++ that did forward and backward propagation from scratch. I have looked at OpenAI Gym before but haven't done anything with it yet; every time I started working on something, other things took precedence and time eluded me, so Terminal provides an excellent opportunity for me to learn about this :).
Some thoughts about NEAT and genetic algorithms:
I began some preliminary work setting up a NEAT algo, and it became very clear that the size of each genome, using the raw board as inputs and outputs, is simply too large (this is exacerbated by the inefficiency of Python - it could be offset in C, but I'm not ready to spend that much time on this yet). There are also some issues with training in parallel, since I obviously don't want to create a new folder for every single genome. Instead, I have a single folder that references a pickled Genome object to use for its logic, and the algo gets which pickled object it should use from the (slightly modified) run_arena.py script. This leads to race conditions when creating the algos, but that is pretty easily solved (thought I'd mention it anyway).
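A rough sketch of what I mean by the pickle-and-pointer setup (the `Genome` class and file names are placeholders, not my actual code). The race condition goes away if each genome gets a uniquely named pickle and the pointer file the algo reads is updated atomically:

```python
import os
import pickle
import tempfile

class Genome:
    """Stand-in for a real NEAT genome object."""
    def __init__(self, genome_id):
        self.genome_id = genome_id

def save_genome(genome, directory):
    """Pickle each genome under a unique name, so parallel matches
    never write to the same file."""
    path = os.path.join(directory, f"genome_{genome.genome_id}.pkl")
    with open(path, "wb") as f:
        pickle.dump(genome, f)
    return path

def assign_genome(pointer_file, genome_path):
    """Atomically update the pointer the algo reads: write to a temp
    file, then rename. This avoids an algo starting mid-write."""
    d = os.path.dirname(pointer_file) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "w") as f:
        f.write(genome_path)
    os.replace(tmp, pointer_file)  # atomic rename on POSIX and Windows

# usage sketch:
# path = save_genome(Genome(7), "genomes")
# assign_genome("algo_dir/current_genome.txt", path)
```

The `os.replace` trick is what makes this safe: the algo either sees the old pointer or the new one, never a half-written file.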
The big issue is simply the number of inputs and outputs. With each board position plus health, cores, and bits for each player as inputs, there are 427 inputs (one of which is the bias). Then you can expect a similar number of outputs, or possibly more depending on your implementation. That is nothing - small, in fact - for a standard NN model, but in an inefficient language (like I said, I'm not doing this in C), where you need to create a new population with new mutations every epoch, it is extremely slow.
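For what it's worth, here's the arithmetic behind that count, assuming the 420-tile diamond arena and three stats (health, cores, bits) per player:

```python
# Input-count arithmetic for a raw-board encoding of Terminal.
# The arena is a 28-wide diamond: row widths 2, 4, ..., 28, 28, ..., 4, 2.
BOARD_TILES = sum(range(2, 29, 2)) * 2   # 420 usable tiles
PER_PLAYER_STATS = 3                     # health, cores, bits
PLAYERS = 2
BIAS = 1

n_inputs = BOARD_TILES + PER_PLAYER_STATS * PLAYERS + BIAS
print(n_inputs)  # 427
```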
The obvious (and very common) solution is to simply reduce those numbers. This is fairly easy for the inputs, since you can package the data differently (an analogy would be making an image more pixelated) without really changing the information the network receives. Where I am struggling to find a solution is in reducing the number of outputs - this is, of course, assuming an implementation that builds a base. It is much more difficult to take a pixelated image and make it high resolution; similarly, it is challenging to make a smaller set of output nodes convey more data without actually creating more nodes. The current idea I am toying with is some sort of hashing function, but if anyone has any thoughts I'd love to hear them. The other issue with the number of inputs/outputs, of course, is just training time - it will take much, much longer to learn with more initial nodes.
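The input-side "pixelation" is straightforward to do; something like 2x2 pooling over a grid encoding cuts the raw-board inputs by 4x while keeping coarse structural information (a sketch using a dummy 28x28 square board, not my actual diamond encoding):

```python
import numpy as np

def pool2x2(board):
    """Sum over 2x2 patches - the 'pixelation' of the board image.
    A 28x28 encoding becomes 14x14."""
    h, w = board.shape
    return board.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

board = np.zeros((28, 28))
board[13, 13] = 1.0            # a single unit somewhere on the board
small = pool2x2(board)

print(board.size, "->", small.size)  # 784 -> 196
```

Swapping `sum` for `max` gives presence-only pooling instead of unit counts; either way the network still sees roughly where things are, just not exactly. The output side has no such easy inverse, which is exactly the problem.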
All of the above problems (which really all come down to training time) make me hesitant to try a NEAT implementation without at least trying the RNN idea first (and the little test with AGLOO).
Little update:
Regarding the training I mentioned in my first comment (the 11,000 matches): it experiences the same issue everyone has observed so far with committing to a strategy. It is now very confident in the decisions it makes (>0.98), but it still chooses everything. This more than anything makes me want to try an RNN, but if anyone has other suggestions, or reasons I shouldn't, just let me know.
Thanks!